QUERY, SET, ROWS

    Alex
    Bob
    Carl

With the argument distinct=True, you can specify that you only want to select distinct records. This has the same effect as grouping by all the specified fields, except that it does not require sorting. When using distinct, it is important not to select ALL fields, and in particular not to select the "id" field; otherwise all records will always be distinct.

Here is an example:

    >>> for row in db().select(db.person.name, distinct=True):
    ...     print row.name
    Alex
    Bob
    Carl

With limitby, you can select a subset of the records (in this case, the first two, starting at zero):

    >>> for row in db().select(db.person.ALL, limitby=(0, 2)):
    ...     print row.name
    Alex
    Bob

Currently, "limitby" is only partially supported on MSSQL, since the Microsoft database does not provide a mechanism to fetch a subset of records not starting at 0.

Logical Operators

Queries can be combined using the binary AND operator "&":

    >>> rows = db((db.person.name=='Alex') & (db.person.id>3)).select()
    >>> for row in rows: print row.id, row.name
    4 Alex

and the binary OR operator "|":

    >>> rows = db((db.person.name=='Alex') | (db.person.id>3)).select()
    >>> for row in rows: print row.id, row.name
    1 Alex

You can negate a query (or sub-query) with the != binary operator:

    >>> rows = db((db.person.name!='Alex') | (db.person.id>3)).select()
    >>> for row in rows: print row.id, row.name
    2 Bob
    3 Carl

or by explicit negation with the ~ unary operator:

    >>> rows = db(~(db.person.name=='Alex') | (db.person.id>3)).select()
    >>> for row in rows: print row.id, row.name
    2 Bob
    3 Carl

Because Python does not allow overloading the "and" and "or" keywords, these cannot be used in forming queries. The binary operators "&" and "|" must be used instead.
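The restriction on "and" and "or" can be illustrated with a toy class. The sketch below is not web2py code: the Query class and its text attribute are hypothetical stand-ins that only record an expression string, to show which operators Python lets a class intercept.

```python
class Query(object):
    """Hypothetical stand-in for a DAL query; it only records expression text."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        # Called for: q1 & q2
        return Query('(%s AND %s)' % (self.text, other.text))

    def __or__(self, other):
        # Called for: q1 | q2
        return Query('(%s OR %s)' % (self.text, other.text))

    def __invert__(self):
        # Called for: ~q
        return Query('(NOT %s)' % self.text)


is_alex = Query("person.name='Alex'")
id_gt_3 = Query("person.id>3")

combined = (is_alex & id_gt_3).text
negated = (~is_alex | id_gt_3).text

# The keyword form cannot be intercepted: Python only tests truthiness
# and hands back one of the two operands unchanged.
keyword_result = is_alex and id_gt_3
```

Here keyword_result is simply the second operand, so the first condition is silently lost. Note also that "&" and "|" bind more tightly than "==" and ">" in Python, which is why each comparison in the queries above is wrapped in parentheses.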
count, delete, update

You can count records in a set:

    >>> print db(db.person.id > 0).count()
    3

You can delete records in a set:

    >>> db(db.person.id > 3).delete()

And you can update all records in a set by passing named arguments corresponding to the fields that need to be updated:

    >>> db(db.person.id > 3).update(name='Ken')

Expressions

The value assigned in an update statement can be an expression. For example, consider this model:

    >>> db.define_table('person',
    ...     Field('name'),
    ...     Field('visits', 'integer', default=0))
    >>> db(db.person.name == 'Massimo').update(
    ...     visits = db.person.visits + 1)

The values used in queries can also be expressions:

    >>> db.define_table('person',
    ...     Field('name'),
    ...     Field('visits', 'integer', default=0),
    ...     Field('clicks', 'integer', default=0))
    >>> db(db.person.visits == db.person.clicks + 1).delete()

update_record

web2py also allows updating a single record that is already in memory using update_record:

    >>> rows = db(db.person.id > 2).select()
    >>> row = rows[0]
    >>> row.update_record(name='Curt')

6.6 One to Many Relation

To illustrate how to implement one-to-many relations with the web2py DAL, define another table "dog" that refers to the table "person", which we redefine here:

    >>> db.define_table('person',
    ...     Field('name'))
    >>> db.define_table('dog',
    ...     Field('name'),
    ...     Field('owner', db.person))

Table "dog" has two fields, the name of the dog and the owner of the dog. When a field type is another table, it is intended that the field reference the other table by its id.
In fact, you can print the actual type value and get:

    >>> print db.dog.owner.type
    reference person

Now, insert three dogs, two owned by Alex and one by Bob:

    >>> db.dog.insert(name='Skipper', owner=1)
    1
    >>> db.dog.insert(name='Snoopy', owner=1)
    2
    >>> db.dog.insert(name='Puppy', owner=2)
    3

You can select as you did for any other table:

    >>> for row in db(db.dog.owner==1).select():
    ...     print row.name
    Skipper
    Snoopy

Because a dog has a reference to a person, a person can have many dogs, so a record of table person now acquires a new attribute dog, which is a Set that defines the dogs of that person. This allows looping over all persons and fetching their dogs easily:

    >>> for person in db().select(db.person.ALL):
    ...     print person.name
    ...     for dog in person.dog.select():
    ...         print ' ', dog.name
    Alex
      Skipper
      Snoopy
    Bob
      Puppy
    Carl

Inner Joins

Another way to achieve a similar result is by using a join, specifically an INNER JOIN. web2py performs joins automatically and transparently when the query links two or more tables, as in the following example:

    >>> rows = db(db.person.id==db.dog.owner).select()
    >>> for row in rows:
    ...     print row.person.name, 'has', row.dog.name
    Alex has Skipper
    Alex has Snoopy
    Bob has Puppy

Observe that web2py did a join, so the rows now contain two records, one from each table, linked together. Because the two records may have fields with conflicting names, you need to specify the table when extracting a field value from a row. This means that while before you could do:

    row.name

and it was obvious whether this was the name of a person or a dog, in the result of a join you have to be more explicit and say:

    row.person.name

or:

    row.dog.name

Left Outer Join

Notice that Carl did not appear in the list above because he has no dogs. If you intend to select on persons (whether they have dogs or not) and their dogs (if they have any), then you need to perform a LEFT OUTER JOIN.
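The difference between the two kinds of join can be sketched in plain SQL. The example below uses Python's standard sqlite3 module rather than the DAL; the table layout mirrors the person and dog tables above, but the SQL statements are written by hand for illustration and are not necessarily what web2py generates.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE dog (id INTEGER PRIMARY KEY, name TEXT, "
             "owner INTEGER REFERENCES person(id))")
conn.executemany("INSERT INTO person (name) VALUES (?)",
                 [('Alex',), ('Bob',), ('Carl',)])
conn.executemany("INSERT INTO dog (name, owner) VALUES (?, ?)",
                 [('Skipper', 1), ('Snoopy', 1), ('Puppy', 2)])

# INNER JOIN: only persons that have at least one dog appear.
inner = conn.execute(
    "SELECT person.name, dog.name FROM person "
    "JOIN dog ON person.id = dog.owner ORDER BY dog.id").fetchall()

# LEFT OUTER JOIN: every person appears; the dog columns are NULL
# (None in Python) for a person without dogs.
left = conn.execute(
    "SELECT person.name, dog.name FROM person "
    "LEFT JOIN dog ON person.id = dog.owner "
    "ORDER BY person.id, dog.id").fetchall()
```

In this sketch, inner holds three (person, dog) pairs, while left holds the same three plus a fourth row for Carl whose dog name is None.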
A LEFT OUTER JOIN is requested with the argument "left" of the select command. Here is an example:

    >>> rows = db().select(db.person.ALL, db.dog.ALL,
    ...     left=db.dog.on(db.person.id==db.dog.owner))
    >>> for row in rows:
    ...     print row.person.name, 'has', row.dog.name
    Alex has Skipper
    Alex has Snoopy
    Bob has Puppy
    Carl has None

where:

    left = db.dog.on(...)

does the left join query. Here the argument of db.dog.on is the condition required for the join (the same used above for the inner join). In the case of a left join, it is necessary to be explicit about which fields to select.

Grouping and Counting

When doing joins, sometimes you want to group rows according to certain criteria and count them. For example, count the number of dogs owned by every person. web2py allows this as well. First, you need a count operator. Second, you want to join the person table with the dog table by owner. Third, you want to select all rows (person + dog), group them by person, and count them while grouping:

    >>> count = db.person.id.count()
    >>> for row in db(db.person.id==db.dog.owner).select(
    ...         db.person.name, count, groupby=db.person.id):
    ...     print row.person.name, row._extra[count]
    Alex 2
    Bob 1

Notice that the count operator (which is built-in) is used as a field. The only issue here is how to retrieve the information. Each row clearly contains a person and the count, but the count is not a field of a person, nor is it a table. So where does it go? It goes into a dictionary called _extra. This dictionary exists for every row returned by a select when you fetch special objects from the database that are not table fields.

6.7 How to see SQL

Sometimes you need to generate the SQL but not execute it. This is easy to do with web2py, since every command that performs database IO has an equivalent command that does not, and simply returns the SQL that would have been executed.
These commands have the same names and syntax as the functional ones, but they start with an underscore.

Here is insert:

    >>> print db.person._insert(name='Alex')
    INSERT INTO person(name) VALUES ('Alex');

Here is count:

    >>> print db(db.person.name=='Alex')._count()
    SELECT count(*) FROM person WHERE person.name='Alex';

Here is select:

    >>> print db(db.person.name=='Alex')._select()
    SELECT person.id, person.name FROM person WHERE person.name='Alex';

Here is delete:

    >>> print db(db.person.name=='Alex')._delete()
    DELETE FROM person WHERE person.name='Alex';

And finally, here is update:

    >>> print db(db.person.name=='Alex')._update()
    UPDATE person SET WHERE person.name='Alex';

6.8 Exporting and Importing Data

CSV (one table at a time)

When a DALRows object is converted to a string, it is automatically serialized in CSV:

    >>> rows = db(db.person.id==db.dog.owner).select()
    >>> print rows
    person.id,person.name,dog.id,dog.name,dog.owner
    1,Alex,1,Skipper,1
    1,Alex,2,Snoopy,1
    2,Bob,3,Puppy,2

You can serialize a single table in CSV and store it in a file "test.csv":

    >>> open('test.csv', 'w').write(str(db(db.person.id).select()))

and you can easily read it back with:

    >>> db.person.import_from_csv_file(open('test.csv', 'r'))

When importing, web2py looks for the field names in the CSV header. In this example, it finds two columns: "person.id" and "person.name". It ignores the "person." prefix, and it ignores the "id" fields. Then all records are appended and assigned new ids. Both of these operations can be performed via the appadmin web interface.

CSV (all tables at once)

In web2py, you can backup/restore an entire database with two commands.

To export:

    >>> db.export_to_csv_file(open('somefile.csv', 'wb'))

To import:

    >>> db.import_from_csv_file(open('somefile.csv', 'rb'))

This mechanism can be used even if the importing database is of a different type than the exporting database.
The data is stored in "somefile.csv" as a CSV file where each table starts with one line that indicates the table name, and another line with the field names:

    TABLE tablename
    field1, field2, field3, ...

Two tables are separated by "\r\n\r\n". The file ends with the line:

    END

The file does not include uploaded files if these are not stored in the database. In any case, it is easy enough to zip the "uploads" folder separately.

When importing, the new records will be appended to the database if it is not empty. In general, the newly imported records will not have the same record id as the original (saved) records, but web2py will restore references so they are not broken, even if the id values may change.

If a table contains a field called "uuid", this field will be used to identify duplicates. Also, if an imported record has the same "uuid" as an existing record, the previous record will be updated.

CSV and remote Database Synchronization

Consider the following model:

    db = DAL('sqlite:memory:')
    db.define_table('person',
        Field('name'))
    db.define_table('dog',
        Field('owner', db.person),
        Field('name'))
    db.dog.owner.requires = IS_IN_DB(db, 'person.id', '%(name)s')

    if not db(db.person.id>0).count():
        id = db.person.insert(name="Massimo")
        db.dog.insert(owner=id, name="Snoopy")

Each record is identified by an ID and referenced by that ID. If you have two copies of the database used by distinct web2py installations, the ID is unique only within each database and not across the databases. This is a problem when merging records from different databases.

To make records uniquely identifiable across databases, they must:

• have a unique id (UUID),
• have a timestamp (to figure out which one is more recent if there are multiple copies),
• reference the UUID instead of the id.

This can be achieved without modifying web2py.
Here is what to do:

• Change the above model into:

    import uuid
    import datetime

    # "now" is used below as the default timestamp; here it is
    # taken to be the time at which the model is executed
    now = datetime.datetime.now()

    db.define_table('person',
        Field('uuid', length=64, default=uuid.uuid4()),
        Field('modified_on', 'datetime', default=now),
        Field('name'))
    db.define_table('dog',
        Field('uuid', length=64, default=uuid.uuid4()),
        Field('modified_on', 'datetime', default=now),
        Field('owner', length=64),
        Field('name'))

    db.dog.owner.requires = IS_IN_DB(db, 'person.uuid', '%(name)s')

    if not db(db.person.id).count():
        id = uuid.uuid4()
        db.person.insert(name="Massimo", uuid=id)
        db.dog.insert(owner=id, name="Snoopy")

• Create a controller action to export the database:

    import StringIO

    def export():
        s = StringIO.StringIO()
        db.export_to_csv_file(s)
        response.headers['Content-Type'] = 'text/csv'
        return s.getvalue()

• Create a controller action to import a saved copy of the other database and sync records:

    def import_and_sync():
        form = FORM(INPUT(_type='file', _name='data'),
                    INPUT(_type='submit'))
        if form.accepts(request.vars):
            db.import_from_csv_file(form.vars.data.file, unique=False)
            # for every table
            for table in db.tables:
                # for every uuid, delete all but the latest
                items = db(db[table].id>0).select(db[table].id,
                           db[table].uuid,
                           orderby=~db[table].modified_on,
                           groupby=db[table].uuid)
                for item in items:
                    db((db[table].uuid==item.uuid) &
                       (db[table].id!=item.id)).delete()
        return dict(form=form)

• Create an index manually to make the search by uuid faster.

Notice that the second and third steps (the export and import actions) work for every database model; they are not specific to this example.

Alternatively, you can use XML-RPC to export/import the file.

If the records reference uploaded files, you also need to export/import the content of the uploads folder. Notice that the files therein are already labeled by UUIDs, so you do not need to worry about naming conflicts and references.
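The "delete all but the latest" rule used during synchronization can be sketched without a database. The function below is an illustration only: keep_latest and the sample records are hypothetical, and plain dictionaries stand in for database rows carrying the id, uuid and modified_on fields of the model above.

```python
import datetime


def keep_latest(records):
    """Return, for each uuid, only the most recently modified record."""
    latest = {}
    for rec in records:
        current = latest.get(rec['uuid'])
        if current is None or rec['modified_on'] > current['modified_on']:
            latest[rec['uuid']] = rec
    # Sort by id only to make the result order predictable.
    return sorted(latest.values(), key=lambda r: r['id'])


# Hypothetical data: records 1 and 3 are two copies of the same person,
# one from each database, sharing a uuid.
records = [
    {'id': 1, 'uuid': 'aaa', 'name': 'Massimo',
     'modified_on': datetime.datetime(2009, 1, 1)},
    {'id': 2, 'uuid': 'bbb', 'name': 'Snoopy',
     'modified_on': datetime.datetime(2009, 1, 2)},
    {'id': 3, 'uuid': 'aaa', 'name': 'Max',
     'modified_on': datetime.datetime(2009, 3, 1)},
]

survivors = keep_latest(records)
surviving_ids = [r['id'] for r in survivors]
```

With this data, the older copy (id 1) is dropped and the records with ids 2 and 3 survive, which is the outcome the synchronization step aims for on each table.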
HTML/XML (one table at a time)

DALRows objects also have an xml method (like helpers) that serializes them to XML/HTML:

    >>> rows = db(db.person.id==db.dog.owner).select()
    >>> print rows.xml()
    <table><thead><tr><th>person.id</th><th>person.name</th><th>dog.id</th><th>dog.name</th><th>dog.owner</th></tr></thead><tbody><tr class="even"><td>1</td><td>Alex</td><td>1</td><td>Skipper</td><td>1</td></tr><tr class="odd"><td>1</td><td>Alex</td><td>2</td><td>Snoopy</td><td>1</td></tr><tr class="even"><td>2</td><td>Bob</td><td>3</td><td>Puppy</td><td>2</td></tr></tbody></table>

If you need to serialize the DALRows in any other XML format with custom tags, you can easily do that using the universal TAG helper and the * notation:

    >>> rows = db(db.person.id > 0).select()
    >>> print TAG.result(*[TAG.row(*[TAG.field(r[f], _name=f)
    ...     for f in db.person.fields]) for r in rows])
    <result><row><field name="id">1</field><field name="name">Alex</field></row><row><field name="id">2</field><field name="name">Bob</field></row><row><field name="id">3</field><field name="name">Carl</field></row></result>

6.9 Many to Many

In the previous examples, we allowed a dog to have one owner, but one person could have many dogs. What if Skipper was owned by Alex and Curt? This requires a many-to-many relation, and it is realized via an intermediate table that links a person to a dog via an ownership relation.
Here is how to do it:

    >>> db.define_table('person',
    ...     Field('name'))
    >>> db.define_table('dog',
    ...     Field('name'))
    >>> db.define_table('ownership',
    ...     Field('person', db.person),
    ...     Field('dog', db.dog))

The existing ownership relationship can now be rewritten as:

    >>> db.ownership.insert(person=1, dog=1) # Alex owns Skipper
    >>> db.ownership.insert(person=1, dog=2) # Alex owns Snoopy
    >>> db.ownership.insert(person=2, dog=3) # Bob owns Puppy

Now you can add the new relation that Curt co-owns Skipper:

    >>> db.ownership.insert(person=3, dog=1) # Curt owns Skipper too

Because you now have a three-way relation between tables, it may be convenient to define a new set on which to perform operations:

    >>> persons_and_dogs = db((db.person.id==db.ownership.person) &
    ...     (db.dog.id==db.ownership.dog))

Now it is easy to select all persons and their dogs from the new Set:

    >>> for row in persons_and_dogs.select():
    ...     print row.person.name, row.dog.name
    Alex Skipper
    Alex Snoopy
    Bob Puppy
    Curt Skipper

Similarly, you can search for all dogs owned by Alex:

    >>> for row in persons_and_dogs(db.person.name=='Alex').select():
    ...     print row.dog.name
    Skipper
    Snoopy

and all owners of Skipper:

    >>> for row in persons_and_dogs(db.dog.name=='Skipper').select():
    ...     print row.person.name
    Alex
    Curt

A lighter alternative to many-to-many relations is tagging. Tagging is discussed in the context of the IS_IN_DB validator. Tagging works even on database backends that do not support JOINs, such as the Google App Engine.
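For readers who prefer to see the many-to-many link table in SQL terms, here is a sketch using Python's standard sqlite3 module instead of the DAL. The table and column names follow the person, dog and ownership tables above; the SQL is hand-written for illustration and is not claimed to match what web2py emits.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE dog (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE ownership (id INTEGER PRIMARY KEY, "
             "person INTEGER REFERENCES person(id), "
             "dog INTEGER REFERENCES dog(id))")
conn.executemany("INSERT INTO person (name) VALUES (?)",
                 [('Alex',), ('Bob',), ('Curt',)])
conn.executemany("INSERT INTO dog (name) VALUES (?)",
                 [('Skipper',), ('Snoopy',), ('Puppy',)])
# One row per (person, dog) pair; Skipper appears twice (Alex and Curt).
conn.executemany("INSERT INTO ownership (person, dog) VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 3), (3, 1)])

# The same two linking conditions used to build persons_and_dogs.
base = ("SELECT person.name, dog.name FROM person, dog, ownership "
        "WHERE person.id = ownership.person AND dog.id = ownership.dog")

pairs = conn.execute(base + " ORDER BY ownership.id").fetchall()

# Narrowing the set with one more condition, as with
# persons_and_dogs(db.dog.name=='Skipper').
skipper_owners = [row[0] for row in conn.execute(
    base + " AND dog.name = ? ORDER BY person.id", ('Skipper',))]
```

The point of the design is that neither person nor dog stores a reference to the other; each ownership row carries one pair, so adding a co-owner is a single insert into the link table.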