Databases - The Next Generation - Drupalcon DC

05 Mar 2009
Posted by jcfiala

Databases: The Next Generation
larry garfield

Works for palantir.net

drupal 6 database layer
MySQL-centric
ext/mysql, ext/mysqli, ext/pgsql
PHP 3 based database handling.

Limitations:
Swapping include files
- can't use more than one db driver at once

Requires manual blob encoding because of postgressql

can't do custom field handling - can't support oracle or db2

No db-specific optimizations - lowest common dominator.
which means most folks do general mysql, postgres in the cold.
Requires manual type escaping, etc, which uses preg_callback, regex-based (slow)
No database replication support - sites will handle by hacking core.
no transaction support - node_save fails haflway, you get half of a node
String-concatination used to construct dynamic queries

all of it adds up to major DB fail
In 1997, 'Go PHP 5' started, we need to drop PHP 4.
Now can use PDO now that php5 is requried for drupal 7 Totally object oriented api
totally unified structure for database access.
Still different dialects for different databases.
Formal prepared statements - instead of regex or concatination, use placeholders in query to place values. no chance of sql injection because the db knows the difference between statement and data.
Drupal 7 databases - totally pdo/object oriented.
Procedural wrappers around api
New driver model - writing new driver is easier simpler
get all sorts of goodies.

* easeri drier eimplementation
- multiple database types at once
- improved db portability
* database-specific optimizations
* Transactions
* master/slave replication
* Query builder in core!

method calls to build query and run it.

new concepts:
Database targets
Placeholders
Fluent API
- chaining methods like in jquery
- all modifier queries work this way
- never write insert again

targets
* alternate database connection
* can be used or not
* fallback to default
* master/slave replication
* multi-slave configuration
* other targets? system?

Configuration:
$databases['default']['default'] = array(
'driver' => 'mysql',
'database' => 'drupaldb',
'username' => 'username',
'password' => 'secret',
'host' => 'localhost',
);

It doesn't have to be super-easy, we write it during install/config and most people never touch it anyway. And this is faster to pick up and use. Can vary the array per database driver.
Slave servers - set up [default][slave][] with additional slave dbs.
master / slave loadbalancing - slow queries that are not time specfic can go to a slave db.

discussion of failover by folks in the audience.
most targets are defined by magic words - 'slave', 'default' - won't let you write directly to a slave server.

New drivers are easy - subclass and go
Magic directory location includes/database/name-of-driver
No hard-coded list
- can maintain additional drivers in contrib - don't need to hack core to add new database drivers.

Types of queries:
* Static
* Dynamic
* Insert
* Update
* Delete
* Merge

Static: Doesn't change based on environment - majority of queries
Takes named placholders - self documenting, specifify values in any order, Always an array of values. (something like t())

function db_query($query, array $args = array(), array $options = array()) {
if (empty($options['target'])) {
$options['target'] = 'default';
}
$db = Database::getConnection($options['target']);
return $db->query($query, $args, $options);
}

$result = db-query('select nid, title from {node} where uid = :uid', array(':uid' => $uid,));

$result object is pdo result object -
foreach ($result as $record) {
// do something with each $record
}

PHP5 - foreach() is nicer than while().
Default is to return stdClass object as result - since that's what core is mostly doing.
$result->fetch() - default type
$result->fetchObject() - fetch a stdClass
$result->fetchAssoc() - fetching as associative array

$record = $result->fetchField($column); - get a specific field out of the array - replaces db_result().

Can also now fetch the result set at once - $result->fetchAll(), ->fetchAllAssoc($field) - keyed by the field, so array keyed on 'nid'

// retrieve a 2-column result set as an assoiative array of field 1 => field2
$result->fetchAllKeyed();

// Retrieve a 1-column result set as one single array
$result->fetchCol();

Generally, this is all really cool and I am very excited.

$options - lets you set details
$options = array(
'target' => 'default',
'fetch' => PDO::FETCH_OBJ,
);

FETCH_ASSOC, FETCH_BOTH, FETCH_OBJ, FETCH_NUM (numeric array) or even specfiy a custom class that you want it to populate... tricky, but documented. all of this is CORE PDO - we get it for free.

Can't automatically send all select statements to slave servers - it's difficult to automatically figure out which select queries are safe for slaves, so it's left up to developer to set, becase it could be up to one minute stale...

Dynamic queries
* SELECT queries using query builder
* db_select()
* Only if dynamic! overhead
* extenders for pager and tablesort

hook_query_alter()
* all dynamic select statements
* query tagging
* structured data is easy to alter
access internal uery structure
* tag the node access query to grab it's attention.

Insert, Update, Delete
db_insert / db_update, db_delete
Builder is *required*
- cross-database blog and lob
Fully chainable
*"fluent api"

insert - multi-insert, delayed insert, optimized per-driver

merge - db_merge()
"INSERT, UPDATE, whatever,
* update to same value or new

Transactions!
* Force multiple queries to run atomically as one.
* Only suported on some databases - silly myisam which doesn't, and fails to current functionality
* Use exceptions

Extensive unit tests
api unit tests work as api documentation - most extensive tests in core

question - is the colon required by placeholders? Answer, yes.
System will take an array for a value, and will then use IN - WHERE nid IN (:nids)"... array(':nid' => array(584, 283, 287))
Insert statements - db_insert returns an object, use it's ->fields() method to set the fields for a new row, and call ->execute() to do it.
See testSimpleInsert() api unit test for v. good example.
can do db_insert('test')->fields(blah)->execute();

after fields can call values to add more rows to be inserted. If you don't specify the fields in values, then you can use the order in fields. So, see testInsertFieldOnlyDefinition for an example
db_insert('test')->fields(array('name', 'age'))->values(array('larry', '30'))->values(array('moe', 31'));

return value of the execute is the last id that you inserted.

Update does ->fields to update and ->condition() for the where clause.

see testSimpleUpdate, testMultipleUpdate in unit tests

->condition('age', 26, '>') - age > 26

mysql is in strict mode in drupal 7 - have to do things correctly instead of allowing errors.

->where('age > :age', array(':age' => 26))
Multiple condition/where is anded together.

quotes are handled by the database, not by you - 'name = :name' doesn't care about quotes.

or clauses:
->condition(db_or(), condition('name', 'John'), condition('name', 'Paul')) -- this is ored inside, but then anded with other conditions. db_and() and db_xor() also supported.

Everything covered in handbooks in excessive detail

Larry Garfield is the database maintainer.

Task list
http://drupal.org/community-initiatives/drupal-core/databases

code sprint - help to convert all of core to use the new database so can get rid of backwards compatibility layer.

Tags: