Contents
- BedquiltDB Conceptual Overview
- BedquiltDB Architecture
- Drivers
- Collections
- Documents
- Writing Data
- Reading Data
- Advanced Queries
- Skip, Limit and Sort
- Removing Data
- Updating Data
BedquiltDB Conceptual Overview
This document attempts to describe BedquiltDB at a high level.
BedquiltDB Architecture
BedquiltDB is divided into two components:
- The `bedquilt` extension for PostgreSQL
- A set of client "driver" libraries
Once the `bedquilt` extension is installed on a PostgreSQL server, a driver library can be connected to the server and used to read and write JSON data. The driver provides an API that feels native to the language it is written in, and manages conversion from language-native data structures to JSON and back again.
The following diagram illustrates how a web application written in Python might use the `pybedquilt` driver to interface with a database which has `bedquilt` installed:
As we can see from the diagram, the driver's `find` method is really a thin wrapper around an SQL statement which uses an SQL function called `bq_find`, which is provided by the `bedquilt` extension. All of the functionality of BedquiltDB is implemented in this way, with all the "smart stuff" implemented inside the database, behind custom SQL functions.
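To make that wrapping concrete, here is a sketch of how a driver might assemble its call to `bq_find`. This is illustrative only: the exact signature of `bq_find` and the parameter-binding style are assumptions, not the real driver code.

```python
import json

def build_find_statement(collection_name, query_doc):
    # Hypothetical sketch: wrap the caller's query document in a
    # parameterised call to the bq_find SQL function provided by
    # the bedquilt extension. A real driver would hand this pair
    # to its database adapter for execution.
    sql = "SELECT bq_find(%s, %s::json);"
    params = (collection_name, json.dumps(query_doc))
    return sql, params

sql, params = build_find_statement('users', {'active': True})
# sql    -> "SELECT bq_find(%s, %s::json);"
# params -> ('users', '{"active": true}')
```

The point is that the driver does no query logic of its own; it serialises the query document to JSON and lets the database do the work.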
This approach provides several advantages over simply writing wrapper logic around SQL in a specific language:
- The logic of BedquiltDB is performed inside the database, close to the data
- Driver libraries become very simple to implement and test
For more information on setting up BedquiltDB, see Installation.
Drivers
To use BedquiltDB, the programmer will need to import a BedquiltDB driver for their favourite programming language and use it to connect to the PostgreSQL/BedquiltDB server. For example, with the `pybedquilt` driver:
```python
import pybedquilt

db = pybedquilt.BedquiltClient('dbname=test')
```
The `db` object holds a connection to the server, and provides an API for the collections in the BedquiltDB database.
Collections
In a BedquiltDB database, JSON data is stored in collections. You write data into collections, then read it back out later, like so:
```python
# create a collection called 'users'
db.create_collection('users')

# get a Collection object, referencing the new 'users' collection
users = db['users']

# How many users do you think we have?
print users.count()
```
Documents
Collections contain documents. A document is essentially a single JSON object.
The BedquiltDB driver handles converting from native data-structures to JSON and back again.
A document can have practically any structure you could want, as long as it's valid JSON,
with one exception: all documents must have an _id
field, with a string value.
If a document without an _id
field is written to a collection, then a random string will be
generated and set as the _id
value. The _id
field is used as the unique primary-key in
the collection. If two documents are saved with the same _id
, then the second one will over-write the first.
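The id-generation behaviour can be pictured like this. This is an illustrative sketch only, not BedquiltDB's actual algorithm; the only thing taken from the docs is the shape of the generated ids (24 hex characters, like "ba40513444b760b7eb2684d8").

```python
import os

def generate_object_id():
    # 12 random bytes rendered as 24 hex characters,
    # matching the shape of BedquiltDB's generated ids
    return os.urandom(12).hex()

def ensure_id(doc):
    # Leave an existing _id alone; otherwise set a fresh one
    if '_id' not in doc:
        doc['_id'] = generate_object_id()
    return doc['_id']

doc = {'name': 'John'}
new_id = ensure_id(doc)
# doc now carries a 24-character hex string under '_id'
```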
Here we see an example of saving a Python dictionary to a BedquiltDB collection as a JSON object:
```python
users.insert({
    "_id": "john@example.com",
    "name": "John",
    "age": 45,
    "address": {
        "street": "Elm Row",
        "city": "Edinburgh"
    }
})
```
We can read that same document out later:
```python
john = users.find_one_by_id("john@example.com")
```
Or retrieve it as part of a more general query:
```python
edinburgh_users = users.find({"address": {"city": "Edinburgh"}})
```
Writing Data
There are two operations which write JSON data to a collection: `insert` and `save`.
The `insert` operation takes a JSON document and inserts it into the collection, generating an `_id` value if needed. Regardless, the `insert` operation always returns the `_id` of the inserted document:
```python
print pets.insert({"name": "Snuffles", "species": "dog"})
# => "ba40513444b760b7eb2684d8"

print pets.insert({"_id": "some_meaningful_identifier", "name": "Larry", "species": "cat"})
# => "some_meaningful_identifier"
```
The `save` operation also takes a JSON document, but it first checks if the document has an `_id` field. If it does, and a document with that same `_id` exists in the collection, then the old document will be overwritten by the new one. Otherwise, `save` behaves the same as `insert`: if there are no documents in the collection with the same `_id`, then the document is simply inserted into the collection, and the `_id` returned to the caller:
```python
john = users.find_one_by_id('john@example.com')
john['age'] = 46
result = users.save(john)
print result
# => "john@example.com"
```
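The difference between the two write operations can be sketched with a plain dict standing in for a collection. This illustrates the semantics only; it is not how the driver or the extension is implemented.

```python
import os

collection = {}  # maps _id -> document

def insert(doc):
    # Generate an _id if the document lacks one, then store the
    # document under it and return the _id, as insert always does
    doc_id = doc.setdefault('_id', os.urandom(12).hex())
    collection[doc_id] = doc
    return doc_id

def save(doc):
    # Overwrite any existing document with the same _id;
    # otherwise this behaves exactly like insert
    return insert(doc)

insert({'_id': 'john@example.com', 'age': 45})
save({'_id': 'john@example.com', 'age': 46})
# collection['john@example.com']['age'] -> 46
```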
Reading Data
There are three operations which read JSON documents out of collections: `find`, `find_one` and `find_one_by_id`. The `find` operation takes a "query document" and compares it to the documents in the collection, returning the set of documents which match the query. A document is considered a match if the query matches some subset of the document.
For example, we could find all active users:
```python
cool_people = db['users'].find({
    'active': True
})
```
or, we could find all active users who are living in Edinburgh:
```python
cool_people = db['users'].find({
    'active': True,
    'address': {
        'city': 'Edinburgh'
    }
})
```
or all active users in Edinburgh who have both "icecream" and "code" in their list of `likes`:
```python
cool_people = db['users'].find({
    'active': True,
    'address': {
        'city': 'Edinburgh'
    },
    'likes': ['icecream', 'code']
})
```
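The "query matches some subset of the document" rule can be sketched in plain Python. This is a simplified illustration of the matching semantics, not the SQL implementation; in particular, arrays are compared by simple equality here, which glosses over the richer containment behaviour a real driver may offer.

```python
def matches(query, doc):
    # Every key in the query must appear in the document, with
    # nested dicts compared recursively and other values by equality
    for key, expected in query.items():
        if key not in doc:
            return False
        actual = doc[key]
        if isinstance(expected, dict) and isinstance(actual, dict):
            if not matches(expected, actual):
                return False
        elif actual != expected:
            return False
    return True

doc = {'active': True, 'address': {'city': 'Edinburgh', 'street': 'Elm Row'}}
matches({'active': True}, doc)                    # -> True
matches({'address': {'city': 'Edinburgh'}}, doc)  # -> True
matches({'address': {'city': 'Glasgow'}}, doc)    # -> False
```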
We can also just query for all documents in the collection, by supplying an empty query document:
```python
cool_people = db['users'].find({})
```
For some BedquiltDB drivers, the result of a `find` operation will be a `Cursor` of results, rather than an array. Generally, the driver should use the language's equivalent of lazy sequences to represent query result sets. This is so that the results can be streamed from the PostgreSQL server to the client as needed, rather than being eagerly materialised in memory:
```python
print db['users'].find({...})
# => <Cursor>
```
We can iterate over the cursor, pulling in results as they are needed:
```python
result = db['users'].find({...})
for doc in result:
    print doc
```
Or we can just turn the result into a list:
```python result = list( db['users'].find({}) ) print type(result)
=> list
```
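A generator gives the flavour of how such a lazy cursor behaves. This is an illustration of the laziness, not the driver's streaming machinery.

```python
def fake_cursor(rows):
    # Yield documents one at a time, the way a driver cursor
    # pulls rows from the server only as they are consumed
    for row in rows:
        yield row

result = fake_cursor([{'_id': '1'}, {'_id': '2'}])
first = next(result)      # pulls only the first document
remaining = list(result)  # materialises whatever is left
```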
The `find_one` operation also takes a query document, just like `find`, but it returns at most a single result, or null if there were no matching documents:
```python
print db['users'].find_one({'email': 'user@example.com'})
# => {'_id': '...', 'email': '...', ...}

print db['users'].find_one({'this': {'matches': 'nothing'}})
# => None
```
As if that weren't enough, the `find_one_by_id` operation takes a string id instead of a query document, and returns the document with the matching `_id` field, or null if there are no documents with that `_id`:
```python
print db['users'].find_one_by_id('400241')
# => {'_id': '400241', ...}
```
If we have a list of document ids, we can use `find_many_by_ids` to get them all in one query, rather than using `find_one_by_id` multiple times:
```python
print db['orders'].find_many_by_ids(['X2242', 'X5373', 'X1762'])
```
We can also get a list of the distinct values we may have under a given key, with the `distinct` operation:
```python
print db['users'].distinct('address.city')
```
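The dotted key path (`'address.city'`) can be pictured as a walk through nested objects. This sketches the path semantics only; it is not the extension's implementation.

```python
def get_path(doc, path):
    # Follow a dotted path like 'address.city' through nested
    # dicts, returning None if any step along the way is missing
    current = doc
    for part in path.split('.'):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def distinct(docs, path):
    # Collect the distinct values found under the path,
    # preserving first-seen order
    values = []
    for doc in docs:
        value = get_path(doc, path)
        if value is not None and value not in values:
            values.append(value)
    return values

docs = [
    {'address': {'city': 'Edinburgh'}},
    {'address': {'city': 'Glasgow'}},
    {'address': {'city': 'Edinburgh'}},
]
distinct(docs, 'address.city')  # -> ['Edinburgh', 'Glasgow']
```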
Advanced Queries
All of the examples so far have queried for documents which match a query document. In other words, the query document should be a subset of the matched document. BedquiltDB also supports more advanced query operations. For example, we can test that a certain field does not equal a given value:
```python
print db['orders'].find({
    'address': {
        'city': {
            '$noteq': 'Glasgow'
        }
    }
})
```
Or that a numeric field is greater than a certain value:
```python
print db['articles'].find_one({
    'upvotes': {
        '$gt': 4
    }
})
```
Of course, we can intermingle these special queries with ordinary matching queries too:
```python
print db['orders'].find({
    'processed': True,
    'address': {
        'city': {
            '$noteq': 'Glasgow'
        },
        'address1': 'Church Street'
    }
})
```
See the BedquiltDB Spec for full documentation on advanced query operations.
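Conceptually, these operators act as per-field predicates. The sketch below assumes only the two operators shown above, `$noteq` and `$gt`; see the spec for the full set.

```python
def match_field(condition, value):
    # A dict whose keys start with '$' is an operator expression;
    # anything else falls back to a plain equality match
    if isinstance(condition, dict) and any(k.startswith('$') for k in condition):
        for op, operand in condition.items():
            if op == '$noteq' and not (value != operand):
                return False
            elif op == '$gt' and not (value is not None and value > operand):
                return False
        return True
    return condition == value

match_field({'$noteq': 'Glasgow'}, 'Edinburgh')  # -> True
match_field({'$gt': 4}, 10)                      # -> True
match_field({'$gt': 4}, 2)                       # -> False
```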
Skip, Limit and Sort
The `find` operation takes a few extra, optional parameters which allow you to control the number of documents that are returned from the query.
The `limit` option limits the result set to the desired size:
```python
db['users'].find({'active': True}, limit=10)
```
The `skip` option omits a number of documents from the start of the result set:
```python
db['users'].find({'active': True}, limit=10, skip=4)
```
The `sort` option allows you to specify how the result set should be sorted:
```python
# sort by age ascending, then by name descending
db['users'].find(
    {'active': True},
    limit=10,
    skip=4,
    sort=[{'age': 1}, {'name': -1}]
)
```
Naturally, the `skip`, `limit` and `sort` options to `find` can be used in any combination. If no sort order is specified, the result set is likely to be sorted naturally in the order the documents were written to the collection. However, this behaviour is not guaranteed, so if you care about ordering you should sort by a document field which has meaning to your data.
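How the three options compose can be illustrated in plain Python: sort first, then skip, then limit. This shows the conceptual ordering of the steps, not what the database actually executes.

```python
def apply_sort_skip_limit(docs, sort=None, skip=0, limit=None):
    result = list(docs)
    if sort:
        # Apply the sort specs in reverse so the first spec ends up
        # most significant (relies on Python's stable sort);
        # a direction of -1 means descending, 1 means ascending
        for spec in reversed(sort):
            (key, direction), = spec.items()
            result.sort(key=lambda d: d[key], reverse=(direction == -1))
    result = result[skip:]
    if limit is not None:
        result = result[:limit]
    return result

docs = [{'age': 30, 'name': 'b'}, {'age': 25, 'name': 'a'}, {'age': 30, 'name': 'a'}]
sorted_docs = apply_sort_skip_limit(docs, sort=[{'age': 1}, {'name': -1}])
# -> ages ascending; within equal ages, names descending
```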
Removing Data
Removing data from a collection can be accomplished with the `remove`, `remove_one` and `remove_one_by_id` operations. `remove` and `remove_one` take a query document and remove any documents which match the query, while `remove_one_by_id` takes a string id and removes the document in the collection with the same `_id`.
All of the `remove*` operations return an integer indicating the number of documents that were removed.
Beware: the `remove*` operations will permanently delete data. There is no way to recover data removed in this way.
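The remove semantics can be sketched against the same in-memory picture of a collection, with the integer return value counting the deleted documents. This is an illustration only, using simple top-level equality matching.

```python
def remove(collection, query):
    # Delete every document matching the query (here: plain
    # top-level equality) and return how many were removed
    to_delete = [doc_id for doc_id, doc in collection.items()
                 if all(doc.get(k) == v for k, v in query.items())]
    for doc_id in to_delete:
        del collection[doc_id]
    return len(to_delete)

collection = {
    'a': {'_id': 'a', 'species': 'cat'},
    'b': {'_id': 'b', 'species': 'dog'},
    'c': {'_id': 'c', 'species': 'cat'},
}
removed = remove(collection, {'species': 'cat'})
# removed -> 2, leaving only the dog behind
```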
Updating Data
At the moment, the only way to update a document in a collection is to use the `save` operation detailed above.