
A Crash Course in MongoDB

This document provides an overview of MongoDB and how to use it with Python. MongoDB is a document-oriented, JSON-like database that is scalable, dynamic, and open source. It can be used to store metrics, logs, messages, blog content, and more. The PyMongo driver allows working with MongoDB from Python. Documents can be queried, updated, inserted, and removed using methods like find(), update(), insert(), and remove(). Indexes improve query performance, and geospatial indexes enable location-based queries. GridFS stores and retrieves files in chunks within MongoDB.

Copyright
© Attribution Non-Commercial (BY-NC)

A Crash Course in MongoDB

PyCon US 2013

Hi. I'm Andy Dirnberger

github.com/dirn
@dirnonline
Engineering @ CBS Local

So what is MongoDB?

http://mongodb.org

MongoDB is...

Document-oriented
JSON-like (BSON)
Dynamic schema*
Scalable
Open Source (GNU AGPL v3.0)**

*not the same thing as schemaless
**drivers use the Apache license

MongoDB can be used for...

Metrics
Logging*
Messaging
Queues
Blog
Content Management
Anything you want

*Capped collections behave as fixed-size FIFO queues
*TTL collections have a special index that will automatically remove old data

To run MongoDB...

Download it:
http://mongodb.org/downloads

or install it:
$ sudo apt-get install mongodb
$ brew install mongodb

Run it:
$ mongod
$ mongod --dbpath /var/lib/mongodb/
$ mongod --fork

http://docs.mongodb.org/manual/tutorial/manage-mongodb-processes/

PyMongo

Using MongoDB with Python

https://github.com/mongodb/mongo-python-driver

The driver...

Install it:
$ pip install pymongo

Packages:
pymongo bson gridfs

http://api.mongodb.org/python/current/

BSON supports...

int
float
basestring
list
dict
datetime.datetime

http://bsonspec.org/

Object IDs are made of...

50d4dce70ea5fae6fb84e44b

4-byte timestamp (50d4dce7)
3-byte machine identifier (0ea5fa)
2-byte process ID (e6fb)
3-byte counter (84e44b)
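The four fields can be pulled apart by hand. This is a minimal sketch in plain Python 3, assuming the 12-byte layout listed above; split_object_id is a hypothetical helper, not part of the bson package, and newer drivers are understood to fill the middle five bytes with a random value instead of machine and process IDs.

```python
from datetime import datetime, timezone


def split_object_id(hex_id):
    """Split a 24-character ObjectId hex string into its four fields.

    Hypothetical helper for illustration; assumes the layout
    timestamp (4 bytes), machine (3), process ID (2), counter (3).
    """
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[0:4], "big")  # big-endian Unix timestamp
    return {
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "machine": raw[4:7].hex(),
        "pid": int.from_bytes(raw[7:9], "big"),
        "counter": int.from_bytes(raw[9:12], "big"),
    }


parts = split_object_id("50d4dce70ea5fae6fb84e44b")
print(parts["timestamp"].isoformat(), parts["machine"], parts["pid"])
```

Because the timestamp comes first, ObjectIds sort roughly by creation time.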

Connect with MongoClient

>>> from pymongo import MongoClient
>>>
>>> MongoClient(host='localhost', port=27017)
MongoClient('localhost', 27017)
>>>
>>> MongoClient(host='mongodb://localhost:27017')
MongoClient('localhost', 27017)
>>>
>>> MongoClient('mongodb://localhost:27017').pycon
Database(MongoClient('localhost', 27017), u'pycon')

Querying

Documents can be retrieved with...

>>> coll = db.talks
>>> coll.find_one({
...     'name': 'A Crash Course in MongoDB'})
{
    u'track': 2,
    u'_id': ObjectId('5145e5380ea5fa321fa97064'),
    u'speaker': u'Andy Dirnberger',
    u'name': u'A Crash Course in MongoDB',
    u'language': u'python',
    u'time': datetime.datetime(2013, 3, 17, 14, 30)
}

Documents can be retrieved with...

>>> from datetime import datetime
>>> cursor = coll.find(
...     {'track': 2,
...      'time': {'$gte': datetime(2013, 3, 17),
...               '$lt': datetime(2013, 3, 18)}},
...     {'name': 1})
>>> cursor
<pymongo.cursor.Cursor object at 0x10da4ed90>

http://docs.mongodb.org/manual/reference/operators/#query-selectors
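The '$gte' / '$lt' pair forms a half-open date range, so a talk starting exactly at midnight the next day is excluded. A small sketch of building that filter in Python 3; talks_on_day is a hypothetical helper name, and the result is an ordinary dict that would be passed to find() as the first argument.

```python
from datetime import datetime, timedelta


def talks_on_day(track, day):
    """Build a filter for talks in `track` that start on the day of `day`.

    Hypothetical helper: returns a plain dict and never touches the database.
    """
    start = datetime(day.year, day.month, day.day)  # midnight of that day
    return {
        "track": track,
        # on or after midnight, strictly before the following midnight
        "time": {"$gte": start, "$lt": start + timedelta(days=1)},
    }


query = talks_on_day(2, datetime(2013, 3, 17, 14, 30))
projection = {"name": 1}  # return only name (plus _id, which is included by default)
# With a live collection: cursor = coll.find(query, projection)
```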

What's in the cursor?

>>> for doc in cursor:
...     print doc
...
{u'_id': ObjectId('5145e4f00ea5fa321fa97062'), u'name': u'Elasticsearch (Part 2)'}
{u'_id': ObjectId('5145e5200ea5fa321fa97063'), u'name': u'Going beyond the Django ORM'}
{u'_id': ObjectId('5145e5380ea5fa321fa97064'), u'name': u'A Crash Course in MongoDB'}

http://api.mongodb.org/python/current/api/pymongo/cursor.html

Updating

Documents can be removed with...

>>> coll.remove({'language': 'ruby'})
{
    u'connectionId': 8,
    u'ok': 1.0,
    u'err': None,
    u'n': 0
}

Documents can be removed with...

>>> coll.remove({
...     'language': {'$in': ['php', 'node.js']}})
{
    u'connectionId': 8,
    u'ok': 1.0,
    u'err': None,
    u'n': 0
}

Documents can be removed with...

>>> coll.remove({'language': {'$ne': 'python'}})
{
    u'connectionId': 8,
    u'ok': 1.0,
    u'err': None,
    u'n': 0
}

Documents can be inserted with...

>>> db.tracks.insert({
...     'number': 2,
...     'room': 'Grand Ballroom CD'})
ObjectId('5145eb4e0ea5fa321fa97065')

Documents can be inserted with...

>>> db.sessions.update(
...     {'track': 2},
...     {'track': 2,
...      'date': datetime(2013, 3, 17),
...      'order': 1,
...      'chair': 'Megan Speir',
...      'runner': 'Erik Bray'},
...     upsert=True)
{
    ...
    u'upserted': ObjectId('5145ecfd3f69a773554253e8'),
    u'n': 1,
    u'updatedExisting': False
}

A couple of other methods...

save()
Works like update(..., upsert=True) if _id is specified, insert() if it's not

find_and_modify()
Modifies the document in the database, returns the original by default, the updated with new=True

A note about update()

>>> db.sessions.update(
...     {'_id': ObjectId('5145ecfd3f69a773554253e8')},
...     {'num_talks': 3})
{...}
>>>
>>> # The document has been replaced
>>> db.sessions.find_one({
...     '_id': ObjectId('5145ecfd3f69a773554253e8')})
{
    u'_id': ObjectId('5145ecfd3f69a773554253e8'),
    u'num_talks': 3
}

Using update operators to target specific fields...

>>> db.sessions.update(
...     {'_id': ObjectId('5145ecfd3f69a773554253e8')},
...     {'$set': {'num_talks': 3}})
{
    u'updatedExisting': True,
    u'connectionId': 8,
    u'ok': 1.0,
    u'err': None,
    u'n': 1
}

http://docs.mongodb.org/manual/reference/operators/#update
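The difference between a replacement document and '$set' can be modeled on plain dicts. This is a toy sketch in Python 3, not PyMongo: apply_update is a hypothetical function that only mimics the two behaviors shown on the slides and knows no other operators.

```python
import copy


def apply_update(document, update):
    """Toy model of update() on a single dict (illustration only).

    Hypothetical helper: handles a replacement document or a lone $set.
    """
    if any(key.startswith("$") for key in update):
        if set(update) != {"$set"}:
            raise NotImplementedError("this sketch only models $set")
        patched = copy.deepcopy(document)
        patched.update(update["$set"])  # only the named fields change
        return patched
    # No operators: the whole document is replaced, only _id survives.
    replacement = copy.deepcopy(update)
    replacement["_id"] = document["_id"]
    return replacement


session = {"_id": 1, "track": 2, "chair": "Megan Speir", "runner": "Erik Bray"}
replaced = apply_update(session, {"num_talks": 3})
patched = apply_update(session, {"$set": {"num_talks": 3}})
print(replaced)
print(patched)
```

The replaced result keeps only _id and num_talks, while the patched result keeps every original field and adds num_talks.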

Write concern...

w
The number of servers that must acknowledge the write, including the primary

wtimeout
The timeout for the write, in milliseconds; without it the write could block forever

http://docs.mongodb.org/manual/core/write-operations/#write-concern

Write concern...

Is turned on by default in MongoClient: writes wait for acknowledgement from the primary (w=1)

Indexes

You can create an index with...

create_index()
Unconditionally creates an index on one or more fields

ensure_index()
Works like create_index() except the driver will remember that the index was already made

Indexes...

Are directional
>>> db.sessions.ensure_index([
...     ('date', pymongo.ASCENDING),
...     ('order', pymongo.DESCENDING)])
u'date_1_order_-1'

Can be sparse
Only documents containing all fields in the index will be included in the index

Explain plans...

{
    'cursor': '<Cursor Type and Index>',
    'n': <num (documents matching query)>,
    'nscanned': <num (documents scanned)>,
    'scanAndOrder': <boolean>,
}

You want n and nscanned to be as close together as possible
If scanAndOrder is True, the index can't be used for sorting
http://docs.mongodb.org/manual/reference/explain/
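Those two rules of thumb can be turned into a quick check. A minimal sketch in Python 3, assuming the explain format shown above; review_explain and its max_ratio threshold are hypothetical, and the sample plans are made-up values, not real server output.

```python
def review_explain(plan, max_ratio=2.0):
    """Return warnings for an explain() result in the format shown above.

    Hypothetical helper; max_ratio is an arbitrary illustrative threshold.
    """
    warnings = []
    n, nscanned = plan["n"], plan["nscanned"]
    if plan["cursor"] == "BasicCursor":  # cursor type reported for a full scan
        warnings.append("no index used (full collection scan)")
    if nscanned > max(n, 1) * max_ratio:
        warnings.append("scanned %d documents to return %d" % (nscanned, n))
    if plan.get("scanAndOrder"):
        warnings.append("results were sorted in memory, not by the index")
    return warnings


# Made-up plans for illustration only.
indexed = {"cursor": "BtreeCursor date_1_order_-1", "n": 3,
           "nscanned": 3, "scanAndOrder": False}
unindexed = {"cursor": "BasicCursor", "n": 3,
             "nscanned": 5000, "scanAndOrder": True}
print(review_explain(indexed))
print(review_explain(unindexed))
```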

GridFS

Storing files with GridFS...

Files are stored in chunks
4MB of RAM
Replication and Sharding

http://docs.mongodb.org/manual/applications/gridfs/
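Storing a file in chunks means cutting its bytes into fixed-size pieces, each saved as its own document with a sequence number. This is a toy model in Python 3, not the gridfs package: split_into_chunks and join_chunks are hypothetical helpers, and the default chunk size here is an assumption rather than a value taken from the slides.

```python
def split_into_chunks(files_id, data, chunk_size=255 * 1024):
    """Cut `data` (bytes) into chunk documents with files_id, n and data.

    Hypothetical helper; the default chunk_size is an assumed value.
    """
    return [
        {"files_id": files_id, "n": index, "data": data[offset:offset + chunk_size]}
        for index, offset in enumerate(range(0, len(data), chunk_size))
    ]


def join_chunks(chunks):
    """Reassemble the original bytes by ordering the chunks on n."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


payload = b"PyCon 2013"
chunks = split_into_chunks("example-id", payload, chunk_size=4)
print([chunk["data"] for chunk in chunks])
print(join_chunks(reversed(chunks)))
```

Reading a file back only needs one chunk in memory at a time, which is why large files do not have to fit in a single document.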

To use GridFS...

>>> import gridfs
>>> fs = gridfs.GridFS(db)
>>> file_id = fs.put('PyCon 2013', city='Santa Clara', state='CA')
>>> file = fs.get(file_id)
>>> file.read()
'PyCon 2013'
>>> file.upload_date
datetime.datetime(2013, 3, 17, 21, 30, 0, 0)
>>> file.city, file.state
(u'Santa Clara', u'CA')

GridFS is versioned...

get_last_version()
Gets the most recent file matching the query

get_version()
Works like get_last_version() except it can request specific versions of a file

Geospatial

Create an index...

>>> # $set adds loc without replacing the rest of the document
>>> db.tracks.update(
...     {'_id': ObjectId('5145eb4e0ea5fa321fa97065')},
...     {'$set': {'loc': [37.3542, 121.9542]}})
{...}
>>> db.tracks.ensure_index([
...     ('loc', pymongo.GEO2D)])
u'loc_2d'

http://docs.mongodb.org/manual/applications/geospatial-indexes/

Query, query, query...

>>> db.tracks.find({'loc': [37.3542, 121.9542]})
<pymongo.cursor.Cursor object at 0x10e14eb90>
>>> db.tracks.find({
...     'loc': {'$near': [37.3542, 121.9542]}})
<pymongo.cursor.Cursor object at 0x10e14edd0>

You can query $within shapes...

{'$center': [center, radius]}
{'$box': [[x1, y1], [x2, y2]]}
{'$polygon': [[x1, y1], [x2, y2], [x3, y3]]}
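Each shape is nested under '$within' inside the filter for the indexed field. A short sketch in Python 3 of how the pieces fit together; within, center and box are hypothetical helpers that only build dicts, and the radius and box corners are made-up values. Newer servers are understood to name the operator '$geoWithin', so it is left as a parameter.

```python
def within(field, shape, operator="$within"):
    """Wrap a shape document in a geospatial filter on `field` (hypothetical helper)."""
    return {field: {operator: shape}}


def center(point, radius):
    """Circle shape: a center point and a radius in the index's units."""
    return {"$center": [list(point), radius]}


def box(bottom_left, top_right):
    """Rectangle shape given by two opposite corners."""
    return {"$box": [list(bottom_left), list(top_right)]}


venue = [37.3542, 121.9542]  # the point used in the queries above
nearby = within("loc", center(venue, 0.5))               # made-up radius
area = within("loc", box([37.0, 121.5], [37.5, 122.0]))  # made-up corners
# With a live collection: db.tracks.find(nearby)
print(nearby)
print(area)
```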

Anything else...

Aggregation Framework
Helps with simple map-reduce queries, but its results are subject to the same 16MB size limit as documents

Libraries
http://api.mongodb.org/python/current/tools.html

Thank you!
dirn.it/PyCon2013

Questions?
