The PyMongo Module Explained (with a Translation of the Official Documentation)

孔鸿远

2023-12-01

github

PyMongo supports MongoDB 2.6, 3.0, 3.2, 3.4, 3.6, and 4.0.
PyMongo 3.8.0 Documentation
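1. Making a Connection with MongoClient
  
  The first step when working with PyMongo is to create a MongoClient connected to the running mongod instance. Doing so is easy: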
```
from pymongo import MongoClient
client = MongoClient()
```
  The code above connects to the default host and port. We can also specify the host and port explicitly:
```
client = MongoClient('localhost', 27017)
```
  Or use the MongoDB URI format:
```
client = MongoClient('mongodb://localhost:27017/')
```
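  When the server requires authentication, the URI also carries a username and password, and reserved characters in them must be percent-encoded. The sketch below assembles such a URI with the standard library only. `build_mongo_uri` is a hypothetical helper name, not part of PyMongo, and it covers just the host, port, and credentials parts of the URI.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host="localhost", port=27017, username=None, password=None):
    """Assemble a mongodb:// URI, percent-encoding any credentials.

    Hypothetical helper for illustration; it is not part of PyMongo.
    """
    if username is None:
        return "mongodb://%s:%d/" % (host, port)
    # Reserved characters such as '@' and '/' must be escaped in credentials.
    credentials = "%s:%s" % (quote_plus(username), quote_plus(password or ""))
    return "mongodb://%s@%s:%d/" % (credentials, host, port)


uri = build_mongo_uri()
print(uri)
```

  The resulting string can be passed directly to `MongoClient(uri)`.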
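2. Getting a Database
  
  A single instance of MongoDB can support multiple independent databases. When working with PyMongo you access databases using attribute-style access on MongoClient instances: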
```
db = client.test_database
```
  If your database name is such that attribute-style access won't work (like test-database), you can use dictionary-style access instead:
```
db = client['test-database']
```
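3. Getting a Collection
  
  A collection is a group of documents stored in MongoDB, and can be thought of as roughly the equivalent of a table in a relational database. Getting a collection in PyMongo works the same way as getting a database: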
```
collection = db.test_collection
# or
collection = db['test-collection']
```
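  An important note about collections (and databases) in MongoDB is that they are created lazily: none of the commands above has actually performed any operation on the MongoDB server.
  
  Collections and databases are created when the first document is inserted into them.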
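  One way to observe this lazy behavior is to check the server's collection list before and after the first insert. The sketch below assumes a `db` object like the one obtained above. `collection_exists` is a hypothetical helper; it relies only on PyMongo's `list_collection_names()` method.

```python
def collection_exists(db, name):
    """Return True if the server already has a collection called `name`.

    `db` is expected to be a pymongo Database; only its
    list_collection_names() method is used. Hypothetical helper.
    """
    return name in db.list_collection_names()


# Typical use against a running server (not executed here), assuming the
# collection has never been written to before:
#   db = MongoClient()['test-database']
#   collection_exists(db, 'test-collection')    # nothing inserted yet
#   db['test-collection'].insert_one({'x': 1})
#   collection_exists(db, 'test-collection')    # now created by the insert
```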
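4. Documents
  
  Data in MongoDB is represented (and stored) using JSON-style documents. In PyMongo we use dictionaries to represent documents. As an example, the following dictionary might be used to represent a blog post: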
```
import datetime
post = {'author': 'Mike',
       'text': 'My first blog post',
       'tags': ['Mongodb', 'python', 'pymongo'],
       'date': datetime.datetime.utcnow()}
```
  Note that documents can contain native Python types (like datetime.datetime instances), which are automatically converted to and from the appropriate BSON types.
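  `datetime.datetime.utcnow()` returns a naive datetime and is deprecated as of Python 3.12. A minimal sketch of building the same kind of document with a timezone-aware UTC timestamp follows. `make_post` is a hypothetical helper name, not something PyMongo provides.

```python
import datetime


def make_post(author, text, tags):
    """Build a blog-post document as a plain dict (hypothetical helper)."""
    return {
        'author': author,
        'text': text,
        'tags': list(tags),
        # Timezone-aware UTC timestamp instead of the naive utcnow()
        'date': datetime.datetime.now(datetime.timezone.utc),
    }


example_post = make_post('Mike', 'My first blog post',
                         ['mongodb', 'python', 'pymongo'])
print(sorted(example_post))
```

  PyMongo stores datetimes as UTC. By default, values read back are naive unless the client is created with `tz_aware=True`.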
5. Inserting a Document
  
  To insert a document into a collection, use the insert_one() method:
```
posts = db.posts
post_id = posts.insert_one(post).inserted_id
```
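  When a document is inserted, a special key, '_id', is added automatically if the document does not already contain one. The value of '_id' must be unique across the collection. insert_one() returns an instance of InsertOneResult. For more information on '_id', see the documentation on _id.
  
  After the first document is inserted, the posts collection actually exists on the server. We can verify this by listing all of the collections in the database: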
```
db.list_collection_names()
```
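6. Querying
  1. Getting a Single Document with find_one()
    
    The most basic query that can be performed in MongoDB is find_one(). This method returns a single document matching the query (or None if there are no matches). It is useful when you know there is only one matching document, or when you are only interested in the first match. Here we use find_one() to get the first document from the posts collection: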
```
import pprint
pprint.pprint(posts.find_one())
```
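    The result is a dictionary matching the one we inserted previously.
    
    The returned document contains an '_id', which was added automatically on insert.
    
    find_one() also supports querying on specific elements that the resulting document must match. To limit the result to a document whose author is 'Mike':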
```
pprint.pprint(posts.find_one({'author':'Mike'}))
```
    If we try a different author, such as 'Eliot', we get no result.
  2. Querying by ObjectId
    
    We can also find a post by its '_id', which in our example is an ObjectId:
```
post_id
pprint.pprint(posts.find_one({'_id': post_id}))
```
    Note that an ObjectId is not the same as its string representation:
```
post_id_as_str = str(post_id)
posts.find_one({'_id': post_id_as_str}) # No Result
```
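    A common task in web applications is to get an ObjectId from the request URL and find the matching document. In this case it is necessary to convert the string to an ObjectId before passing it to find_one: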
```
from bson.objectid import ObjectId
# The web framework gets post_id from the URL and passes it as a string
def get(post_id):
    # Convert from string to ObjectId
    document = client.db.collection.find_one({'_id': ObjectId(post_id)})
    return document
```
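    Because an id taken from a URL is untrusted input, it helps to check its shape before converting it: the string form of an ObjectId is 24 hexadecimal characters. The standard-library sketch below does that check. `looks_like_object_id` is a hypothetical name and the sample id is made up; the `bson` package also offers `ObjectId.is_valid()` for a similar purpose.

```python
import re

# The string form of an ObjectId is exactly 24 hexadecimal characters.
_OBJECT_ID_RE = re.compile(r'[0-9a-fA-F]{24}')


def looks_like_object_id(value):
    """Return True if `value` is a str that could be parsed as an ObjectId."""
    return isinstance(value, str) and _OBJECT_ID_RE.fullmatch(value) is not None


print(looks_like_object_id('5a9f1c2b3d4e5f6a7b8c9d0e'))  # True
print(looks_like_object_id('not-an-id'))                 # False
```

    A request handler can return a "not found" response when the check fails instead of letting the ObjectId constructor raise.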
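  3. Querying for More Than One Document
    
    To get more than one document from a query, use the find() method. find() returns a Cursor instance, which lets us iterate over all matching documents. For example, we can iterate over every document in the posts collection: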
```
for post in posts.find():
	pprint.pprint(post)
```
    Just as with find_one(), we can pass a document to find() to limit the returned results. Here we get only those documents whose author is 'Mike':
```
for post in posts.find({'author': 'Mike'}):
	pprint.pprint(post)
```
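  4. Range Queries
    
    MongoDB supports many different types of advanced queries. As an example, let's perform a query that limits results to posts older than a certain date and also sorts the results by author: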
```
d = datetime.datetime(2009, 11, 12, 12)
for post in posts.find({'date': {'$lt':d}}).sort('author'):
	pprint.pprint(post)
```
    Here we use the special '$lt' operator to do a range query, and also call sort() to sort the results by author.
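    Comparison operators can be combined in one sub-document to bound a field on both sides. The sketch below builds such a filter as a plain dict. `date_range_filter` is a hypothetical helper, while `$gte` and `$lt` are standard MongoDB query operators.

```python
import datetime


def date_range_filter(start, end, field='date'):
    """Build a filter matching start <= field < end (hypothetical helper)."""
    if start >= end:
        raise ValueError('start must be earlier than end')
    return {field: {'$gte': start, '$lt': end}}


query = date_range_filter(datetime.datetime(2009, 11, 10),
                          datetime.datetime(2009, 11, 12, 12))
print(query)
```

    The resulting dict can be passed to `posts.find(query).sort('author')` in the same way as the filter above.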
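7. A Note on Unicode Strings
  
  You may have noticed that the regular Python strings we stored look different when retrieved from the server (for example u'Mike' instead of 'Mike'). This applies to Python 2; in Python 3 every str is already a Unicode string, so no u prefix appears.
  
  MongoDB stores data in BSON format. BSON strings are UTF-8 encoded, so PyMongo must ensure that any string it stores contains only valid UTF-8 data. Regular strings (<type 'str'>) are validated and stored unaltered. Unicode strings (<type 'unicode'>) are encoded to UTF-8 first. The reason our example string is shown in the Python shell as u'Mike' instead of 'Mike' is that PyMongo decodes each BSON string to a Python unicode string, not a regular str.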
  
  You can read more about Python unicode strings here.
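8. Bulk Inserts
  
  To make querying a little more interesting, let's insert a few more documents. In addition to inserting a single document, we can also perform bulk insert operations by passing a list as the first argument to insert_many(). This inserts each document in the list, sending only a single command to the server: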
```
new_posts = [{"author": "Mike",
              "text": "Another post!",
              "tags": ["bulk", "insert"],
              "date": datetime.datetime(2009, 11, 12, 11, 14)},
             {"author": "Eliot",
              "title": "MongoDB is fun",
              "text": "and pretty easy too!",
              "date": datetime.datetime(2009, 11, 10, 10, 45)}]
result = posts.insert_many(new_posts) 
result.inserted_ids
```
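  There are a couple of interesting things to note about this example:
  - The result from insert_many() now returns two ObjectId instances, one for each inserted document.
  - new_posts[1] has a different "shape" than the other posts: there is no 'tags' field, and we have added a new field, 'title'. This is what we mean when we say that MongoDB is schema-free.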
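9. Counting
  
  If we just want to know how many documents match a query, we can perform a count_documents() operation instead of running the full query. To count all of the documents in a collection: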
```
posts.count_documents({})
```
  Or just those documents that match a specific query:
```
posts.count_documents({'author':'Mike'})
```
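  To compare counts across several values, the same call can be repeated once per filter. A minimal sketch, assuming `collection` is a PyMongo collection such as `posts`; `count_by_author` is a hypothetical helper, not part of PyMongo.

```python
def count_by_author(collection, authors):
    """Return {author: number of matching documents} (hypothetical helper).

    Issues one count_documents() call per author.
    """
    return {author: collection.count_documents({'author': author})
            for author in authors}


# Typical use against a running server (not executed here):
#   count_by_author(posts, ['Mike', 'Eliot'])
```

  Each author costs one round trip to the server, which is fine for a handful of values.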
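10. Indexing
  
  Adding indexes can help accelerate certain queries and can also add functionality to querying and storing documents. In this example, we demonstrate how to create a unique index on a key, which rejects documents whose value for that key already exists in the index.
  - Step 1: create the index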
```
import pymongo
result = db.profiles.create_index([('user_id', pymongo.ASCENDING)],
                                  unique=True)
sorted(list(db.profiles.index_information()))
```
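    Notice that we now have two indexes: one on '_id', which MongoDB creates automatically, and the one on 'user_id' that we just created.
  - Step 2: insert some user profiles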
```
user_profiles = [{'user_id': 211, 'name': 'Luke'},
                 {'user_id': 212, 'name': 'Ziltoid'}]
result = db.profiles.insert_many(user_profiles)
```
  - Step 3: the index prevents us from inserting a document whose user_id is already in the collection
```
>>> new_profile = {'user_id': 213, 'name': 'Drew'}
>>> duplicate_profile = {'user_id': 212, 'name': 'Tommy'}
>>> result = db.profiles.insert_one(new_profile)  # This is fine.
>>> result = db.profiles.insert_one(duplicate_profile)
Traceback (most recent call last):
DuplicateKeyError: E11000 duplicate key error index: test_database.profiles.$user_id_1 dup key: { : 212 }
```
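  In application code the duplicate is usually caught rather than allowed to propagate. A minimal sketch follows, assuming `collection` has a unique index like the one above. `insert_unique` is a hypothetical helper. `DuplicateKeyError` lives in `pymongo.errors`; the fallback class exists only so the sketch can be read and run where PyMongo is not installed.

```python
try:
    from pymongo.errors import DuplicateKeyError
except ImportError:
    # Stand-in so this sketch still runs where PyMongo is not installed.
    class DuplicateKeyError(Exception):
        pass


def insert_unique(collection, document):
    """Insert `document`; return its _id, or None if a unique index rejects it.

    Hypothetical helper, not part of PyMongo.
    """
    try:
        return collection.insert_one(document).inserted_id
    except DuplicateKeyError:
        return None


# Typical use against a running server (not executed here):
#   insert_unique(db.profiles, {'user_id': 212, 'name': 'Tommy'})
```

  Returning None keeps the calling code simple; a real application might instead log the conflict or report it to the user.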
