One of my favorite MySQL tools ever is pt-query-digest. It's the tool you use to generate a report from your slow query log (or some other supported sources), and is similar to, for example, the Query Analyzer in MySQL Enterprise Monitor. Especially in my kind of job, where I often land out of nowhere on a server at a customer site and need to quickly get an overview of what is going on, this tool is always the first one I run. I will show some examples below.
When I started engaging with MongoDB projects at Nokia, I needed some project of my own to tinker with, so I decided to add support for a MongoDB "slow query log". Back then, my first task was to figure out whether MongoDB even has such logging for slow queries. Lucky me, it does: it's known as the MongoDB Profiler.
Nokia allowed me to contribute the changes back upstream to Percona. In recent weeks I finally had some time to upgrade the code to understand the changed log format of MongoDB 2.4. (Earlier versions are no longer supported.)
And therefore, I now proudly present to you pt-query-digest with MongoDB support: https://gist.github.com/henrikingo/6065357/raw/pt-query-digest-mongo (or branch the whole bzr repo: https://code.launchpad.net/~hingo/percona-toolkit/pqd-mongodb-24)
Btw, this is the other reason I like this and the other Percona Toolkit tools: they ship with all their dependencies embedded in the same file. This means that when I'm on a customer server without root access, I can just do
wget https://gist.github.com/henrikingo/6065357/raw/pt-query-digest-mongo
...and I'm good to go! Command line tools for the win :-)
My pt-query-digest supports reading slow queries from the MongoDB log file. To get some queries logged there, you need to lower the threshold from the default of 100 ms. In fact, I like to collect as much as possible, so I set it to 1 ms. (An unfortunate limitation is that MongoDB won't let me set the limit to 0 ms.)
db.setProfilingLevel(1,1)
Then run pt-query-digest with the --type=mongolog option. Using --limit is optional and means the same as in normal pt-query-digest; see --help for other options.
pt-query-digest --type=mongolog --limit=20 < ../mongodb.log
At the top of the output you first get some summary information about all the queries in the log.
# 19.2s user time, 50ms system time, 25.96M rss, 95.67M vsz
# Current date: Tue Jul 23 14:21:53 2013
# Hostname: hingo-sputnik
# Files: STDIN
# Overall: 84.96k total, 9 unique, 0 QPS, 0x concurrency _________________
# Time range: Sat Jul 20 08:08:57 to Jul
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time          2404s     2ms    83ms    28ms    42ms    13ms    33ms
# Rows sent        665.96k       0     100    8.03   97.36   26.46       0
# Rows examine     733.62k       0     200    8.84   97.36   30.16       0
# Query size         3.33M       6     173   41.09   76.28   30.35   31.70
# Key updates         1020       0       1    0.01       0    0.11       0
# Result bytes      97.74M       0  12.91k   1.18k  12.50k   3.47k   92.72
# Rlock time           29s       0    18ms   341us     3ms   972us       0
# Scan & order       6.11k       0       1    0.07    0.99    0.26       0
# Wlock time         392ms       0     7ms     4us       0    63us       0
# Yields                 0       0       0       0       0       0       0
From the above we can already see that we are looking at a fairly well behaved system. For example, execution time for all queries is well below 100 ms (so they wouldn't even show up in the log by default), the largest result sizes are just below 13k, very little time was spent in the write lock, and so forth.
Next we get a summary breakdown per query. Now we get to the part where pt-query-digest really shines. The typical use of a slow query log would be to just look at whichever queries happen to be slowest, and then try to fix them. Of course, a query that takes tens of seconds to run may or may not be a real problem. But experience shows your energy is often better spent somewhere else completely, and pt-query-digest will tell you where to focus.
Instead, pt-query-digest de-parameterizes all of the queries, so that it can recognize queries that are actually the same, differing only in their parameters. It then sums up and sorts these query classes (by default) by total execution time. A typical application only has somewhere between 10 and 100 distinct queries, so this is the perfect tool to give you an overview of what the app is doing to your database:
# Profile
# Rank Query ID           Response time    Calls R/Call V/M   Item
# ==== ================== =============== ====== ====== ===== =============
#    1 0xA9297704DC845CFE 2354.4260 97.9%  71264 0.0330  0.00 COMMAND sbtest.$cmd
#    2 0x056DECD9CABFBFFA   18.1960  0.8%   4171 0.0044  0.00 COMMAND sbtest.$cmd
#    3 0xF6661101714FC832   17.3660  0.7%   6257 0.0028  0.00 QUERY sbtest.sbtest1
#    4 0x102C8F18966084AF    7.8110  0.3%   1383 0.0056  0.00 UPDATE sbtest.sbtest1
#    5 0x411E607C59AC3D1F    2.1300  0.1%    440 0.0048  0.01 INSERT sbtest.sbtest1
#    6 0xE097817C3B6F6EE3    1.4490  0.1%    565 0.0026  0.00 QUERY sbtest.sbtest1
#    7 0xFC580F8AC66DC809    1.1850  0.0%    442 0.0027  0.00 COMMAND sbtest.$cmd
#    8 0x094C2DDF61C1708A    1.1330  0.0%    253 0.0045  0.01 REMOVE sbtest.sbtest1
#    9 0x9696A6AE83C26BF7    0.5080  0.0%    188 0.0027  0.00 QUERY sbtest.sbtest1
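As a side note, the de-parameterization is essentially literal values being replaced with placeholders, so that structurally identical queries collapse into one class. Here is a minimal Python sketch of the idea (pt-query-digest itself is written in Perl and uses a far more thorough set of rules; the patterns below are simplified for illustration only):

```python
import re

def fingerprint(query):
    """Collapse literal values so that structurally identical queries
    produce the same fingerprint. A simplified illustration of what
    pt-query-digest does with regular expressions."""
    # Replace numeric literals with a placeholder
    q = re.sub(r'\b\d+(\.\d+)?\b', '?', query)
    # Replace quoted string literals with a placeholder
    q = re.sub(r'"[^"]*"', '?', q)
    # Normalize whitespace
    return re.sub(r'\s+', ' ', q).strip()

a = fingerprint('query sbtest.sbtest1 { $query: { _id: { $gte: 7957, $lte: 8056 } }, $orderby: { c: 1 } }')
b = fingerprint('query sbtest.sbtest1 { $query: { _id: { $gte: 100, $lte: 199 } }, $orderby: { c: 1 } }')
print(a == b)  # → True: the two queries share one fingerprint
```

With the parameters stripped out, execution times can be summed per fingerprint rather than per individual query.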
Now, if this were a MySQL-based application, I would be very worried about the above results. There is one query taking up 98% of the total execution time of everything that happens in the database! It turns out the benchmark I used to generate some "noise" for this example was set to use WriteConcern w=1, so the top line is simply wait time resulting from getLastError calls:
# Query 1: 0 QPS, 0x concurrency, ID 0xA9297704DC845CFE at byte 8790360 __
# Scores: V/M = 0.00
# Time range: Sat Jul 20 08:08:57 to Jul
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         83   71264
# Exec time     97   2354s     2ms    83ms    33ms    44ms     8ms    34ms
# Rows sent      0       0       0       0       0       0       0       0
# Rows examine   0       0       0       0       0       0       0       0
# Query size    65   2.17M      32      32      32      32       0      32
# Key updates    0       0       0       0       0       0       0       0
# Result bytes   6   5.98M      79     169   88.02   92.72    8.26   92.72
# Rlock time     0       0       0       0       0       0       0       0
# Scan & order   0       0       0       0       0       0       0       0
# Wlock time     0       0       0       0       0       0       0       0
# Yields         0       0       0       0       0       0       0       0
# String:
# Databases    sbtest.$cmd
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms  #
#  10ms  ################################################################
# 100ms
#   1s
#  10s+
command sbtest.$cmd { getlasterror: 1, fsync: true }
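As a quick aside on reading these reports, the columns tie together arithmetically: the total Exec time is essentially Count times the average per-call time. A sanity check on the getLastError class above:

```python
# Numbers taken from the query 1 report above
calls = 71264        # Count: number of getLastError calls
avg_exec = 0.033     # avg Exec time in seconds (33ms)

total = calls * avg_exec
print(round(total))  # → 2352, matching the ~2354s total Exec time
```

The small difference is just rounding in the reported 33ms average.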
If your application is doing this, it might not be what you want. As you can see, your clients end up waiting a lot compared to the time it actually takes to execute the real queries. But for the MongoDB server this is actually harmless; it's not executing or blocking anything, it's just you who is waiting all the time.
Ok, let's fix that and run pt-query-digest again. We get a different distribution that's more interesting to look at:
# Profile
# Rank Query ID           Response time  Calls R/Call V/M   Item
# ==== ================== ============= ====== ====== ===== ===============
#    1 0x056DECD9CABFBFFA 18.1960 36.6%   4171 0.0044  0.00 COMMAND sbtest.$cmd
#    2 0xF6661101714FC832 17.3660 34.9%   6257 0.0028  0.00 QUERY sbtest.sbtest1
#    3 0x102C8F18966084AF  7.8110 15.7%   1383 0.0056  0.00 UPDATE sbtest.sbtest1
#    4 0x411E607C59AC3D1F  2.1300  4.3%    440 0.0048  0.01 INSERT sbtest.sbtest1
#    5 0xE097817C3B6F6EE3  1.4490  2.9%    565 0.0026  0.00 QUERY sbtest.sbtest1
#    6 0xFC580F8AC66DC809  1.1850  2.4%    442 0.0027  0.00 COMMAND sbtest.$cmd
#    7 0x094C2DDF61C1708A  1.1330  2.3%    253 0.0045  0.01 REMOVE sbtest.sbtest1
#    8 0x9696A6AE83C26BF7  0.5080  1.0%    188 0.0027  0.00 QUERY sbtest.sbtest1
So it seems we have two queries that together consume over 70% of the total execution time spent in the database, and the top 3 queries use up about 87% of your MongoDB capacity! This means it makes sense to focus solely on optimizing these top 2-3 queries and ignore everything else.
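These concentrations can be recomputed directly from the Response time column of the profile above:

```python
# Response time column from the profile above, in seconds, sorted by rank
response_times = [18.196, 17.366, 7.811, 2.130, 1.449, 1.185, 1.133, 0.508]

total = sum(response_times)
top2 = sum(response_times[:2]) / total * 100
top3 = sum(response_times[:3]) / total * 100
print(f"top 2: {top2:.1f}%, top 3: {top3:.1f}%")  # → top 2: 71.4%, top 3: 87.1%
```

This long-tail pattern, where a handful of query classes dominate the total load, is what makes the profile so useful for prioritizing optimization work.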
This is what they look like in more detail:
# Query 1: 0 QPS, 0x concurrency, ID 0x056DECD9CABFBFFA at byte 389327 ___
# Scores: V/M = 0.00
# Time range: Sat Jul 20 08:11:09 to Jul
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         30    4171
# Exec time     36     18s     2ms    46ms     4ms    11ms     4ms     3ms
# Rows sent      0       0       0       0       0       0       0       0
# Rows examine   0       0       0       0       0       0       0       0
# Query size    58 695.74k     167     173  170.81  166.51       0  166.51
# Key updates    0       0       0       0       0       0       0       0
# Result bytes   0 309.57k      76      76      76      76       0      76
# Rlock time    21      6s   357us     7ms     1ms     3ms   930us     1ms
# Scan & order   0       0       0       0       0       0       0       0
# Wlock time     0       0       0       0       0       0       0       0
# Yields         0       0       0       0       0       0       0       0
# String:
# Databases    sbtest.$cmd
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms  ################################################################
#  10ms  #####
# 100ms
#   1s
#  10s+
command sbtest.$cmd { aggregate: "sbtest1", pipeline: [ { $match: { _id: { $gte: 8184, $lte: 8283 } } }, { $project: { k: 1, _id: 0 } }, { $group: { _id: null, average: { $sum: "$k" } } } ] }

# Query 2: 0 QPS, 0x concurrency, ID 0xF6661101714FC832 at byte 2281363 __
# Scores: V/M = 0.00
# Time range: Sat Jul 20 08:09:10 to Jul
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         45    6257
# Exec time     34     17s     2ms    61ms     3ms     5ms     2ms     2ms
# Rows sent     91 610.80k      97     100   99.96   97.36    0.06   97.36
# Rows examine  92 677.46k      97     200  110.87  192.76   29.75   97.36
# Query size    34 408.30k      63      68   66.82   65.89    0.88   65.89
# Key updates    0       0       0       0       0       0       0       0
# Result bytes  85  78.85M  12.52k  12.91k  12.91k  12.50k    0.00  12.50k
# Rlock time    66     19s     2ms    18ms     3ms     5ms     1ms     3ms
# Scan & order 100   6.11k       1       1       1       1       0       1
# Wlock time     0       0       0       0       0       0       0       0
# Yields         0       0       0       0       0       0       0       0
# String:
# Databases    sbtest.sbtest1
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms  ################################################################
#  10ms  #
# 100ms
#   1s
#  10s+
query sbtest.sbtest1 { $query: { _id: { $gte: 7957, $lte: 8056 } }, $orderby: { c: 1 } }

# Query 3: 0 QPS, 0x concurrency, ID 0x102C8F18966084AF at byte 1229838 __
# Scores: V/M = 0.00
# Time range: Sat Jul 20 08:11:13 to Jul
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         10    1383
# Exec time     15      8s     2ms    44ms     6ms    13ms     4ms     4ms
# Rows sent      0       0       0       0       0       0       0       0
# Rows examine   0    1020       0       1    0.74    0.99    0.44    0.99
# Query size     1  17.41k      10      14   12.89   12.54    0.24   12.54
# Key updates  100    1020       0       1    0.74    0.99    0.44    0.99
# Result bytes   0       0       0       0       0       0       0       0
# Rlock time     0       0       0       0       0       0       0       0
# Scan & order   0       0       0       0       0       0       0       0
# Wlock time    57   226ms    19us     7ms   163us   301us   319us   119us
# Yields         0       0       0       0       0       0       0       0
# String:
# Databases    sbtest.sbtest1
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms  ################################################################
#  10ms  ##########
# 100ms
#   1s
#  10s+
update sbtest.sbtest1 { _id: 4482 }
As you can see, a lot of information is presented for each query. And thanks to the wonder of open source and a few regular expressions, I can now benefit from this reporting tool in my work with MongoDB too.
If you want to read up on what each of the columns means, consult the Percona Toolkit documentation.
I used Tim Callaghan's Sysbench for MongoDB to generate some queries for this demo.