MONGOC_AGGREGATE(3) | libmongoc | MONGOC_AGGREGATE(3) |
This document provides a number of practical examples that display the capabilities of the aggregation framework.
The Aggregations using the Zip Codes Data Set examples uses a publicly available data set of all zipcodes and populations in the United States. These data are available at: zips.json.
Let's check if everything is installed.
Use the following command to load zips.json data set into mongod instance:
$ mongoimport --drop -d test -c zipcodes zips.json
Let's use the MongoDB shell to verify that everything was imported successfully.
$ mongo test connecting to: test > db.zipcodes.count() 29467 > db.zipcodes.findOne() { "_id" : "35004", "city" : "ACMAR", "loc" : [ -86.51557, 33.584132 ], "pop" : 6055, "state" : "AL" }
Each document in this collection has the following form:
{ "_id" : "35004", "city" : "Acmar", "state" : "AL", "pop" : 6055, "loc" : [-86.51557, 33.584132] }
In these documents:
To get all states with a population greater than 10 million, use the following aggregation pipeline:
aggregation1.c
#include <mongoc/mongoc.h> #include <stdio.h> static void print_pipeline (mongoc_collection_t *collection) { mongoc_cursor_t *cursor; bson_error_t error; const bson_t *doc; bson_t *pipeline; char *str; pipeline = BCON_NEW ("pipeline", "[", "{", "$group", "{", "_id", "$state", "total_pop", "{", "$sum", "$pop", "}", "}", "}", "{", "$match", "{", "total_pop", "{", "$gte", BCON_INT32 (10000000), "}", "}", "}", "]"); cursor = mongoc_collection_aggregate ( collection, MONGOC_QUERY_NONE, pipeline, NULL, NULL); while (mongoc_cursor_next (cursor, &doc)) { str = bson_as_canonical_extended_json (doc, NULL); printf ("%s\n", str); bson_free (str); } if (mongoc_cursor_error (cursor, &error)) { fprintf (stderr, "Cursor Failure: %s\n", error.message); } mongoc_cursor_destroy (cursor); bson_destroy (pipeline); } int main (void) { mongoc_client_t *client; mongoc_collection_t *collection; const char *uri_string = "mongodb://localhost:27017/?appname=aggregation-example"; mongoc_uri_t *uri; bson_error_t error; mongoc_init (); uri = mongoc_uri_new_with_error (uri_string, &error); if (!uri) { fprintf (stderr, "failed to parse URI: %s\n" "error message: %s\n", uri_string, error.message); return EXIT_FAILURE; } client = mongoc_client_new_from_uri (uri); if (!client) { return EXIT_FAILURE; } mongoc_client_set_error_api (client, 2); collection = mongoc_client_get_collection (client, "test", "zipcodes"); print_pipeline (collection); mongoc_uri_destroy (uri); mongoc_collection_destroy (collection); mongoc_client_destroy (client); mongoc_cleanup (); return EXIT_SUCCESS; }
You should see a result like the following:
{ "_id" : "PA", "total_pop" : 11881643 } { "_id" : "OH", "total_pop" : 10847115 } { "_id" : "NY", "total_pop" : 17990455 } { "_id" : "FL", "total_pop" : 12937284 } { "_id" : "TX", "total_pop" : 16986510 } { "_id" : "IL", "total_pop" : 11430472 } { "_id" : "CA", "total_pop" : 29760021 }
The above aggregation pipeline is build from two pipeline operators: $group and $match.
The $group pipeline operator requires _id field where we specify grouping; remaining fields specify how to generate composite value and must use one of the group aggregation functions: $addToSet, $first, $last, $max, $min, $avg, $push, $sum. The $match pipeline operator syntax is the same as the read operation query syntax.
The $group process reads all documents and for each state it creates a separate document, for example:
{ "_id" : "WA", "total_pop" : 4866692 }
The total_pop field uses the $sum aggregation function to sum the values of all pop fields in the source documents.
Documents created by $group are piped to the $match pipeline operator. It returns the documents with the value of total_pop field greater than or equal to 10 million.
To get the first three states with the greatest average population per city, use the following aggregation:
pipeline = BCON_NEW ("pipeline", "[", "{", "$group", "{", "_id", "{", "state", "$state", "city", "$city", "}", "pop", "{", "$sum", "$pop", "}", "}", "}", "{", "$group", "{", "_id", "$_id.state", "avg_city_pop", "{", "$avg", "$pop", "}", "}", "}", "{", "$sort", "{", "avg_city_pop", BCON_INT32 (-1), "}", "}", "{", "$limit", BCON_INT32 (3) "}", "]");
This aggregate pipeline produces:
{ "_id" : "DC", "avg_city_pop" : 303450.0 } { "_id" : "FL", "avg_city_pop" : 27942.29805615551 } { "_id" : "CA", "avg_city_pop" : 27735.341099720412 }
The above aggregation pipeline is build from three pipeline operators: $group, $sort and $limit.
The first $group operator creates the following documents:
{ "_id" : { "state" : "WY", "city" : "Smoot" }, "pop" : 414 }
Note, that the $group operator can't use nested documents except the _id field.
The second $group uses these documents to create the following documents:
{ "_id" : "FL", "avg_city_pop" : 27942.29805615551 }
These documents are sorted by the avg_city_pop field in descending order. Finally, the $limit pipeline operator returns the first 3 documents from the sorted set.
MongoDB, Inc
2017-present, MongoDB, Inc
March 31, 2024 | 1.26.0 |