$bucket
The $bucket stage in a MongoDB aggregation pipeline sorts incoming documents into groups, called buckets. It decides which bucket each document belongs to from a specified expression and a set of bucket boundaries. The stage is particularly useful for dividing a collection into ranges and computing aggregate values for each range.
Syntax
Here's the basic syntax of the $bucket stage:
{
  $bucket: {
    groupBy: <expression>,
    boundaries: [<lowerbound1>, <lowerbound2>, ...],
    default: <default_value>,
    output: {
      <output_field1>: { <accumulator1>: <expression1> },
      ...
    }
  }
}
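groupBy: The expression by which to group documents. To group by a field, give the field path as a string prefixed with $, for example "$amount".
boundaries: An array of at least two values that define the buckets. Each pair of adjacent values forms one bucket. The lower bound is inclusive, the upper bound is exclusive, and the lower bound becomes the bucket's _id.
default: Optional. A literal used as the _id of an extra bucket that collects every document whose groupBy value falls outside the boundaries.
output: Optional. The fields to include in each output document besides _id, each with its accumulator expression. If output is omitted, each bucket contains a count field. If it is specified, only the listed fields are returned.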
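The bucket rules can be made concrete with a small sketch of how a single groupBy value is mapped to a bucket _id. This is plain JavaScript, not MongoDB's implementation. The function name bucketIdFor and its options argument are hypothetical. The sketch also assumes numeric values compared with JavaScript's < operator rather than MongoDB's full BSON comparison order.

```javascript
// Hypothetical helper: models how $bucket picks a bucket _id for one value.
// Assumes numeric values and boundaries already sorted in ascending order.
function bucketIdFor(value, boundaries, options = {}) {
  // Each adjacent pair [boundaries[i], boundaries[i + 1]) is one bucket:
  // the lower bound is inclusive, the upper bound is exclusive.
  for (let i = 0; i < boundaries.length - 1; i++) {
    if (value >= boundaries[i] && value < boundaries[i + 1]) {
      return boundaries[i]; // the bucket's _id is its lower bound
    }
  }
  // Out-of-range values go to the default bucket when one is configured.
  if ("default" in options) {
    return options.default;
  }
  // Without a default, MongoDB fails the aggregation; this sketch throws.
  throw new Error(`value ${value} does not fall into any bucket`);
}

const boundaries = [0, 200, 400, 600];
console.log(bucketIdFor(200, boundaries)); // prints 200 (lower bound is inclusive)
console.log(bucketIdFor(600, boundaries, { default: "Other" })); // prints Other
```

The upper boundary, 600, is itself outside every bucket. This is because the last bucket excludes its upper bound.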
Example
Consider a sales collection with the following documents:
[
  { "_id": 1, "amount": 100 },
  { "_id": 2, "amount": 200 },
  { "_id": 3, "amount": 300 },
  { "_id": 4, "amount": 400 },
  { "_id": 5, "amount": 500 }
]
You can use the $bucket stage to categorize these sales into amount ranges:
db.sales.aggregate([
  {
    $bucket: {
      groupBy: "$amount",
      boundaries: [0, 200, 400, 600],
      default: "Other",
      output: {
        count: { $sum: 1 },
        average_amount: { $avg: "$amount" }
      }
    }
  }
])
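This produces the output below. Each bucket includes its lower bound and excludes its upper bound. As a result, the sale of 200 lands in the 200 bucket and the sale of 400 lands in the 400 bucket. Because every amount falls between 0 and 600, no "Other" bucket is created:
[
  { "_id": 0, "count": 1, "average_amount": 100 },
  { "_id": 200, "count": 2, "average_amount": 250 },
  { "_id": 400, "count": 2, "average_amount": 450 }
]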
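The output above can be cross-checked with a small in-memory sketch of the same stage in plain JavaScript. The function simulateBucket is hypothetical. It models only what this example uses: a numeric groupBy field, the count and average_amount accumulators, and a default bucket. Like $bucket, it emits only buckets that contain at least one document. This sketch lists buckets in boundary order and appends the default bucket last.

```javascript
// Hypothetical in-memory model of the $bucket stage used in the example.
// Assumes a numeric groupBy field, ascending numeric boundaries, and a default.
function simulateBucket(docs, field, boundaries, defaultId) {
  const groups = new Map(); // bucket _id -> groupBy values that fell into it
  for (const doc of docs) {
    const value = doc[field];
    let id = defaultId; // used when no bucket range matches
    for (let i = 0; i < boundaries.length - 1; i++) {
      // Lower bound inclusive, upper bound exclusive.
      if (value >= boundaries[i] && value < boundaries[i + 1]) {
        id = boundaries[i];
        break;
      }
    }
    if (!groups.has(id)) groups.set(id, []);
    groups.get(id).push(value);
  }
  // Only non-empty buckets are emitted, in boundary order, default last.
  const order = [...boundaries.slice(0, -1), defaultId];
  return order
    .filter((id) => groups.has(id))
    .map((id) => {
      const values = groups.get(id);
      const total = values.reduce((sum, v) => sum + v, 0);
      return { _id: id, count: values.length, average_amount: total / values.length };
    });
}

const sales = [
  { _id: 1, amount: 100 },
  { _id: 2, amount: 200 },
  { _id: 3, amount: 300 },
  { _id: 4, amount: 400 },
  { _id: 5, amount: 500 },
];
console.log(simulateBucket(sales, "amount", [0, 200, 400, 600], "Other"));
```

Suppose { _id: 6, amount: 700 } is appended to sales. This adds a fourth bucket, { _id: "Other", count: 1, average_amount: 700 }, because 700 is not below the last boundary.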
Considerations
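- The boundaries array must contain at least two values, sorted in ascending order, with no duplicates. The values must all be of the same type, although mixed numeric types (for example, integers and doubles) are allowed.
- The groupBy expression can include field paths, literals, and other expressions.
- The default field is optional. If it is omitted, every document's groupBy value must fall into one of the buckets, or the aggregation fails with an error. Supply a default whenever some values may fall outside the boundaries.
- The output field lets you apply accumulator expressions such as $sum, $avg, $min, and $max to the documents in each bucket.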
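Problems with boundaries can be caught on the client before a pipeline is sent. The sketch below shows one way to do this. The validateBoundaries function is a hypothetical helper, not part of MongoDB or any driver, and MongoDB still performs its own validation. The sketch only handles numeric boundaries. It checks that the array has at least two values and is in strictly ascending order, which also rules out duplicates.

```javascript
// Hypothetical client-side check for a numeric $bucket boundaries array.
// Returns an error message, or null when the array passes every check.
function validateBoundaries(boundaries) {
  if (!Array.isArray(boundaries) || boundaries.length < 2) {
    return "boundaries must be an array with at least two values";
  }
  for (let i = 0; i < boundaries.length; i++) {
    if (typeof boundaries[i] !== "number" || Number.isNaN(boundaries[i])) {
      return `boundaries[${i}] is not a number`;
    }
    // Strictly ascending order also rules out duplicate values.
    if (i > 0 && boundaries[i] <= boundaries[i - 1]) {
      return `boundaries[${i}] is not greater than boundaries[${i - 1}]`;
    }
  }
  return null;
}

console.log(validateBoundaries([0, 200, 400, 600])); // prints null
console.log(validateBoundaries([0, 200, 200, 600])); // prints boundaries[2] is not greater than boundaries[1]
```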