bankera ico

MongoDB aggregation framework query: how to transform documents by applying rules from another collection

On StackOverflow, a user posted this question asking for support to create, with the MongoDB Aggregation Framework, the query that returned his expected result.

The scenario consists of two collections: one that contains documents related to some “statistics” and another that contains a set of “rules” that must be applied to the first collection in order to properly modify its documents.

Documents of the statistics collection, which we call “mystats”, merely include an “id” and a “value”, and therefore their respect the following structure:

mystats = [
  {
    _id: 1,
    value: 3.0
  },
  {
    _id: 2,
    value: 0.5
  },
  {
    _id: 3,
    value: 10.0
  }
]

Documents of the rules collection, which we call “myrules”, have instead such a structure:

myrules = [
  {
    coll: 'mystats',  
    source: '_id',  
    eq: 1,          
    ratio: 2.0
  },
  {
    coll: 'mystats',
    source: '_id',
    eq: 2,
    disable: true
  },
    {
    coll: 'anothercoll',
    source: '_id',
    eq: 3,
	ratio: 10.0 
  }
]

where:

  • coll: indicates the collection to which the rule applies; In this case we are interested in those with “mystats” value
  • source: indicates the statistics collection field to use for the “join”; In this case we don’t need it
  • eq: this is the rules collection field to use for “join” with the statistics documents, matching its value with the “_id” field
  • ratio: it is the value to multiply by the “value” field of mystats collection documents that match
  • disable: if present is valued at true and indicates that the statistics document with “id” equal to that “eq” should not be displayed

The request was also to show as output only the “id” and “value” fields.

So, given the sample inputs, the result to show must be as follows:

{ 
    "_id" : 1.0, 
    "value" : 6.0
}

as:

  • The mystats document with id = 1 matches the rule with eq = 1, which supposes multiplying the “value” (3.0) for a 2.0 “ratio”, giving 6.0
  • The mystats document with id = 2 matches the rule with eq = 2, which has the “disable” field set to true and for this reason it must not be shown
  • The mystats document with id = 3 matches the rule with eq = 3, which however has a value of the “coll” property other than “mystats” and so it is discarded from the selection

Let’s see how to build step by step with the MongoDB aggregation framework the query that returns our expected result.
The steps we need to perform and so to define as stages of the aggregation pipeline are:

  • Make the “join” between the two collections using the $lookup stage
  • Split the given result into single documents using an $unwind stage
  • Filter with a $match stage only those that haven’t the “disable” property and that have the value of the “coll” property equals to “mystats”
  • Multiply the “value” by the “ratio” available into the relevant “myrules” document by using the $multiply operator
  • Limit fields to be shown to only “id” and “value” with the appropriate $project stage

Let’s start with the $lookup stage in order to relate the two collections with the equivalent of a join between tables in a classic SQL approach:

db.mystats.aggregate([
   { $lookup:
     {
       from: "myrules",
       localField: "_id",
       foreignField: "eq",
       as: "docs"
     }
   }
])

This way we can get a collection of “myrules” document, for each document in the “mystats” collection, that meet the “_ id” = “eq” condition. Let’s call the property that contains this new array “docs”. In this case, actually, we will have only one object in the “docs” array for each document.

The result of the first execution of this partial query is as follows:

{ 
    "_id" : 1.0, 
    "value" : 3.0, 
    "docs" : [
        {
            "_id" : ObjectId("595b87d483051080fed7bcd3"), 
            "coll" : "mystats", 
            "source" : "_id", 
            "eq" : 1.0, 
            "ratio" : 2.0
        }
    ]
}
{ 
    "_id" : 2.0, 
    "value" : 0.5, 
    "docs" : [
        {
            "_id" : ObjectId("595b87d483051080fed7bcd4"), 
            "coll" : "mystats", 
            "source" : "_id", 
            "eq" : 2.0, 
            "disable" : true
        }
    ]
}
{ 
    "_id" : 3.0, 
    "value" : 10.0, 
    "docs" : [
        {
            "_id" : ObjectId("595b87d483051080fed7bcd5"), 
            "coll" : "anothercoll", 
            "source" : "_id", 
            "eq" : 3.0, 
            "ratio" : 5.0
        }
    ]
}

All this is illustrated in the following screenshot:
MongoDB aggregation query Lookup stage
As a second step, we need to get a different “mystats” document for each of the “rules” documents in the “docs” array. This is achieved through an $unwind stage of the MongoDB aggregation pipeline. To add this stage, we modify our query as follows:

db.mystats.aggregate([
   {$lookup:
     {
       from: "myrules",
       localField: "_id",
       foreignField: "eq",
       as: "docs"
     }
   },
   {$unwind:"$docs"}
]
)

As we said, in fact, the “docs” array contained only one object for each “mystats” document, so the result does not change much in this case: we will always have 3 output documents where, however, for each one the “docs” property will no longer be an array but a single object of the collection” myrules”.

{ 
    "_id" : 1.0, 
    "value" : 3.0, 
    "docs" : {
        "_id" : ObjectId("595b87d483051080fed7bcd3"), 
        "coll" : "mystats", 
        "source" : "_id", 
        "eq" : 1.0, 
        "ratio" : 2.0
    }
}
{ 
    "_id" : 2.0, 
    "value" : 0.5, 
    "docs" : {
        "_id" : ObjectId("595b87d483051080fed7bcd4"), 
        "coll" : "mystats", 
        "source" : "_id", 
        "eq" : 2.0, 
        "disable" : true
    }
}
{ 
    "_id" : 3.0, 
    "value" : 10.0, 
    "docs" : {
        "_id" : ObjectId("595b87d483051080fed7bcd5"), 
        "coll" : "anothercoll", 
        "source" : "_id", 
        "eq" : 3.0, 
        "ratio" : 5.0
    }
}

MongoDB aggregation query Unwind stage
Now we must apply the filters to our query and to do so we use the $match stage of the pipeline. The selection conditions we have to apply are two:

  • The value of the “coll” property of the document “docs” embedded in the “mystats” collection document must be equal to “mystats”
  • The embedded document must not have the “disable” property (which, as for the specification, when it is present is always set to true)

Let’s start with the first condition, very simple, by changing the query to:

db.mystats.aggregate([
   {$lookup:
     {
       from: "myrules",
       localField: "_id",
       foreignField: "eq",
       as: "docs"
     }
   },
   {$unwind:"$docs"},
   {$match: 
     {
       "docs.coll": "mystats"
     }
   }
])

The generated output is the following, where we can see that the number of returned documents decreased from 3 to 2 , because it has been removed the document “id” = 3 that matches a document of the “myrules” collection that has the “coll” property valued with “anothercoll” instead of “mystats”.

{ 
    "_id" : 1.0, 
    "value" : 3.0, 
    "docs" : {
        "_id" : ObjectId("595b87d483051080fed7bcd3"), 
        "coll" : "mystats", 
        "source" : "_id", 
        "eq" : 1.0, 
        "ratio" : 2.0
    }
}
{ 
    "_id" : 2.0, 
    "value" : 0.5, 
    "docs" : {
        "_id" : ObjectId("595b87d483051080fed7bcd4"), 
        "coll" : "mystats", 
        "source" : "_id", 
        "eq" : 2.0, 
        "disable" : true
    }
}

Now add the second filter condition to the $match stage. To do this we need to introduce two operators: $and and $exists. The first, obviously, to define a logical AND condition between the two filter clauses and the second to determine whether a given property exists in an object or not.

Let’s refine our query, which becomes:

db.mystats.aggregate([
   {$lookup:
     {
       from: "myrules",
       localField: "_id",
       foreignField: "eq",
       as: "docs"
     }
   },
   {$unwind:"$docs"},
   {$match: 
     {  $and: 
       	[
       		{"docs.disable":{ "$exists": false}},
       		{"docs.coll": "mystats"}
       	]
     }
   }
])

This time in output we get only one document, the following:

{ 
    "_id" : 1.0, 
    "value" : 3.0, 
    "docs" : {
        "_id" : ObjectId("595b87d483051080fed7bcd3"), 
        "coll" : "mystats", 
        "source" : "_id", 
        "eq" : 1.0, 
        "ratio" : 2.0
    }
}

This is because the document with “id” = 2 has the “disable” field and is then deleted.
MongoDB aggregation query Match stage
The next step is to multiply the value of the “value” property for the appropriate “ratio” in the embedded document “docs”. This can be done through the $multiply operator directly in the aggregation pipeline $project stage, that is, where we select the fields to keep in the resulting documents.

We modify the query, obtaining what will be our MongoDB aggregation framework final pipeline, like this:

db.mystats.aggregate([
   {$lookup:
     {
       from: "myrules",
       localField: "_id",
       foreignField: "eq",
       as: "docs"
     }
   },
   {$unwind:"$docs"},
   {$match: 
     {  $and: 
       	[
       		{"docs.disable":{ "$exists": false}},
       		{"docs.coll": "mystats"}
       	]
     }
   },
   {$project: 
     {
        "_id":"$_id", 
        "value": { $multiply: [ "$value", "$docs.ratio" ] }
     }
    }
]
)

By running it we get, as can be seen from the following screenshot, the expected outcome as defined at the beginning of the article:
MongoDB aggregation query Project stage
On StackOverflow, the answer, in a smaller and more synthetic version, was accepted by the author of the question and commented with “Great! Thank you!”.

Leave a Reply

Your email address will not be published. Required fields are marked *