MongoDB’de Aggregation Pipeline Operation Kullanımı ve Sorgu Optimizasyonu (MongoDB Öğreniyoruz 7)

11 min readAug 8, 2022

Merhaba arkadaşlar

Aggregation stage’lerde kullanılmak üzere oluşturulmuş aggregate fonksiyonlarıdır. Şu sayfada tüm listesi görebilirsiniz. Burada çok az bir kısmını yazı serisi içinde bu kısım eksik kalmasın en azından nasıl kullanılacağı hakkında fikrimiz olsun diye işleyeceğiz. Zaten birçoğunu buraya kadar işlediğimiz konularda uygulamıştık.

UYARI : Şunu da bilmemiz gerekiyor her operation her stage’de kullanılamıyor. Bu nedenle ilgili operatörün sayfasında kontrol etmeyi unutmayın.

Makale serisinin diğer yazıları için alttaki linkleri kullanabilirsiniz.

Aggregation Pipeline Operation’lar aşağıdaki başlıklar altında kategorize edilmişlerdir. Şu sayfadan detayların erişebilirsiniz.

Arithmetic Expression Operators
Array Expression Operators
Boolean Expression Operators
Comparison Expression Operators
Conditional Expression Operators
Custom Aggregation Expression Operators
Date Expression Operators
Object Expression Operators
Set Expression Operators
String Expression Operators
Trigonometry Expression Operators
Type Expression Operators
Accumulators

Operatörlere geçmeden önce Aggregation Expression’larda kullanılan sistem değişkenlerine bakalım. Aggregation pipeline yazarken bazı sistem ve üzerinden çalıştığımız collection’lara ait bazı verilere ihtiyacımız olacak.

Değişkenlerin değerine ulaşmak içi $$ öneki (prefix) kullanılır.

NOW: Şuanki tarih ve saati döndürür
CLUSTER_TIME: Replica_set ve sharded cluster ortamlarında cluster tarih ve saatini döndürür.
ROOT: Top level dokümanı döndürür. Örneğin Aggreate pipeline yazdığımızda hangi collection üzeriden pipeline’ı yazıyorsak onu döndürür.
CURRENT: Değiştirlmediği sürece ROOT ile aynı dokümanı döndürür.
REMOVE: İlgili alan üzerinde missing value kontrolü yapar eğer alan missin value içeriyorsa dokümandan siler.
DESCEND: $redact stage’de kullanılır. Pipeline Stage makalesine bakınız.
PRUNE: $redact stage’de kullanılır. Pipeline Stage makalesine bakınız.
KEEP: $redact stage’de kullanılır. Pipeline Stage makalesine bakınız.

Operatörlerden bazıları inceleyelim.

$add: Sayısal ve tarihsel eklemeler yapar

//örneğin sales tiye bir collection'da alttaki gibi dokümanların olduğunu varsayalım{ "_id" : 1, "item" : "abc", "price" : 10, "fee" : 2, date: ISODate("2014-03-01T08:00:00Z") }
{ "_id" : 2, "item" : "jkl", "price" : 20, "fee" : 1, date: ISODate("2014-03-01T09:00:00Z") }
{ "_id" : 3, "item" : "xyz", "price" : 5,  "fee" : 0, date: ISODate("2014-03-15T09:00:00Z") }// aggreagation pipeline çalıştırıyoruzdb.sales.aggregate(
   [
     { $project: { item: 1, total: { $add: [ "$price", "$fee" ] } } }
   ]
)// sonuç{ "_id" : 1, "item" : "abc", "total" : 12 }
{ "_id" : 2, "item" : "jkl", "total" : 21 }
{ "_id" : 3, "item" : "xyz", "total" : 5 }

Tarih eklemede milisaniye kullanılır.

db.sales.aggregate(
   [
     { $project: { item: 1, billing_date: { $add: [ "$date", 3*24*60*60000 ] } } }
   ]
)// sonuç
{ "_id" : 1, "item" : "abc", "billing_date" : ISODate("2014-03-04T08:00:00Z") }
{ "_id" : 2, "item" : "jkl", "billing_date" : ISODate("2014-03-04T09:00:00Z") }
{ "_id" : 3, "item" : "xyz", "billing_date" : ISODate("2014-03-18T09:00:00Z") }

$avg: Sayısal verilerin ortalamasını hesaplar. Sum, count, ceil, vb sayısal işlem yapan operatörlerin de kullanımı aynıdır.

{ "_id" : 1, "item" : "abc", "price" : 10, "quantity" : 2, "date" : ISODate("2014-01-01T08:00:00Z") }
{ "_id" : 2, "item" : "jkl", "price" : 20, "quantity" : 1, "date" : ISODate("2014-02-03T09:00:00Z") }
{ "_id" : 3, "item" : "xyz", "price" : 5, "quantity" : 5, "date" : ISODate("2014-02-03T09:05:00Z") }
{ "_id" : 4, "item" : "abc", "price" : 10, "quantity" : 10, "date" : ISODate("2014-02-15T08:00:00Z") }
{ "_id" : 5, "item" : "xyz", "price" : 5, "quantity" : 10, "date" : ISODate("2014-02-15T09:12:00Z") }// fiyat ile adet çarpılarak ortalaması hesaplanıyordb.sales.aggregate(
   [
     {
       $group:
         {
           _id: "$item",
           avgAmount: { $avg: { $multiply: [ "$price", "$quantity" ] } },
           avgQuantity: { $avg: "$quantity" }
         }
     }
   ]
)// sonuç{ "_id" : "xyz", "avgAmount" : 37.5, "avgQuantity" : 7.5 }
{ "_id" : "jkl", "avgAmount" : 20, "avgQuantity" : 1 }
{ "_id" : "abc", "avgAmount" : 60, "avgQuantity" : 6 }

$dayOfYear — $dayOfMonth — $dayOfWeek:

//örnek bir doküman{
  "_id" : 1,
  "item" : "abc",
  "price" : 10,
  "quantity" : 2,
  "date" : ISODate("2014-01-01T08:15:39.736Z")
}//db.sales.aggregate(
   [
     {
       $project:
         {
           year: { $year: "$date" },
           month: { $month: "$date" },
           day: { $dayOfMonth: "$date" },
           hour: { $hour: "$date" },
           minutes: { $minute: "$date" },
           seconds: { $second: "$date" },
           milliseconds: { $millisecond: "$date" },
           dayOfYear: { $dayOfYear: "$date" },
           dayOfWeek: { $dayOfWeek: "$date" },
           week: { $week: "$date" }
         }
     }
   ]
)// sonuç{
  "_id" : 1,
  "year" : 2014,
  "month" : 1,
  "day" : 1,
  "hour" : 8,
  "minutes" : 15,
  "seconds" : 39,
  "milliseconds" : 736,
  "dayOfYear" : 1,
  "dayOfWeek" : 4,
  "week" : 0
}

$divide: Bölme işlemi yapar.

// Örnek doküman
{ "_id" : 1, "name" : "A", "hours" : 80, "resources" : 7 },
{ "_id" : 2, "name" : "B", "hours" : 40, "resources" : 4 }db.planning.aggregate(
   [
     { $project: { name: 1, workdays: { $divide: [ "$hours", 8 ] } } }
   ]
)//sonuç{ "_id" : 1, "name" : "A", "workdays" : 10 }
{ "_id" : 2, "name" : "B", "workdays" : 5 }

$documentNumber: Aggregate pipeline’nın çalışması sonucunda her bir dokümanın array üzerindeki pozisyonunu atar.

// örnek doküman
db.cakeSales.insertMany( [
   { _id: 0, type: "chocolate", orderDate: new Date("2020-05-18T14:10:30Z"),
     state: "CA", price: 13, quantity: 120 },
   { _id: 1, type: "chocolate", orderDate: new Date("2021-03-20T11:30:05Z"),
     state: "WA", price: 14, quantity: 140 },
   { _id: 2, type: "vanilla", orderDate: new Date("2021-01-11T06:31:15Z"),
     state: "CA", price: 12, quantity: 145 },
   { _id: 3, type: "vanilla", orderDate: new Date("2020-02-08T13:13:23Z"),
     state: "WA", price: 13, quantity: 104 },
   { _id: 4, type: "strawberry", orderDate: new Date("2019-05-18T16:09:01Z"),
     state: "CA", price: 41, quantity: 162 },
   { _id: 5, type: "strawberry", orderDate: new Date("2019-01-08T06:12:03Z"),
     state: "WA", price: 43, quantity: 134 }
] )// pipelinedb.cakeSales.aggregate( [
   {
      $setWindowFields: {
         partitionBy: "$state",
         sortBy: { quantity: -1 },
         output: {
            documentNumberForState: {
               $documentNumber: {}
            }
         }
      }
   }
] )// sonuç: documentNumberForState alanında herbir dokümana uniqe değer atanmış.{ "_id" : 4, "type" : "strawberry", "orderDate" : ISODate("2019-05-18T16:09:01Z"),
  "state" : "CA", "price" : 41, "quantity" : 162, "documentNumberForState" : 1 }
{ "_id" : 2, "type" : "vanilla", "orderDate" : ISODate("2021-01-11T06:31:15Z"),
  "state" : "CA", "price" : 12, "quantity" : 145, "documentNumberForState" : 2 }
{ "_id" : 0, "type" : "chocolate", "orderDate" : ISODate("2020-05-18T14:10:30Z"),
  "state" : "CA", "price" : 13, "quantity" : 120, "documentNumberForState" : 3 }
{ "_id" : 1, "type" : "chocolate", "orderDate" : ISODate("2021-03-20T11:30:05Z"),
  "state" : "WA", "price" : 14, "quantity" : 140, "documentNumberForState" : 1 }
{ "_id" : 5, "type" : "strawberry", "orderDate" : ISODate("2019-01-08T06:12:03Z"),
  "state" : "WA", "price" : 43, "quantity" : 134, "documentNumberForState" : 2 }
{ "_id" : 3, "type" : "vanilla", "orderDate" : ISODate("2020-02-08T13:13:23Z"),
  "state" : "WA", "price" : 13, "quantity" : 104, "documentNumberForState" : 3 }

$function: javascript kullanarak custom fonksiyon yazmamızı sağlar.

Syntax’i şu şekildedir.

{
  $function: {
    body: <code>,
    args: <array expression>,
    lang: "js"
  }
}

Örnek üzerinde görelim.

db.players.insertMany([
   { _id: 1, name: "Miss Cheevous",  scores: [ 10, 5, 10 ] },
   { _id: 2, name: "Miss Ann Thrope", scores: [ 10, 10, 10 ] },
   { _id: 3, name: "Mrs. Eppie Delta ", scores: [ 9, 8, 8 ] }
])// aggregate pipeline
db.players.aggregate( [
   { $addFields:
      {
        isFound:
            { $function:
               {
                  body: function(name) {
                     return hex_md5(name) == "15b0a220baa16331e8d80e15367677ad"
                  },
                  args: [ "$name" ],
                  lang: "js"
               }
            },
         message:
            { $function:
               {
                  body: function(name, scores) {
                     let total = Array.sum(scores);
                     return `Hello ${name}.  Your total score is ${total}.`
                  },
                  args: [ "$name", "$scores"],
                  lang: "js"
               }
            }
       }
    }
] )
// sonuç: isFound ve message adında iki alan eklenmiş oldu.{ "_id" : 1, "name" : "Miss Cheevous", "scores" : [ 10, 5, 10 ], "isFound" : false, "message" : "Hello Miss Cheevous.  Your total score is 25." }{ "_id" : 2, "name" : "Miss Ann Thrope", "scores" : [ 10, 10, 10 ], "isFound" : true, "message" : "Hello Miss Ann Thrope.  Your total score is 30." }{ "_id" : 3, "name" : "Mrs. Eppie Delta ", "scores" : [ 9, 8, 8 ], "isFound" : false, "message" : "Hello Mrs. Eppie Delta .  Your total score is 25." }

$accumulator: Custom accumulator operator yazmak için kullanılır.
$substrBytes — $strLenBytes: String bir ifadenin istenilen kadarlık kısmını döndürür. strLenBytes ise string ifadenin UTF-8 olarak kaç byte olduğunu döndürür.

// Örnek veri{ "_id" : 1, "item" : "ABC1", quarter: "13Q1", "description" : "product 1" }
{ "_id" : 2, "item" : "ABC2", quarter: "13Q4", "description" : "product 2" }
{ "_id" : 3, "item" : "XYZ1", quarter: "14Q2", "description" : null }
// sorgudb.inventory.aggregate(
  [
    {
      $project: {
        item: 1,
        yearSubstring: { $substrBytes: [ "$quarter", 0, 2 ] }, // quarter alanın ilk iki karakterini alıyor
        quarterSubtring: {
          $substrBytes: [
            "$quarter", 2, { $subtract: [ { $strLenBytes: "$quarter" }, 2 ] } // subtract toplam byte'ın ilk iki byte'ını geri dönüdüyor.
          ]
        }
      }
    }
  ]
)//sonuç{ "_id" : 1, "item" : "ABC1", "yearSubstring" : "13", "quarterSubtring" : "Q1" }
{ "_id" : 2, "item" : "ABC2", "yearSubstring" : "13", "quarterSubtring" : "Q4" }
{ "_id" : 3, "item" : "XYZ1", "yearSubstring" : "14", "quarterSubtring" : "Q2" }

$convert: Belirtilen tipe dönüşüm yapar. Hangi tipler arasında dönüşüm yapılabildiğini ve sonuçlarının ne olacağı ile ilgili şu sayfayı ziyaret ediniz.

Syntax şu şekildedir.

{
   $convert:
      {
         input: <expression>,
         to: <type expression>,
         onError: <expression>,  // Optional.
         onNull: <expression>    // Optional.
      }
}

Örnek uygulama

db.orders.insertMany( [
   { _id: 1, item: "apple", qty: 5, price: 10 },
   { _id: 2, item: "pie", qty: 10, price: Decimal128("20.0") },
   { _id: 3, item: "ice cream", qty: 2, price: "4.99" },
   { _id: 4, item: "almonds" },
   { _id: 5, item: "bananas", qty: 5000000000, price: Decimal128("1.25") }
] )
// stage'ler ayrıca yazılıp pipile içinden çağrılmış// birinci stage
priceQtyConversionStage = {
   $addFields: {
      convertedPrice: { $convert:
         {
            input: "$price",
            to: "decimal",
            onError: "Error",
            onNull: Decimal128("0")
         } },
      convertedQty: { $convert:
         {
            input: "$qty",
            to: "int",
            onError:{ $concat:
               [
                  "Could not convert ",
                  { $toString:"$qty" },
                  " to type integer."
               ]
            },
         onNull: Int32("0")
      } },
   }
};
// ikinci stage
totalPriceCalculationStage = {
   $project: { totalPrice: {
     $switch: {
        branches: [
          { case:
             { $eq: [ { $type: "$convertedPrice" }, "string" ] },
             then: "NaN"
          },
          { case:
             { $eq: [ { $type: "$convertedQty" }, "string" ] },
             then: "NaN"
          },
        ],
        default: { $multiply: [ "$convertedPrice", "$convertedQty" ] }
     }
} } };
// aggregete pipeline
db.orders.aggregate( [
   priceQtyConversionStage,
   totalPriceCalculationStage
])
// Sonuç{ _id: 1, totalPrice: Decimal128("50") },
{ _id: 2, totalPrice: Decimal128("200.0") },
{ _id: 3, totalPrice: Decimal128("9.98") },
{ _id: 4, totalPrice: Decimal128("0") },
{ _id: 5, totalPrice: 'NaN' }

Regular Expression ile İlgili Aggregation Operator’leri

MongoDB Per Compatible Regular Expression (PCRE) kullanır. Kaynaklar bölümüne eklediğim linklerden regular expression hakkında daha detaylı bilgi alabilirsiniz.

Şöyle bir verimiz olsun.

db.createCollection( "myColl", { collation: { locale: "fr", strength: 1 } } )db.myColl.insertMany([
   { _id: 1, category: "café" },
   { _id: 2, category: "cafe" },
   { _id: 3, category: "cafE" }
])

$regexFind: Arama sonucundaki bulunan ilk dokümanı döndürür.

db.myColl.aggregate(
   [ { $addFields: { resultObject: { $regexFind: { input: "$category", regex: /cafe/ }  } } } ],
   { collation: { locale: "fr", strength: 1 } }           // Ignored in the $regexFind
)// sonuç{ "_id" : 1, "category" : "café", "resultObject" : null }
{ "_id" : 2, "category" : "cafe", "resultObject" : { "match" : "cafe", "idx" : 0, "captures" : [ ] } }
{ "_id" : 3, "category" : "cafE", "resultObject" : null }

$regexFindAll: Arama sonucundaki bütün dokümanları array olarak döndürür.

db.myColl.aggregate(
   [ { $addFields: { results: { $regexFindAll: { input: "$category", regex: /cafe/ }  } } } ],
   { collation: { locale: "fr", strength: 1 } }       // Ignored in the $regexFindAll
)// sonuç{ "_id" : 1, "category" : "café", "results" : [ ] }
{ "_id" : 2, "category" : "cafe", "results" : [ { "match" : "cafe", "idx" : 0, "captures" : [ ] } ] }
{ "_id" : 3, "category" : "cafE", "results" : [ ] }

$regexMatch: Arama sonucunu true/false olarak döndürür.

db.myColl.aggregate(
   [ { $addFields: { results: { $regexMatch: { input: "$category", regex: /cafe/ }  } } } ],
   { collation: { locale: "fr", strength: 1 } }       // Ignored in the $regexMatch
)// sonuç{ "_id" : 1, "category" : "café", "results" : false }
{ "_id" : 2, "category" : "cafe", "results" : true }
{ "_id" : 3, "category" : "cafE", "results" : false }

Aggregation Pipeline Optimization

Projection Optimization: Sorugu sonrasında gerekli alanların döndürülmesini ifade eder. Gerekliden kastımız bazı alanların hiç getirilmemesi gerekirse bazı alanların eklenmesini ifade eder. Konu başlığını dokümanlarda şu sayfada bulabilirsiniz.

($project or $unset or $addFields or $set) operatörlerinin kullanıldığı stage’in ardından $match stage ile devam eden sorguların optimizasyonu.

Örneğin alttaki sorguyu inceleyelim

// gerekli gereksiz bütün dokümanalra maxTime ve minTime diye iki alan akleniyor
{ $addFields: {
    maxTime: { $max: "$times" },
    minTime: { $min: "$times" }
} },// daha sonra gerekli gereksiz bütün dokümanlar için avgTime hesaplaması yapılıyor 
{ $project: {
    _id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
    avgTime: { $avg: ["$maxTime", "$minTime"] }
} },// fazladna yapılan hesaplamalardan sonra filtre stage i çalıştırılıyor.
{ $match: {
    name: "Joe Schmoe",
    maxTime: { $lt: 20 },
    minTime: { $gt: 5 },
    avgTime: { $gt: 7 }
} }

Bunun yerine sorgunun şöyle yazıldığını düşünelim

// sadece eşleşen dokümanlar alınıyor.
{ $match: { name: "Joe Schmoe" } },// sadece eşleşen dokümanlar için minTime ve maxTime alanmları ekleniyor
{ $addFields: {
    maxTime: { $max: "$times" },
    minTime: { $min: "$times" }
} },// sadece eşleşen dkümanlarda hesaplanan min ve max değerleri filtreleniyor
{ $match: { maxTime: { $lt: 20 }, minTime: { $gt: 5 } } },// sadece uygun kişi, min ve max değerlerine göre avg hesaplanıyor 
{ $project: {
    _id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
    avgTime: { $avg: ["$maxTime", "$minTime"] }
} },// ve gerekli eşleşmelere göre hesalanmış avg değerinden ihtiyaç olanlar filtreleniyor.
{ $match: { avgTime: { $gt: 7 } } }

Yani böylece her bir stage’de üzerinde çalışacağımız veri kümesini aazaltmış oluyoruz.

$sort ve ardından $match stage’i gelen sorguların optimizasyonu

{ $sort: { age : -1 } },
{ $match: { status: 'A' } }

Sorguda gereksiz yere önce bütün dokümanlar sıralanmış daha sonra filtreleme yapılmış. Aşağıdaki gibi yazılsaydı öncelikle gereksiz dokümanlar filtrelenmiş olacaktı.

{ $match: { status: 'A' } },
{ $sort: { age : -1 } }

$redact ardından $match stage gelen sorguların optimizasyonu

$redact operatörü yazacağımız condition’a göre ilgili alanı filtreler. $$DESCEND alanı döndürür, $$PRUNE alanı iptal eder.

{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }

Üstteki sorguda gereksiz dokümanlar üzerinde redact operatörü çalıştırılmış. Altaki düzenleme ile daha en başka gereksiz dokümanlar filtrelenmiş oldu.

{ $match: { year: 2014 } },
{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }

Son olarak MongoDB resmi sayfasında çok detaylı anlatım olan ve best practice’leride içeren “Practical MongoDB Aggregation (Ebook)” dokümanını kesinlikle okuyunuz.

Makale serisinin diğer yazıları için alttaki linkleri kullanabilirsiniz.

Kaynaklar

Originally published at https://github.com.

MongoDB’de Aggregation Pipeline Operation Kullanımı ve Sorgu Optimizasyonu (MongoDB Öğreniyoruz 7)

Regular Expression ile İlgili Aggregation Operator’leri

Aggregation Pipeline Optimization

Kaynaklar

Written by Murat Çabuk