OHLC data grouping with mongodb

In this post I will demonstrate how to do data grouping with OHLC data using mongodb’s powerful aggregation framework.

The problem:

Group OHLC data every N weeks where N>1, I should point out that doing weekly data grouping (when N==1) is a whole lot easier.

Sample Data:

mongodb_aggregation_data.jsview raw
1
2
3
4
5
6
7
8
9
10
11
/* Data based on http://finance.yahoo.com/q/hp?s=QQQ+Historical+Prices */
> db.ohlc.find()
{ "_id" : ObjectId("54d65597daf0910dfa816995"), "S" : "QQQ", "D" : ISODate("2015-02-06T00:00:00Z"), "O" : 103.92, "H" : 104.17, "L" : 102.76, "C" : 103.13, "V" : 32833800, "A" : 103.13 }
{ "_id" : ObjectId("54d65597daf0910dfa816996"), "S" : "QQQ", "D" : ISODate("2015-02-05T00:00:00Z"), "O" : 103.13, "H" : 103.83, "L" : 102.87, "C" : 103.76, "V" : 23605500, "A" : 103.76 }
{ "_id" : ObjectId("54d65597daf0910dfa816997"), "S" : "QQQ", "D" : ISODate("2015-02-04T00:00:00Z"), "O" : 102.54, "H" : 103.55, "L" : 102.31, "C" : 102.87, "V" : 34073200, "A" : 102.87 }
{ "_id" : ObjectId("54d65597daf0910dfa816998"), "S" : "QQQ", "D" : ISODate("2015-02-03T00:00:00Z"), "O" : 102.33, "H" : 103.03, "L" : 101.68, "C" : 102.96, "V" : 30750400, "A" : 102.96 }
{ "_id" : ObjectId("54d65597daf0910dfa816999"), "S" : "QQQ", "D" : ISODate("2015-02-02T00:00:00Z"), "O" : 101.33, "H" : 102.07, "L" : 99.75, "C" : 101.98, "V" : 43624700, "A" : 101.98 }
{ "_id" : ObjectId("54d65597daf0910dfa81699a"), "S" : "QQQ", "D" : ISODate("2015-01-30T00:00:00Z"), "O" : 101.8, "H" : 102.58, "L" : 100.96, "C" : 101.1, "V" : 42927600, "A" : 101.1 }
{ "_id" : ObjectId("54d65597daf0910dfa81699b"), "S" : "QQQ", "D" : ISODate("2015-01-29T00:00:00Z"), "O" : 100.83, "H" : 102.08, "L" : 99.96, "C" : 101.89, "V" : 46539700, "A" : 101.89 }
{ "_id" : ObjectId("54d65597daf0910dfa81699c"), "S" : "QQQ", "D" : ISODate("2015-01-28T00:00:00Z"), "O" : 103.09, "H" : 103.18, "L" : 100.9, "C" : 100.92, "V" : 43591700, "A" : 100.92 }
/* ... */

The solution:

mongodb_aggregation.jsview raw
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
db.ohlc.aggregate({
$match: {
S: 'QQQ'
}
}, {
$project: {
D: '$D', O: '$O', H: '$H', L: '$H', C: '$C', V: '$V', A: '$A',
weeknbr: {
$divide: [{
$subtract: [
'$D',
new ISODate('1970-01-04')
]
},
86400 * 7000
]
}
}
}, {
$project: {
D: '$D', O: '$O', H: '$H', L: '$H', C: '$C', V: '$V', A: '$A',
rnd_weeknbr: {
$subtract: [
'$weeknbr', {
$mod: [
'$weeknbr',
1
]
}
]
}
}
}, {
$project: {
D: '$D', O: '$O', H: '$H', L: '$H', C: '$C', V: '$V', A: '$A',
grp_weeknbr: {
$subtract: [
'$rnd_weeknbr', {
$mod: [
'$rnd_weeknbr',
4
]
}
]
}
}
}, {
$sort: {
D: 1
}
},

{
$group: {
_id: {
grp_weeknbr: '$grp_weeknbr'
},
D: {
$last: '$D'
},
O: {
$first: '$O'
},
H: {
$max: '$H'
},
L: {
$min: '$L'
},
C: {
$last: '$C'
},
A: {
$last: '$A'
},
V: {
$sum: '$V'
}
}
}, {
$sort: {
D: 1
}
}
)

The explanation:

The idea is to

  1. get the number of weeks (floating point) since a reference date (line #8-17, all OHLC data in the db are later than that date), the reason 1970-01-04 instead of 1970-01-01 (which is Wednesday)
    is chosen is because it lands on Sunday.
  2. get the floored number of weeks for step 2, line #22-31, since mongodb doesn’t have a round or floor method, this is the only way to get number of week since reference date in integer.
  3. sort by D, this step is crucial as the next step will need to use method such as $last and $first to get the close and open price for the grouped period.
  4. group by 4 weeks’ OHLC, line #37-40, number 4 on line #41 can be substituted based on data group unit.
  5. sort by _id, which consists of the calculated grouping number grp_weeknbr from previous step.

The reason why the number of weeks needs to be calculated based off a reference date is because of the partial week problem. See this SO question that I asked before I came up with the solution documented in this post.

Result:

mongodb_aggregation_result.jsview raw
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
/* 0 */
{
"result" : [
{
"_id" : {
"grp_weeknbr" : 1520
},
"D" : ISODate("1999-03-19T00:00:00.000Z"),
"O" : 102.25,
"H" : 106.5,
"L" : 102.31,
"C" : 102.44,
"A" : 46.98,
"V" : 50912800
},
{
"_id" : {
"grp_weeknbr" : 1524
},
"D" : ISODate("1999-04-16T00:00:00.000Z"),
"O" : 102.88,
"H" : 112.5,
"L" : 101,
"C" : 103.94,
"A" : 47.67,
"V" : 182968000
},
/* ... */
{
"_id" : {
"grp_weeknbr" : 2344
},
"D" : ISODate("2015-01-02T00:00:00.000Z"),
"O" : 105.11,
"H" : 105.57,
"L" : 102.09,
"C" : 102.94,
"A" : 102.94,
"V" : 695244900
},
{
"_id" : {
"grp_weeknbr" : 2348
},
"D" : ISODate("2015-01-30T00:00:00.000Z"),
"O" : 102.5,
"H" : 104.58,
"L" : 100.95,
"C" : 101.1,
"A" : 101.1,
"V" : 794977000
},
{
"_id" : {
"grp_weeknbr" : 2352
},
"D" : ISODate("2015-02-06T00:00:00.000Z"),
"O" : 101.33,
"H" : 104.17,
"L" : 102.07,
"C" : 103.13,
"A" : 103.13,
"V" : 164887600
}
],
"ok" : 1
}

Note: Last group (2015-02-06) contains only one-week (first week of the group) worth of data.