| 1 | -[[aggregations]] |
| 2 | -= Aggregations |
| 3 | - |
| 4 | -[partintro] |
| 5 | --- |
| 6 | -Until this point, this book has been dedicated to search.((("searching", "search versus aggregations")))((("aggregations"))) With search, |
| 7 | -we have a query and we want to find a subset of documents that |
| 8 | -match the query. We are looking for the proverbial needle(s) in the |
| 9 | -haystack. |
| 10 | - |
| 11 | -With aggregations, we zoom out to get an overview of our data. Instead of |
| 12 | -looking for individual documents, we want to analyze and summarize our complete |
| 13 | -set of data: |
| 14 | - |
| 15 | -// Popular manufacturers? Unusual clumps of needles in the haystack? |
| 16 | -- How many needles are in the haystack? |
| 17 | -- What is the average length of the needles? |
| 18 | -- What is the median length of the needles, broken down by manufacturer? |
| 19 | -- How many needles were added to the haystack each month? |
| 20 | - |
| 21 | -Aggregations can answer more subtle questions too: |
| 22 | - |
| 23 | -- What are your most popular needle manufacturers? |
| 24 | -- Are there any unusual or anomalous clumps of needles? |
| 25 | - |
| 26 | -Aggregations allow us to ask sophisticated questions of our data. And yet, while |
| 27 | -the functionality is completely different from search, it leverages the |
| 28 | -same data-structures. This means aggregations execute quickly and are |
| 29 | -_near real-time_, just like search. |
| 30 | - |
| 31 | -This is extremely powerful for reporting and dashboards. Instead of performing |
| 32 | -_rollups_ of your data (_that crusty Hadoop job that takes a week to run_), |
| 33 | -you can visualize your data in real time, allowing you to respond immediately. |
| 34 | - |
| 35 | -// Perhaps mention "not precalculated, out of date, and irrelevant"? |
| 36 | -// Perhaps "aggs are calculated in the context of the user's search, so you're not showing them that you have 10 4 star hotels on your site, but that you have 10 4 star hotels that *match their criteria*". |
| 37 | - |
| 38 | -Finally, aggregations operate alongside search requests.((("aggregations", "operating alongside search requests"))) This means you can |
| 39 | -both search/filter documents _and_ perform analytics at the same time, on the |
| 40 | -same data, in a single request. And because aggregations are calculated in the |
| 41 | -context of a user's search, you're not just displaying a count of four-star hotels--you're displaying a count of four-star hotels that _match their search criteria_. |
| 42 | - |
| 43 | -Aggregations are so powerful that many companies have built large Elasticsearch |
| 44 | -clusters solely for analytics. |
| 45 | --- |
| 1 | +ifndef::es_build[= placeholder3] |
| 2 | + |
| 3 | +[[aggregations]] |
| 4 | += Aggregations |
| 5 | + |
| 6 | +[partintro] |
| 7 | +-- |
| 8 | +Until this point, this book has been dedicated to search.((("searching", "search versus aggregations")))((("aggregations"))) With search, |
| 9 | +we have a query and we want to find a subset of documents that |
| 10 | +match the query. We are looking for the proverbial needle(s) in the |
| 11 | +haystack. |
| 12 | + |
| 13 | +With aggregations, we zoom out to get an overview of our data. Instead of |
| 14 | +looking for individual documents, we want to analyze and summarize our complete |
| 15 | +set of data: |
| 16 | + |
| 17 | +// Popular manufacturers? Unusual clumps of needles in the haystack? |
| 18 | +- How many needles are in the haystack? |
| 19 | +- What is the average length of the needles? |
| 20 | +- What is the median length of the needles, broken down by manufacturer? |
| 21 | +- How many needles were added to the haystack each month? |
| 22 | + |
| 23 | +Aggregations can answer more subtle questions too: |
| 24 | + |
| 25 | +- What are your most popular needle manufacturers? |
| 26 | +- Are there any unusual or anomalous clumps of needles? |
| 27 | + |
| 28 | +Aggregations allow us to ask sophisticated questions of our data. And yet, while |
| 29 | +the functionality is completely different from search, it leverages the |
| 30 | +same data-structures. This means aggregations execute quickly and are |
| 31 | +_near real-time_, just like search. |
| 32 | + |
| 33 | +This is extremely powerful for reporting and dashboards. Instead of performing |
| 34 | +_rollups_ of your data (_that crusty Hadoop job that takes a week to run_), |
| 35 | +you can visualize your data in real time, allowing you to respond immediately. |
| 36 | + |
| 37 | +// Perhaps mention "not precalculated, out of date, and irrelevant"? |
| 38 | +// Perhaps "aggs are calculated in the context of the user's search, so you're not showing them that you have 10 4 star hotels on your site, but that you have 10 4 star hotels that *match their criteria*". |
| 39 | + |
| 40 | +Finally, aggregations operate alongside search requests.((("aggregations", "operating alongside search requests"))) This means you can |
| 41 | +both search/filter documents _and_ perform analytics at the same time, on the |
| 42 | +same data, in a single request. And because aggregations are calculated in the |
| 43 | +context of a user's search, you're not just displaying a count of four-star hotels--you're displaying a count of four-star hotels that _match their search criteria_. |
| 44 | + |
| 45 | +Aggregations are so powerful that many companies have built large Elasticsearch |
| 46 | +clusters solely for analytics. |
| 47 | +-- |
| 48 | + |
| 49 | +include::301_Aggregation_Overview.asciidoc[] |
| 50 | + |
| 51 | +include::302_Example_Walkthrough.asciidoc[] |
| 52 | + |
| 53 | +include::303_Making_Graphs.asciidoc[] |
| 54 | + |
| 55 | +include::304_Approximate_Aggregations.asciidoc[] |
| 56 | + |
| 57 | +include::305_Significant_Terms.asciidoc[] |
| 58 | + |
| 59 | +include::306_Practical_Considerations.asciidoc[] |
| 60 | + |