Best practices for data search
We've collected a number of tips to help you optimize performance and get the most out of what Devo data search offers.
When dealing with large amounts of data, you need to consider the browser's memory restrictions and the processing requirements of different query operations. We have several recommendations to make sure you get the best possible performance:
- Switch to server mode
- Manage browser memory
- Consider cardinality when grouping events
- Reduce the number of columns in a table
Switch to server mode
When you access the search window after running a search, your browser processes some of your query operations alongside the server. To avoid exhausting the browser, switch to server mode for queries that require computationally heavy calculations. For the same reason, switch to server mode before grouping data in tables whose columns contain a very large number of distinct values (that is, high cardinality and/or variability). In all other cases, keep running queries in the default search window mode.
Check the Server mode box in the search window toolbar to switch to server mode.
You can also set server mode as the default by going to Preferences → User preferences and checking the Data Search server mode box.
Note that after you group your data, certain operations (such as geolocation operations and lookups) can be applied only in server mode, because the server performs them. For this reason, you cannot disable server mode after applying one of these operations.
Manage browser memory
Restart your browser to free up memory.
Minimize the number of open tabs to maximize available memory.
Limit concurrent queries
As a general rule, you should minimize the number of concurrent queries in order to maximize available memory.
If you need to keep several large queries open, create a second session by opening another browser window in incognito mode to better handle the memory requirements.
Use a brief time range when building a new query
Building a query can involve quite a bit of trial and error with the operations you apply to the data. Before you start building and refining your query, select a shorter time range and make sure that real-time event flow is off. This improves performance because filters, new columns, and other operations are applied to fewer events. Once you are satisfied with your query, set the time range and real-time event flow as required.
Follow the order of query operations
All filters should be applied early in the query, and certainly before grouping and aggregation. This reduces the memory and computation required for the later operations. The following describes the recommended order of operations:
- Create columns (data enrichment)
- Filters of new columns
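To illustrate why filtering before grouping reduces work, here is a minimal Python sketch. It is only a model of the idea, not Devo's implementation; the event fields (service, level) and both function names are hypothetical.

```python
from collections import Counter

# Hypothetical events; the field names are illustrative only.
events = [
    {"service": "test", "level": "INFO"},
    {"service": "test", "level": "ERROR"},
    {"service": "web", "level": "INFO"},
    {"service": "web", "level": "INFO"},
    {"service": "db", "level": "ERROR"},
]

def group_then_filter(rows):
    """Group every row by (service, level), then discard unwanted groups."""
    counts = Counter((r["service"], r["level"]) for r in rows)
    kept = {key: n for key, n in counts.items() if key[1] == "ERROR"}
    return kept, len(rows)  # second value: rows the grouping step had to read

def filter_then_group(rows):
    """Discard unwanted rows first, so the grouping step reads fewer rows."""
    kept_rows = [r for r in rows if r["level"] == "ERROR"]
    counts = Counter((r["service"], r["level"]) for r in kept_rows)
    return dict(counts), len(kept_rows)

late_result, late_rows_grouped = group_then_filter(events)
early_result, early_rows_grouped = filter_then_group(events)
```

Both functions return the same groups, but the second one hands fewer rows to the grouping step.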
Consider cardinality when grouping events
Avoid grouping by fields with a very large number of different values (high cardinality). This can be resource-intensive and produce results that are harder to read and analyze. Here are some tips for grouping by fields with high cardinality:
- Consider applying a filter to the field before grouping to limit the cardinality.
- If the field contains numeric values, enrich the data with a new column that identifies a numeric range to which the event belongs, then group by the numeric range instead of the individual values.
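The numeric-range tip can be sketched in Python as follows. This is an illustrative model only, not a Devo operation; the range_label helper, the range width of 100, and the sample values are assumptions.

```python
from collections import Counter

def range_label(value, width=100):
    """Return the label of the fixed-width range that contains value."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"

# Hypothetical response sizes in bytes; every value is distinct.
sizes = [12, 87, 140, 155, 199, 230, 981, 990]

by_value = Counter(sizes)                            # one group per distinct value
by_range = Counter(range_label(s) for s in sizes)    # one group per range
```

Grouping by the derived range yields far fewer groups than grouping by the raw values, which is the effect the tip describes.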
Reduce the number of columns in a table
If you use a Finder to open a data table, you can pre-select only the columns that contain data of interest to you. This reduces the amount of data that your browser needs to load into memory.
If you already have a query open in the search window, you can use the Column manager tool to pick the columns you want to show or hide.
There are some great tools available in the search window that you might overlook. Here we list a few that can really come in handy.
Toggle Execution Info
Click the gear icon on the toolbar and select Tools → Execution info. This menu gives you useful information about the current query and can tell you:
- How many rows the query has in total, and how many have been loaded so far.
- How much memory is currently being used, and the maximum amount of memory available to you.
View selected events
It can be hard to examine an event's data visually, especially when the number of columns requires a horizontal scrollbar. This tool helps in that situation.
Click to select the event or events that you want to examine more closely, then select the Selected events tool on the toolbar.
This window lists each event's fields and values on its own page so that you can thoroughly examine the events one at a time. The Rich views toggle is activated by default and when activated, correctly reads and displays fields with values in JSON format. Use the Type column to see the type of data of each Event. Image data is shown directly as an image. You can also copy an event's data to paste it elsewhere, or download the event in CSV, JSON, or TXT formats.
Time Interval History
When building and running queries to generate a periodic report, you may be working in multiple data tables concurrently and applying the same time range to each table.
This feature offers a simple way to apply a recently used time range to other queries without having to use the time range selector each time.
Once a query’s events have been grouped, there are some limitations you should keep in mind if you want to apply additional operations.
Applying filters or creating new columns in grouped queries
You will only be able to use aggregations or grouping keys as arguments. Grouping keys are the columns you used to group the events. Other columns in the original table will not be available to use as arguments.
For this reason, and others, we recommend following the order of query operations.
Grouping keys are not available to use as arguments for aggregations. It's also not possible to calculate an aggregation using another aggregation as an argument.
Sparse versus dense
Suppose you have to search for a specific tree in both a sparse forest and a dense forest. The sparse forest has a small number of trees, so it is easy to spot a special tree. The dense forest, however, has too many trees that could match the one you're looking for; you would have to inspect every tree manually.
This concept directly relates to the frequency of values and the number of events in your searches.
When running a search in Devo, it's best to use sparse terms, that is, a word, a number or a value that is found relatively infrequently.
Ordering of clauses
The order of clauses is important to achieve optimal performance in your queries. See the following example, where 99% of the logs in the table include the term "INFO". This query:
from application.log.log where toktains(raw, "INFO"), service="test"
should be rewritten as follows to achieve better performance:
from application.log.log where toktains(raw, "test"), service="test", toktains(raw, "INFO")
When adding several where clauses to your query, add the sparsest terms first and the least sparse ones last.
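One way to see why clause order matters is to count how many clause checks are needed when each row is rejected at its first failing clause. The Python sketch below models only that short-circuit behavior; it is an assumption for illustration and does not describe how Devo evaluates queries. The count_checks helper and the sample log lines are hypothetical.

```python
def count_checks(rows, clauses):
    """Apply clauses in order to each row, stopping at the first failure.

    Returns (matching_rows, total_clause_evaluations).
    """
    matches, checks = [], 0
    for row in rows:
        for clause in clauses:
            checks += 1
            if not clause(row):
                break  # row rejected; later clauses are never evaluated
        else:
            matches.append(row)  # every clause passed
    return matches, checks

# Hypothetical log lines: 99 of 100 contain "INFO", only 1 contains "test".
rows = (
    ["INFO service=web ok"] * 98
    + ["INFO service=test ok", "ERROR service=web fail"]
)

def has_info(line):
    return "INFO" in line  # dense term: matches almost every line

def has_test(line):
    return "test" in line  # sparse term: matches very few lines

dense_first = count_checks(rows, [has_info, has_test])
sparse_first = count_checks(rows, [has_test, has_info])
```

Both orders return the same matching line, but in this model putting the sparse term first needs roughly half as many clause evaluations.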
Be careful when using the Not operation
Place Not operators in the last clauses if your query includes values you can index on. Otherwise, if the query has to go through every line anyway, place them at the beginning of the query.
Use logical operators in the proper order
not (A or B) → (not A) and (not B)
not (A and B) → (not A) or (not B)
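These two rewrites are De Morgan's laws. A short Python check over every combination of truth values confirms that each pair of expressions is equivalent; the function name is illustrative, and the sketch uses Python booleans rather than Devo query syntax.

```python
from itertools import product

def de_morgan_holds():
    """Check both rewrites for every combination of truth values."""
    for a, b in product([False, True], repeat=2):
        if (not (a or b)) != ((not a) and (not b)):
            return False
        if (not (a and b)) != ((not a) or (not b)):
            return False
    return True
```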
If you have two or more logical clauses, put parentheses around what you are after:
A and B or C and D != (A and B) or (C and D)
Always check your logic when you apply logical expressions. If there is more than one logical expression, put each expression in its own parentheses.
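To show how much the grouping can change a result, the following Python sketch compares two ways of parenthesizing the same four terms. It uses Python booleans purely as an illustration and makes no claim about how Devo parses unparenthesized expressions; the function names are hypothetical.

```python
from itertools import product

def grouped_pairs(a, b, c, d):
    """Reading 1: (A and B) or (C and D)."""
    return (a and b) or (c and d)

def grouped_middle(a, b, c, d):
    """Reading 2: A and (B or C) and D."""
    return a and (b or c) and d

# Collect every input for which the two readings disagree.
differing_inputs = [
    combo
    for combo in product([False, True], repeat=4)
    if grouped_pairs(*combo) != grouped_middle(*combo)
]
```

Because the two readings disagree on several inputs, explicit parentheses remove any doubt about which one a query expresses.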