Robust Random Cut Forest chart
Overview
During the training phase, a forest of trees that represent normal behavior is created. When a new point arrives it is inserted into the trees creating a distortion inside them. The amount of reorganization needed in order to stabilize the tree is translated into an anomaly score called codisplacement. Anomalies are produced by defining a threshold on the anomaly score.
When should I use this chart?
This chart is less affected by data periodicity than the Triple exponential chart, which makes it more versatile. The algorithm used by the Robust Random Cut Forest chart is suitable for data flows that don't necessarily have a constant period.
What data do I need for this chart?
This chart is meant to be used with time series, that is, univariate numerical data flows that have a constant time step and an associated timestamp.
The option to create this chart will be disabled unless your query contains a temporal grouping with no columns added as arguments. Furthermore, your query must contain a numeric column (for example, you can aggregate your data and add a count).
For example, this is a correct query that would enable the Robust Random Cut Forest chart option in the search window:
from demo.ecommerce.data
group every 1h
select count() as count
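The constant time step requirement can be checked outside Devo on exported results. The following is a minimal Python sketch, not part of the Devo product; the function name `has_constant_step` and the sample timestamps are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def has_constant_step(timestamps):
    """Return True if all consecutive timestamps are separated by the same interval."""
    if len(timestamps) < 2:
        return True
    step = timestamps[1] - timestamps[0]
    return all(b - a == step for a, b in zip(timestamps, timestamps[1:]))


# Hypothetical hourly series, similar to what "group every 1h" produces.
start = datetime(2024, 1, 1)
hourly = [start + timedelta(hours=i) for i in range(5)]

# The same series with one point removed, so the step is no longer constant.
with_gap = hourly[:2] + hourly[3:]

print(has_constant_step(hourly))
print(has_constant_step(with_gap))
```

A series that fails this check contains missing points, which the chart handles as described in the Handling missing points section.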
Creating a robust random cut forest chart
Go to Data Search and open the required table.
Perform the required operations to get the data you want to use in the chart.
After getting the required query results, go to Additional tools → Charts → Anomaly Detection → Robust Random Cut Forest.
Drag the required columns to their corresponding fields. This chart requires you to select the following fields:
Required field | Description | Data type |
---|---|---|
Value | Add the numeric column whose values you want to analyze. | float, integer or duration |
The Robust Random Cut Forest chart is displayed.
Working with Robust Random Cut Forest charts
The data analysis performed by this chart can be divided into two phases: training and evaluation.
The points used for training are those under the green band, which you can modify by dragging the band or by using the options at the top of the chart, as explained below. Everything outside the training band is evaluated by the algorithm. After training on part of a series, the chart predicts potential anomalies and marks them as red points.
It is advisable, as far as possible, to avoid including visible anomalous points in the training set.
After selecting the required period, click the Train button to get the results.
You can configure the following options at the top of the chart window:
Option | Description |
---|---|
Shingle | Corresponds to the number of points in the sliding window. The sliding window is used to convert a univariate variable to a multivariate one. The default value is 1. This parameter controls how fast the algorithm adapts to and learns changes in the incoming data flow. A low shingle (short sliding window) means that the algorithm is more flexible to changes, forgets older data faster, and tends to learn changes quickly. A high shingle value, on the other hand, adapts slowly to changes and remembers old behavior longer. You need to find the right balance between flexibility in detecting data changes (anomalies) and the undesired tendency to learn anomalies too quickly and thus fail to detect them in the future. The shingle value is somewhat equivalent to the notion of the period, and it is often chosen as a multiple of the period. |
Number of trees | A higher number of trees leads to better estimation results. However, the improvement decreases as the number of trees increases, in other words, at a certain point the benefit in prediction performance from learning more trees will be lower than the cost in computation time for learning these additional trees. The default value is 40. |
Tree size | A higher tree size improves the capacity to learn more about the data. As with the Number of trees parameter, at a certain point, the cost in computation time will increase considerably. The default value is 256. |
Time decay | Expected age at which a random sample point should expire and be replaced by a tree. It is used to calculate the internal decay factor, which is 1/time decay. It must be greater than or equal to the tree size. The default value is 256. |
Initialize points | This value can also be edited by dragging the light green band. The default value is 100. Keep in mind that you may not get proper results if you enter a low value for this parameter, since the algorithm would not be able to learn enough. However, indicating the widest possible range may also be dangerous, since it may cause overfitting. |
Threshold | Limit on the algorithm score above which points are considered anomalies. After training the model, you can update the threshold by clicking the Update threshold button next to this option or by dragging the red horizontal line. The default value is 10. The threshold is a constant value that determines how strict you want the system to be with the anomalies it produces. Increasing the threshold makes the chart detect only very clear anomalies. This is useful in situations where only extremely unusual data points matter and ambiguous ones can be discarded. By contrast, if the threshold is set to a low value, every point that differs slightly from normal behavior is flagged as an anomaly. This is often the right choice when tiny variations are extremely important (for example, anomalies in a quantity that indicates a deadly disease). In this case, even though many false positives are produced, you would prefer to catch every point that differs in any respect. The right value is different for every series, so a deep understanding of the underlying problem is required before setting it. |
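To make the Shingle, Time decay, and Threshold options more concrete, here is a minimal Python sketch of the ideas they describe. It is not Devo's implementation: the function names `shingle`, `decay_factor`, and `flag_anomalies` are hypothetical, the scores are made-up values, and treating a point as anomalous only when its score is strictly greater than the threshold is an assumption.

```python
def shingle(values, size):
    """Convert a univariate series into overlapping windows of `size` consecutive points."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [tuple(values[i:i + size]) for i in range(len(values) - size + 1)]


def decay_factor(time_decay):
    """Internal decay factor, described in the table as 1 / time decay."""
    return 1.0 / time_decay


def flag_anomalies(scores, threshold):
    """Return the indices of scores above the threshold (strict comparison is an assumption)."""
    return [i for i, score in enumerate(scores) if score > threshold]


series = [1, 2, 3, 4]
print(shingle(series, 2))       # each window becomes one multivariate point
print(decay_factor(256))        # decay factor for the default time decay

# Made-up anomaly scores, evaluated against the default threshold of 10.
scores = [2.0, 12.5, 9.9, 30.0]
print(flag_anomalies(scores, 10))
```

With a shingle of 1, every window contains a single point, so the series stays effectively univariate. Larger shingle values make each multivariate point carry more history.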
You can zoom in to specific parts of the series by clicking a point in the top chart and dragging to the required ending point. You can also use the sliders in the bottom chart to specify the required part of the series. To go back to the default zoom, return the sliders to the beginning and end of the bottom chart or click the All button in the Zoom area.
Handling missing points
The robust random cut forest chart needs a regular and uninterrupted flow of data points in order to work properly, so missing values need special handling to make the chart work. There are two possible causes for missing points:
There are events that don't exist
A model cannot be trained on a data series that contains holes caused by non-existent events, so the chart tries to interpolate the missing points. The interpolation uses the average of the n previous points, which allows it to work in real time. When interpolation occurs, gaps are filled with purple dots to indicate that you are visualizing generated values.
The maximum number of consecutive missing points to be interpolated is 5. If this value is exceeded, you will not be able to train the model. An error will appear when clicking the Train button and holes will be marked in the chart with pinkish bands.
It is advisable to keep the lowest possible rate of points to be interpolated in order to get the most out of the chart. Keep in mind that interpolated points are simply a prediction generated by the chart, and are not real.
This problem may be solved by regrouping your query using a greater grouping period.
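The interpolation rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the chart's actual code: the function name `fill_gaps` is hypothetical, the value of n is not given in this document (3 is used here arbitrarily), and letting previously interpolated points contribute to later averages is an assumption. Only the limit of 5 consecutive missing points comes from the text.

```python
MAX_CONSECUTIVE_GAPS = 5  # limit stated in the text


def fill_gaps(values, n=3, max_gap=MAX_CONSECUTIVE_GAPS):
    """Replace None entries with the mean of up to `n` previous points.

    Raises ValueError when more than `max_gap` consecutive points are missing,
    or when a gap appears before any real point is available.
    """
    filled = []
    run = 0  # length of the current run of missing points
    for value in values:
        if value is None:
            run += 1
            if run > max_gap:
                raise ValueError("too many consecutive missing points")
            window = filled[-n:]  # assumption: earlier filled points count too
            if not window:
                raise ValueError("no previous points to interpolate from")
            value = sum(window) / len(window)
        else:
            run = 0
        filled.append(value)
    return filled


print(fill_gaps([1.0, 2.0, 3.0, None, 5.0]))
```

Because the filled value depends only on earlier points, the same rule can be applied as new data arrives, which matches the real-time motivation given above.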
The data is not fully downloaded
Interpolation only works with events that don't exist. The chart will never interpolate values from data yet to be loaded. These areas are represented on the chart as gray bands. In this case, the chart will only evaluate up to the first gap.
There must be enough events to train and evaluate at least one point before the gap starts; otherwise, you will be notified. Click the Download more button in the warning message that appears to download the data required for the chart to train, or do it manually by activating the Load all events option in the Event loading indicator of the search window.
If you wish to fill certain areas where gaps are located, you can do so by clicking on the event timeline at the top of the search window. For more information, see the documentation on loading data in the search window.