The modern concept of security analytics involves more dimensions than ever. Traditional logs from network devices and hosts must be combined with malware analysis, network traffic analysis, endpoint visibility and data provided by threat intelligence feeds. User and entity behavioural analysis, using both traditional statistical models and modern machine learning, has become essential. SIEM detection and correlation functions must be combined with information from historical and forensic analyses. To better understand where SpectX fits in this versatile landscape, let's explore its key qualities.
Unlike most SIEMs and big data analytics tools, SpectX does not require importing source data into the system before analysis. SpectX runs analytics queries on raw data read from files/blobs (compressed or uncompressed) in their original storage location. No preparation phase such as indexing is required; all data is available for analysis immediately.
Using raw data directly has many benefits.
After the data has been read, SpectX parses the fields according to a specified pattern. In addition to text-based data, a few binary formats are supported, for example PCAP (TCP/IP traffic capture). Extracted fields are strongly typed (integer, string, IP address, etc.) and are exposed to SQL queries as columns of a relational database. The data is parsed every time a query is executed on source data, i.e. ETL is performed at query time. The pattern serves as a virtual structure laid over the data temporarily, for as long as the query result exists. This means errors (due to mistakes in a pattern or unexpected changes in the record structure) are also temporary and much easier to correct, as no data reloading is needed: simply correct the pattern and run the query again. The cost of errors is significantly reduced, which leaves much more room for experimentation and, in turn, increases productivity.
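To make the query-time ETL idea concrete, here is a minimal Python sketch of the concept (not SpectX's actual implementation): a pattern is applied to raw log lines at read time, and the extracted fields are converted to strong types on the fly. The pattern and field names below are made up for illustration.

```python
import re
from datetime import datetime
from ipaddress import ip_address

# A hypothetical pattern describing an access-log record; fields are
# extracted and converted to strong types at query time.
PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+)'
)

def parse_record(line):
    """Apply the pattern to one raw line, returning typed fields or None."""
    m = PATTERN.match(line)
    if m is None:
        return None  # unmatched bytes: the record did not fit the pattern
    return {
        "ip": ip_address(m.group("ip")),                                  # IPADDR
        "ts": datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z"),   # TIMESTAMP
        "request": m.group("request"),                                    # STRING
        "status": int(m.group("status")),                                 # INTEGER
        "size": int(m.group("size")),                                     # LONG
    }

line = '192.0.2.7 - - [10/Oct/2020:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
row = parse_record(line)
```

Because the typed view is rebuilt on every run, fixing a wrong pattern means editing `PATTERN` and re-running; no stored data has to be reloaded.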
Another matter is detecting situations where data structure errors occur; leaving them unnoticed may lead to wrong conclusions. Some analysis types (such as trend forecasts) do not require precise data, so a certain number of missed records or errors can be tolerated. The ratio of unmatched bytes to total bytes directly reflects such structural errors, and SpectX provides this number with the result of each query, allowing these situations to be detected.
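The unmatched-bytes check can be sketched conceptually in Python (this is an illustration of the idea, not SpectX code):

```python
import re

# A deliberately simple pattern; real patterns describe the full record structure.
PATTERN = re.compile(r"\d+ \w+")

def match_ratio(lines, pattern):
    """Return (unmatched_bytes, total_bytes) over a batch of raw records."""
    unmatched = total = 0
    for line in lines:
        n = len(line.encode("utf-8"))
        total += n
        if pattern.match(line) is None:
            unmatched += n  # record did not fit the expected structure
    return unmatched, total

lines = ["200 ok", "404 missing", "garbage!"]
unmatched, total = match_ratio(lines, PATTERN)
ratio = unmatched / total
```

A threshold on this ratio then flags queries whose pattern no longer fits the underlying data.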
The ability to detect and capture data with unexpected structure often marks the line between success and failure in incident and forensic analysis. Application-layer attacks almost always involve maliciously crafted inputs that applications fail to escape. Attackers often attempt to evade logging and sometimes even try to exploit log analysis tools. In system compromise attacks, logs are often partially or fully deleted. Here's an example of how SpectX handles parsing malicious web requests in the logs of a typosquatting campaign server.
SpectX uses distributed processing technology for handling tera- and petabytes of data. Processing power can be increased linearly, both vertically and horizontally.
As mentioned above, SpectX runs queries directly on already stored data files; for example, logs may be collected and stored using a log management system. Since SpectX does not require data to be stored internally, its scalability model is processing-oriented. Storage and processing can be scaled independently, which is not only technically simpler but also more cost-effective. Scaling SpectX is also much easier than scaling systems built on indexing, which requires considerable planning and maintenance.
In practice, scalability is often constrained by pricing. The pricing models of many SIEMs are based on data volume. SpectX uses processing-capacity-based pricing, i.e. the more CPUs involved in query processing, the more you pay, regardless of the amount of data, the number of queries or the number of users.
People familiar with SIEMs are often concerned with the amount of daily logs they have. "I have X Giga/Tera/Peta/... bytes of logs. I need to analyze ALL of them." No, you don't. However, your SIEM does need to index ALL your logs in order to allow searching and defining rules between ANY of them. Individual rules and models rarely use all the logs, and the amount of data involved is considerably smaller than the total amount of indexed data. There is also another dimension that sets constraints on the amount of log data: its age. In SIEMs, age is usually configured as "the latest X days" for all inputs, and people tend to forget it. However, when it comes to analysing historical logs, the time period you're investigating becomes important. The amount of data involved is always two-dimensional: the set of different logs and their age together define the scope of the analysis. It never involves all the different logs across all of history.
SpectX is primarily designed to perform exploratory analytics, such as forensic log analysis and data discovery. The process usually consists of many sequential steps linked to previous results. To support this most effectively, SpectX provides an analytics programming language which enables composing multiple interdependent SQL queries as an executable script. This allows handling complexity when analysing real-life processes.
The SQL language offers its full analytic power: strongly typed data, joins and unions of different data sets, aggregations, filtering, etc. In fact, two query dialects are supported: i) traditional SQL, for people most familiar with this style, and ii) a command-line style where individual SQL clauses can be chained together, each one passing its output to the next. The latter provides more flexibility and is easier to learn.
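As a purely illustrative sketch of the two dialects (the clause names, field names and exact SpectX syntax here are assumptions; consult the SpectX documentation for the real grammar), the same question might be phrased either way:

```sql
-- Traditional SQL style:
SELECT clientIp, count(*) AS hits
  FROM accessLog
 WHERE status = 404
 GROUP BY clientIp;

-- Chained, command-line style: each clause pipes its output to the next.
accessLog
 | filter(status = 404)
 | group(clientIp)
 | select(clientIp, hits: count(*));
```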
The programming language includes a rich library of built-in functions for manipulating data: descriptive statistics, maths, logical, bit, string, cryptographic, network, geographical and many more. Note that all the functions are big-data scalable. For instance, descriptive statistics computation delivers mathematically correct results over all the distributed data. Many functions have been built specifically for security analysis, such as geolocation lookups, IP-address manipulation, DNS lookups, etc.
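The point about mathematically correct distributed statistics can be illustrated with the standard parallel combine step (Chan et al.'s algorithm, shown here as a Python sketch rather than anything SpectX-specific): per-partition summaries of count, mean and sum of squared deviations merge exactly into the global statistics.

```python
def partition_stats(values):
    """Per-partition summary: count, mean and sum of squared deviations (M2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return n, mean, m2

def combine(a, b):
    """Merge two partition summaries exactly (parallel variance algorithm)."""
    n1, mean1, m2_1 = a
    n2, mean2, m2_2 = b
    n = n1 + n2
    delta = mean2 - mean1
    mean = mean1 + delta * n2 / n
    m2 = m2_1 + m2_2 + delta * delta * n1 * n2 / n
    return n, mean, m2

# Statistics combined across two "nodes" equal those over the full data set.
left, right = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]
n, mean, m2 = combine(partition_stats(left), partition_stats(right))
variance = m2 / n  # population variance
```

Because the merge is exact rather than an approximation, each node can summarise its own shard and the coordinator still reports the true global mean and variance.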
When SQL reaches its limits (for instance, when you want to apply advanced statistical modelling or machine learning algorithms), you can pass typed, enriched and formatted data on to any third-party tool using the SpectX public API.
Similarly, the RESTful API provides a simple and straightforward way to integrate with enterprise applications. For instance, a customer service employee who needs logs to solve customer problems may be given a simple web form with fields for customer ID and time period. The form submits a parameterised query via the API to SpectX and receives results in a matter of seconds.
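Such an integration might look like the following Python sketch. The endpoint path, parameter names and stored-script location are hypothetical placeholders, not SpectX's actual API; the point is only that a parameterised query reduces to one templated HTTP request.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names; SpectX's real REST API paths
# and parameters may differ, so check its API documentation.
BASE_URL = "https://spectx.example.com/v1/api/query"

def build_query_url(script_path, **params):
    """Build a URL that runs a stored, parameterised query script."""
    query = {"scriptPath": script_path, **params}
    return BASE_URL + "?" + urlencode(query)

url = build_query_url(
    "/user/queries/customer_logs.sx",  # hypothetical stored script
    customerId="42",
    period="2020-10-01/2020-10-07",
)
```

The web form only needs to substitute the customer ID and period into the request; the query logic itself stays server-side in the stored script.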
Supplemental data is added during query execution. There are multiple options.
First, there are a number of built-in functions providing this information, for example looking up geolocation or ASN info for an IP address.
Secondly, additional data can be added from other data sources simply by performing an SQL JOIN. This makes integrating any threat intelligence data an easy task. See the examples of using a feed from US-CERT, adding Tor status to an IP address, or pulling employee information from a relational database.
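As a generic illustration of enrichment by JOIN (shown here with Python's SQLite module and made-up table and column names, not SpectX syntax), each log record picks up a verdict from a threat-feed table:

```python
import sqlite3

# Toy example: web-log IPs joined against a threat-intelligence table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE access_log (client_ip TEXT, uri TEXT);
    CREATE TABLE threat_feed (ip TEXT, label TEXT);
    INSERT INTO access_log VALUES ('198.51.100.9', '/login'),
                                  ('192.0.2.1', '/index');
    INSERT INTO threat_feed VALUES ('198.51.100.9', 'tor-exit-node');
""")

# LEFT JOIN keeps every log row; unmatched IPs get a 'clean' verdict.
rows = conn.execute("""
    SELECT l.client_ip, l.uri, COALESCE(t.label, 'clean') AS verdict
      FROM access_log l
      LEFT JOIN threat_feed t ON t.ip = l.client_ip
""").fetchall()
```

Swapping the feed (US-CERT indicators, a Tor exit-node list, an HR database) changes only the joined table, not the analysis query.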
SpectX is heavily performance-oriented. In fact, its highly efficient parsing and query engine is one of the cornerstones enabling its extremely flexible data access model. Processing efficiency is high enough to allow big-data computations on the limited hardware resources of on-premise installations; until now, this was possible only with the much larger hardware base of commercial cloud providers.
Good performance is certainly important for delivering results in a timely manner. For an analyst working on historical logs, speed translates into freedom to experiment: if queries take seconds or minutes to run, you can look at the data from many more angles than when they take hours. In the latter case, a lot of contemplation and preparation precedes each query.
Time-critical analyses, such as incident investigations, depend heavily on performance. However, the speed of processing queries is just one step on the path from problem to solution. Another important factor is data preparation: the tasks related to acquiring, manipulating and computing additional info so that the data can be subjected to analysis. SpectX shortens the preparation phase considerably by not requiring indexing and by providing extremely flexible data access.
SpectX is software, not a service. You can download, install and run it on the hardware of your choice: your personal computer, an on-premise server or the cloud.
The installation is lightweight, the whole package is about 100 MB in size and it is fully self-contained. There are no mandatory external dependencies apart from having Java 8 installed.
All the qualities above make SpectX a perfect tool for historic and forensic analysis. In fact, the latter is the use case it was created for in the first place. It is lightweight, with low hardware requirements: SpectX can be installed on a laptop and brought to an incident scene. It uses a pulling mechanism (as opposed to push) for accessing data in various environments. There is no data preparation needed. All of this enables you to achieve results quickly in the time-critical situations of incident containment and then proceed with thorough analysis going back to the initial infection. And if you prefer to build your own SIEM (a Gartner-favoured approach), you can build one using SpectX as the engine.