Technical Overview | SpectX Log Analyzer
To handle unlimited amounts of data in different locations, SpectX is built as a distributed computing system. Users interact with SpectX through any major web browser, which connects to the WGUI Server over HTTP/HTTPS. When a user executes a query, the WGUI Server takes the query script and passes it to the Root node.
Root is the central component of the whole system. It orchestrates distributed query processing: it compiles and optimizes the query script, splits it into processing stages and distributed tasks, assigns the tasks, and coordinates their execution and the collection of results.
Processing units (PUs) do the heavy lifting in query processing. A PU is a logical unit consisting of one CPU core and a certain amount of RAM. PUs may reside on one physical host or on different hosts. The number of PUs is configurable and provides linear horizontal scalability for SpectX.
When SpectX is launched, the PUs connect to the Root node and wait for tasks. Queries are processed as a series of small, independent tasks. Every task has some input data that must be fetched from data sources (e.g. on-premises servers, cloud storage, HTTP servers), from other PUs (intermediate query results), or from the PU's local storage. A task also always has some output data (i.e. a task result). This is stored locally on the PU, and its location is passed to Root.
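The task flow described above can be illustrated with a minimal Python sketch. It is not SpectX code: the class names ProcessingUnit and Root, the run_query helper, and the round-robin assignment are all assumptions made for illustration, and the tasks run sequentially in one process rather than on distributed PUs. It only shows the shape of the idea: each task result stays with the PU that produced it, Root tracks only result locations, and a later stage reads the intermediate results of an earlier one.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingUnit:
    """Hypothetical stand-in for a PU: runs tasks and keeps results locally."""
    name: str
    local_store: dict = field(default_factory=dict)

    def run(self, task_id, inputs, work):
        self.local_store[task_id] = work(inputs)  # the result stays on the PU
        return (self.name, task_id)               # only its location is reported


class Root:
    """Hypothetical stand-in for Root: assigns tasks and tracks result locations."""

    def __init__(self, units):
        self.units = {u.name: u for u in units}
        self.next_id = 0

    def run_stage(self, inputs_per_task, work):
        names = list(self.units)
        locations = []
        for i, inputs in enumerate(inputs_per_task):
            unit = self.units[names[i % len(names)]]  # round-robin (an assumption)
            locations.append(unit.run(self.next_id, inputs, work))
            self.next_id += 1
        return locations

    def fetch(self, location):
        name, task_id = location
        return self.units[name].local_store[task_id]


def run_query(chunks, n_units=2):
    """Count all lines in `chunks` using two stages of small tasks."""
    root = Root([ProcessingUnit(f"pu{i}") for i in range(n_units)])
    # Stage 1: one task per input chunk, each counting the lines of its chunk.
    partial = root.run_stage(chunks, len)
    # Stage 2: a single task reads the intermediate results and sums them.
    intermediate = [root.fetch(loc) for loc in partial]
    final = root.run_stage([intermediate], sum)
    return root.fetch(final[0])
```

Calling run_query([["a", "b"], ["c"], ["d", "e", "f"]]) spreads three counting tasks over two stand-in PUs and then sums the partial counts in a second stage.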
When the last query stage has been processed, Root tells the WGUI Server the locations of the query results. The WGUI Server then fetches the results and serves them in a nicely formatted and responsive manner to the end user's browser or via the REST API.
End-user applications can easily be integrated using SpectX's REST API. All SpectX components are built so that they can be deployed independently. However, the SpectX Base version is limited to vertical scalability (i.e. within a single host): all components are deployed within a single Java virtual machine, which means SpectX can make maximum use of all CPU cores of the host (the actual number is configurable).
The biggest bottleneck in processing your data is the network bandwidth between SpectX and the data sources. We therefore recommend installing SpectX as close to your main data repository as possible in terms of network throughput.
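A rough back-of-the-envelope calculation shows why the network tends to dominate. The sketch below is only illustrative: the function names are made up, the 1 Gbit/s link is an assumed example, and the per-core parsing speed is the approximate Apache log figure quoted in this overview. Protocol overhead, compression, and storage read speed are ignored.

```python
def effective_throughput_mb_s(link_mbit_s, cores, parse_mb_s_per_core):
    """Throughput is capped by the slower of the network and the CPUs."""
    network_mb_s = link_mbit_s / 8            # megabits/s -> megabytes/s
    cpu_mb_s = cores * parse_mb_s_per_core    # aggregate parsing capacity
    return min(network_mb_s, cpu_mb_s)


def cores_kept_busy(link_mbit_s, parse_mb_s_per_core):
    """How many cores the link can feed at full parsing speed."""
    return (link_mbit_s / 8) / parse_mb_s_per_core


# Assumed example: 1 Gbit/s link, 4 cores, ~500 MB/s parsing speed per core.
print(effective_throughput_mb_s(1000, 4, 500))  # 125.0 (network-bound)
print(cores_kept_busy(1000, 500))               # 0.25
```

Under these assumptions the four cores could parse about 2000 MB/s, but the link delivers only 125 MB/s, so a single core would already be mostly idle while waiting for data.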
SpectX Pattern Matching Engine:
- Built from scratch
- Implements extraction and transformation of data elements in a single pass
- Average parsing speed is between 200 and 800 MB/sec per CPU core (Apache logs: ~500 MB/sec per CPU core)
SpectX Query Processing Engine:
- Up to 200,000,000 GeoIp lookups/sec per CPU core
- A query counting successful and failed requests in 1-minute intervals from a 1 GB Apache common access log takes about 5-6 seconds using 4 CPU cores (Intel 2.5 GHz i7)
- A query counting requests and grouping them by GeoIp country code from a 1 GB Apache common access log takes about 1.7 seconds using 4 CPU cores (Intel 2.5 GHz i7)
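To make the first benchmark query above more concrete (successful and failed requests per 1-minute interval), here is a minimal sketch of the same computation in plain Python. It is not written in SpectX's query language and says nothing about how SpectX implements the query. The regular expression, the function name count_per_minute, and the rule that a status code below 400 counts as successful are all assumptions made for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Apache Common Log Format: host ident user [time] "request" status bytes
LINE = re.compile(r'\S+ \S+ \S+ \[([^\]]+)\] "[^"]*" (\d{3}) \S+')


def count_per_minute(lines):
    """Count requests per (minute, outcome); lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line)
        if m is None:
            continue
        # %b parses English month abbreviations under the default C locale.
        ts = datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")
        minute = ts.strftime("%Y-%m-%d %H:%M")  # truncate to the minute
        # Assumption: status codes below 400 are treated as successful.
        outcome = "ok" if int(m.group(2)) < 400 else "failed"
        counts[(minute, outcome)] += 1
    return counts


sample = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /a.gif HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:55:59 -0700] "GET /missing HTTP/1.0" 404 209',
    '127.0.0.1 - - [10/Oct/2000:13:56:01 -0700] "GET /b.gif HTTP/1.0" 200 100',
]
for key, n in sorted(count_per_minute(sample).items()):
    print(key, n)
```

Parsing and aggregation happen in the same loop here, so each line is read only once, which is a loose analogy to the single-pass extraction mentioned in the list above.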
SpectX is built on a distributed processing model, making it linearly scalable both vertically and horizontally. The scalability of processing capacity is independent of data volumes, because SpectX does not need to store data for processing.
SpectX comes as a self-contained installation package with no dependencies on third-party components. Built in Java, it works on all major operating systems and platforms, across physical and virtual environments.