Assignment: A system capable of detecting anomalies in data produced by a web search engine.
A web search engine produces and stores a large amount of metadata providing detailed information about processed search requests. This information can be used to estimate the current state and load of the search engine itself and of the other components involved.
Anomalies in the trends of these data can indicate problems such as broken components, failures of external services, or failure of the logging itself.
After analyzing the data logs, we have identified important variables and designed statistical aggregations that highlight potential anomalies.
We have implemented algorithms to remove seasonality and noise from the aggregated data and developed a set of detectors to find anomalies in the processed data series.
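The abstract does not name the algorithms used. As a minimal sketch, assuming an evenly sampled aggregated series with a known season length (for example 24 hourly buckets per day), seasonality can be removed by subtracting a per-phase median profile, noise can be damped with a short centered moving median, and points can be flagged with a robust, MAD-based modified z-score. The function names, the threshold of 3.5, the toy season length of 4, and the toy data are illustrative assumptions, not the system described here.

```python
from statistics import median


def remove_seasonality(series, period):
    # Seasonal profile: the median value at each phase of the season
    # (e.g. each hour of the day); subtract it to get residuals.
    profile = [median(series[phase::period]) for phase in range(period)]
    return [x - profile[i % period] for i, x in enumerate(series)]


def smooth(series, window=3):
    # Centered moving median; windows are truncated at the series edges.
    half = window // 2
    return [median(series[max(0, i - half):i + half + 1])
            for i in range(len(series))]


def detect(series, threshold=3.5):
    # Modified z-score: distance from the median in units of the
    # median absolute deviation (MAD); 0.6745 rescales MAD to a std-dev.
    med = median(series)
    mad = median([abs(x - med) for x in series])
    if mad == 0:
        # Degenerate case: flag anything that differs from the median.
        return [i for i, x in enumerate(series) if x != med]
    return [i for i, x in enumerate(series)
            if 0.6745 * abs(x - med) / mad > threshold]


# Hypothetical toy series: a repeating pattern plus small periodic noise,
# with a simulated outage (e.g. logging stopped) at indices 12-14.
period = 4
pattern = [10, 20, 30, 20]
series = [pattern[i % period] + (i % 3) - 1 for i in range(24)]
series[12:15] = [0, 0, 0]
print(detect(smooth(remove_seasonality(series, period))))  # [12, 13, 14]
```

A moving median suppresses isolated one-point spikes, so this particular combination favors sustained shifts such as an outage or a stopped log; a detector aimed at single-point spikes would run on the unsmoothed residuals instead.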
Finally, we have tuned the detectors’ parameters to optimize the accuracy of the detection.
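How the parameters were tuned is not stated. One common approach, shown here only as a sketch, assumes a set of hand-labeled anomalous time indices and a detector that outputs a per-point anomaly score: try a grid of candidate thresholds and keep the one with the highest F1 score. The functions `f1_score` and `tune_threshold` and the toy scores are hypothetical.

```python
def f1_score(predicted, actual):
    # Harmonic mean of precision and recall over sets of time indices.
    tp = len(predicted & actual)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(actual)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(scores, labeled, candidates):
    # Grid search: evaluate each candidate threshold against the labels
    # and return the best one (the first wins on ties) with its F1 score.
    def f1_at(t):
        flagged = {i for i, s in enumerate(scores) if s > t}
        return f1_score(flagged, labeled)

    best = max(candidates, key=f1_at)
    return best, f1_at(best)


scores = [0.5, 1.2, 4.0, 0.8, 6.5, 3.0, 0.3]  # toy anomaly scores
labeled = {2, 4}                              # toy ground-truth anomalies
print(tune_threshold(scores, labeled, [1.0, 2.0, 3.5, 5.0]))  # (3.5, 1.0)
```

F1 weighs false alarms and missed incidents equally; if operator alert fatigue is the bigger concern, a precision-weighted score could be maximized instead.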
We have developed an anomaly detector that can detect anomalies in real time and alert the operators to possible problems.
Seznam.cz is one of the top Czech IT and media companies and runs the second most used web search engine in the Czech Republic.