We developed a system that detects anomalies in web search engine data in real time, allowing the client to respond more flexibly to emerging issues.
The client (Seznam.cz) operates the second-largest web search engine in the Czech Republic, which produces an enormous volume of user query logs. These logs can be used not only to monitor the technical state of the search engine but also for deeper analysis. However, the sheer volume of data overwhelmed the internal analysts, leading to frequent evaluation errors and poor detection of technical problems, which ultimately affected revenue and user experience.
Our solution covers both anomaly detection and classification of user queries within the search engine. By analyzing the log data, we identified the key variables and designed statistical methods that highlight deviations from normal behavior. We implemented algorithms that remove noise and seasonal effects, and built detectors capable of identifying technical problems in real time. We also added a component that classifies search queries, enabling more accurate ad targeting based on the true meaning of each query.
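To make the idea of removing seasonal effects and flagging deviations more concrete, here is a minimal sketch in Python. It is not the method used in the project, which is not described in detail here. It assumes a simple per-position median as the seasonal baseline and a robust z-score based on the median absolute deviation (MAD). The function names, the sample counts, the period of 4, and the threshold of 3.5 are all hypothetical choices made for illustration.

```python
from statistics import median


def seasonal_baseline(values, period):
    """Typical value for each position in the seasonal cycle (median per position)."""
    return [median(values[i::period]) for i in range(period)]


def detect_anomalies(values, period, threshold=3.5):
    """Return indices whose deseasonalized value deviates strongly from normal.

    Assumption: deviations are scored with a robust z-score built from the
    median absolute deviation (MAD), so a single outlier does not distort
    the estimate of "normal" spread.
    """
    baseline = seasonal_baseline(values, period)
    # Remove the seasonal component: what remains is noise plus anomalies.
    residuals = [v - baseline[i % period] for i, v in enumerate(values)]
    center = median(residuals)
    mad = median([abs(r - center) for r in residuals])
    if mad == 0:
        # No spread to scale against; this sketch flags nothing in that case.
        return []
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation
    return [i for i, r in enumerate(residuals)
            if abs(r - center) / scale > threshold]


# Hypothetical query counts: three cycles of four time slots each.
counts = [100, 120, 150, 110,
          102, 118, 149, 112,
          98, 121, 20, 109]

anomalies = detect_anomalies(counts, period=4)
print(anomalies)  # [10]
```

The sketch works on a complete batch of values. A real-time detector would instead maintain the baseline and spread over a rolling window and score each new value as it arrives.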
The resulting system detects anomalies in search engine data in real time and reduces error rates in analysis. The client now benefits from a more stable service and improved user satisfaction.