Application Performance Monitoring and Error Logging - Technical Documentation
Generated on 9/19/2025 | AI Workflow Portal
Executive Summary
The Xikolo 60_DEVOPS_Monitoring cluster implements a robust and integrated strategy for application performance monitoring, logging, and error management. This system leverages Sentry for comprehensive error tracking and performance profiling, Mnemosyne for distributed transaction tracing, and Telegraf for custom application metrics. The primary purpose is to provide deep observability into application health, swiftly identify performance bottlenecks, and streamline error resolution across monolithic and microservice architectures. This multi-faceted approach ensures efficient incident response and continuous operational improvement by correlating errors with traces and metrics, providing a unified view of application health and performance.
Architecture Overview
The monitoring architecture for Xikolo cluster 60_DEVOPS_Monitoring is designed to provide comprehensive observability across its various application components and microservices. It strategically integrates Sentry for error tracking and performance monitoring, Mnemosyne for distributed tracing, and a Telegraf agent for custom metrics collection. This setup enables detailed insights into application behavior, from high-level performance trends to granular error occurrences and their root causes. The system is designed for a multi-layered approach to incident detection and analysis, fostering proactive maintenance and efficient debugging processes.
Key Components and Interactions
1. Xikolo Application: This represents the core application logic, encompassing both web and microservice components. It is the source of all performance data, errors, and custom metrics. Exceptions within the application are caught by the ErrorsController, while custom metrics are emitted via the Xikolo.metrics interface.
2. Sentry Client (with filters): The Sentry client is embedded within the application, configured to capture exceptions and performance transaction data. It integrates ActiveSupport::ParameterFilter to sanitize sensitive parameters before events are sent. Crucially, the AnnotateSentryErrorsForMnemosyne middleware also ensures that Mnemosyne trace IDs are attached to Sentry events, enabling direct correlation.
3. Mnemosyne Client (Tracing): This client is responsible for collecting distributed trace data, providing end-to-end visibility of requests across services. It captures granular details of operations and, when errors occur, can attach them. Mnemosyne is configured to send its collected trace data asynchronously to a centralized message broker.
4. Telegraf Metrics Agent: Implemented via Xikolo::Common::Metrics, this agent facilitates the collection of custom application metrics. Application code interacts with Xikolo.metrics to increment counters or set gauges, which are then packaged and sent as UDP packets to a local Telegraf agent.
5. Sentry Service (External): This is the external platform that receives, processes, and stores the filtered error events and performance transaction data sent by the Sentry clients. It provides dashboards, alerting, and analysis tools for operational teams.
6. RabbitMQ Message Broker: Serving as the AMQP sink for Mnemosyne, RabbitMQ receives and queues all distributed trace data from various Mnemosyne instances. This centralizes trace data collection, allowing for subsequent processing, storage, and analysis by a trace data analytics system. The XIKOLO_RABBITMQ_URL environment variable configures its connection.
Architecture Diagrams
Main Architecture
graph TD
  application["Xikolo Application"]
  sentryClient["Sentry Client (with filters)"]
  mnemosyneClient["Mnemosyne Client (Tracing)"]
  telegrafAgent["Telegraf Metrics Agent"]
  sentryService["Sentry Service (External)"]
  rabbitMQ["RabbitMQ Message Broker"]
  application -->|"Emits Errors/Perf Data"| sentryClient
  application -->|"Generates Trace Data"| mnemosyneClient
  application -->|"Emits Custom Metrics"| telegrafAgent
  mnemosyneClient -->|"Provides Trace ID"| sentryClient
  sentryClient -->|"Sends Filtered Events"| sentryService
  mnemosyneClient -->|"Sends Trace Data (AMQP)"| rabbitMQ
Component Interactions
Key interactions between components in this cluster:
- Sentry: Receives captured exceptions from ErrorsController and other parts of the application. [Source: RAG: docs/app/development/monitoring/index.md]
- Sentry: Receives performance transaction data from the application. [Source: config/initializers/sentry.rb]
- Mnemosyne: Receives attached errors from ErrorsController via ::Mnemosyne.attach_error. [Source: app/controllers/errors_controller.rb]
- Mnemosyne: Provides a trace ID (current_trace&.uuid) that can be correlated with Sentry events. [Source: app/controllers/errors_controller.rb, config/initializers/sentry.rb]
- Telegraf::Agent (via Xikolo::Common::Metrics): Receives metric data from application code (e.g., Xikolo.metrics.increment('event_name')).
- Telegraf::Agent (via Xikolo::Common::Metrics): Sends metric data as UDP packets to localhost:8094, where a Telegraf agent is expected to be listening.
- AnnotateSentryErrorsForMnemosyne: Interacts with Mnemosyne::Instrumenter to retrieve the current trace ID.
- AnnotateSentryErrorsForMnemosyne: Sets context on Sentry.get_current_scope to include the Mnemosyne trace ID.
- ErrorsController: Receives exceptions from the Rails application (action_dispatch.exception). [Source: app/controllers/errors_controller.rb]
- ErrorsController: Logs exceptions to Mnemosyne using ::Mnemosyne.attach_error. [Source: app/controllers/errors_controller.rb]
- ActiveSupport::ParameterFilter: Configured in Sentry's before_send callback to sanitize event data. [Source: config/initializers/sentry.rb]
- ActiveSupport::ParameterFilter: Filters parameters based on Rails.application.config.filter_parameters. [Source: config/initializers/sentry.rb]
- RabbitMQ: Receives trace data from Mnemosyne instances across all applications and microservices. [Source: config/mnemosyne.yml, RAG: services/account/config/mnemosyne.yml]
- RabbitMQ: Configured via the XIKOLO_RABBITMQ_URL environment variable. [Source: config/mnemosyne.yml]
- Sentry Configuration (Web): Reports exceptions to the Sentry service.
- Sentry Configuration (Web): Filters sensitive parameters using ActiveSupport::ParameterFilter.
- Sentry Configuration (Microservice): Reports exceptions to the Sentry service.
- Sentry Configuration (Microservice): Filters sensitive parameters using ActiveSupport::ParameterFilter.
- Mnemosyne Configuration: Connects to RabbitMQ via XIKOLO_RABBITMQ_URL to send trace data.
- Mnemosyne Configuration: Identifies the originating application (e.g., "web", "account") for traces.
- MsgrSentryIntegration: Captures Exception instances within Msgr::Consumer dispatch logic.
- MsgrSentryIntegration: Reports captured exceptions to Sentry using Sentry.capture_exception.
- Xikolo.site/Xikolo.brand: Provides contextual tags to Sentry events.
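The trace-ID correlation listed above can be illustrated with a small Rack-style middleware. This is only a sketch under stated assumptions, not the actual AnnotateSentryErrorsForMnemosyne code: FakeTrace, FakeScope, and AnnotateWithTraceId are hypothetical stand-ins so the example runs without the sentry-ruby or mnemosyne gems, and the context key 'mnemosyne' with a trace_id entry is an assumed shape. In the real application the two callables would presumably wrap Mnemosyne::Instrumenter.current_trace and Sentry.get_current_scope.

```ruby
# Hypothetical stand-in for a Mnemosyne trace object (only #uuid is used).
FakeTrace = Struct.new(:uuid)

# Hypothetical stand-in for a Sentry scope that records contexts set on it.
class FakeScope
  attr_reader :contexts

  def initialize
    @contexts = {}
  end

  def set_context(key, value)
    @contexts[key] = value
  end
end

# Rack-style middleware: copies the current trace ID (if any) into the
# error-reporting scope, then hands the request to the wrapped app.
class AnnotateWithTraceId
  def initialize(app, current_trace:, current_scope:)
    @app = app
    @current_trace = current_trace # callable returning a trace or nil
    @current_scope = current_scope # callable returning the scope
  end

  def call(env)
    trace_id = @current_trace.call&.uuid
    @current_scope.call.set_context('mnemosyne', { trace_id: trace_id }) if trace_id
    @app.call(env)
  end
end
```

Injecting the trace and scope lookups as callables keeps the middleware testable in isolation; when no trace is active, the scope is left untouched.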
Technical Workflows
1. Error Reporting Workflow
graph TD
  appProcess["Application Process"]
  unhandledError["Unhandled Exception Occurs"]
  sentryClient["Sentry Client (Filtered, Annotated)"]
  mnemosyneTrace["Mnemosyne (Trace ID)"]
  errorsController["ErrorsController (User Feedback)"]
  sentryPlatform["Sentry Platform (External)"]
  appProcess -->|"Triggers"| unhandledError
  unhandledError -->|"Caught by"| sentryClient
  unhandledError -->|"Handled by"| errorsController
  mnemosyneTrace -->|"Provides Trace ID to"| sentryClient
  sentryClient -->|"Applies Filtering & Annotation"| sentryPlatform
  errorsController -->|"Renders Response (HTML/JSON)"| appProcess
Unhandled exceptions arising from application processes, whether in the main web application or microservices, are primarily intercepted by the ErrorsController or directly by Sentry integrations configured in config/initializers/sentry.rb. Before any error event is dispatched to the external Sentry service, it undergoes a crucial sanitization step using ActiveSupport::ParameterFilter to remove sensitive information. Concurrently, the AnnotateSentryErrorsForMnemosyne middleware ensures that if an active distributed trace exists via Mnemosyne, its unique trace ID is appended to the Sentry error event. This enables developers to immediately link an error to its broader operational context. A predefined list of excluded_exceptions prevents known, non-critical errors from cluttering the Sentry dashboard, ensuring focus on actionable issues. For user-facing errors, the ErrorsController gracefully renders appropriate HTML or JSON responses, including Sentry event IDs and Mnemosyne trace IDs for debugging.
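The sanitization and exclusion steps described above can be sketched in plain Ruby. This is an illustrative stand-in, not the project's before_send callback: filter_params only approximates what ActiveSupport::ParameterFilter does (partial, case-insensitive key matching is an assumption here), and reportable? is a hypothetical helper that mimics an excluded_exceptions check by class name, including subclasses.

```ruby
FILTERED = '[FILTERED]'

# Recursively masks values whose key matches one of the filters.
# Stand-in for ActiveSupport::ParameterFilter; stdlib only.
def filter_params(value, filters)
  case value
  when Hash
    value.each_with_object({}) do |(key, val), out|
      sensitive = filters.any? { |f| key.to_s.match?(/#{Regexp.escape(f.to_s)}/i) }
      out[key] = sensitive ? FILTERED : filter_params(val, filters)
    end
  when Array
    value.map { |item| filter_params(item, filters) }
  else
    value
  end
end

# Returns false when the error's class (or one of its ancestors) is named
# in the exclusion list, so the event would be dropped instead of reported.
def reportable?(error, excluded_names)
  ancestor_names = error.class.ancestors.map(&:name)
  excluded_names.none? { |name| ancestor_names.include?(name) }
end

event = { request: { 'email' => 'a@example.org', 'password' => 'secret' } }
sanitized = filter_params(event, %i[password token])
puts sanitized.inspect
```

A real callback would apply the filter to the event's hash representation and return nil for excluded errors; both details depend on the Sentry SDK version in use.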
2. Performance Tracing Workflow
graph TD
  appRequest["Application Request/Job Starts"]
  sentryPerfClient["Sentry Client (Performance)"]
  mnemosyneClient["Mnemosyne Client (Trace Data)"]
  sentryPlatform["Sentry Platform (External)"]
  rabbitMQ["RabbitMQ (AMQP Sink)"]
  traceAnalytics["Trace Data Analytics"]
  appRequest -->|"Initiates Transaction"| sentryPerfClient
  appRequest -->|"Generates Distributed Trace"| mnemosyneClient
  sentryPerfClient -->|"Samples & Sends Transactions"| sentryPlatform
  mnemosyneClient -->|"Sends Trace Data"| rabbitMQ
  rabbitMQ -->|"Forwards Data to"| traceAnalytics
Performance monitoring begins as application requests are processed or background jobs execute. Sentry's client, configured with traces_sample_rate and profiles_sample_rate (both defaulting to 1.0), samples these transactions and profiles. Sampling respects parent sampling decisions and skips health check endpoints (/up, /ping, /system_info) to conserve resources and focus on user-facing performance. In parallel, Mnemosyne collects distributed trace data for these operations, capturing granular details including query parameters for debugging. Mnemosyne then transmits this trace data asynchronously to RabbitMQ through an AMQP sink. The message broker acts as a central collection point for traces from all application instances and microservices, giving a holistic view of performance bottlenecks and inter-service communication latency; the data can then be processed for analysis.
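The sampling rules above can be expressed as a small sampler function. The sketch below is an assumption-laden illustration rather than the contents of config/initializers/sentry.rb: build_traces_sampler is a hypothetical helper, the sampling context is modeled as a plain Hash with :parent_sampled and a Rack-style :env entry, and checking health endpoints before the parent decision is an arbitrary ordering choice.

```ruby
HEALTH_CHECK_PATHS = %w[/up /ping /system_info].freeze

# Builds a sampler lambda. It returns 0.0 for health checks, the parent's
# decision when one exists, and the configured default rate otherwise.
def build_traces_sampler(default_rate)
  lambda do |sampling_context|
    path = sampling_context.dig(:env, 'PATH_INFO')
    next 0.0 if HEALTH_CHECK_PATHS.include?(path)

    parent = sampling_context[:parent_sampled]
    next parent unless parent.nil?

    default_rate
  end
end

# The default of 1.0 mirrors the documented default sample rate.
default_rate = Float(ENV.fetch('SENTRY_TRACES_SAMPLE_RATE', '1.0'))
sampler = build_traces_sampler(default_rate)
puts sampler.call({ env: { 'PATH_INFO' => '/ping' } })
```

Lowering SENTRY_TRACES_SAMPLE_RATE changes only the fallback branch; health checks stay unsampled and inherited decisions are still honored.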
3. Custom Metrics Collection Workflow
graph TD
  appCode["Application Code"]
  xikoloMetricsInterface["Xikolo.metrics Interface"]
  telegrafAgentLocal["Telegraf::Agent (Local App)"]
  udpSocket["UDP Socket (localhost:8094)"]
  telegrafCollector["Telegraf Agent (Collector)"]
  timeSeriesDB["Time-Series Database"]
  appCode -->|"Emits Custom Metric"| xikoloMetricsInterface
  xikoloMetricsInterface -->|"Sends to"| telegrafAgentLocal
  telegrafAgentLocal -->|"Transmits via UDP"| udpSocket
  udpSocket -->|"Received by"| telegrafCollector
  telegrafCollector -->|"Forwards to"| timeSeriesDB
The Xikolo application provides a standardized interface, Xikolo.metrics, allowing any part of the application code to emit custom operational metrics. These metrics can represent various application-specific events, counters, or gauges, offering granular insights beyond standard error and performance tracking. When Xikolo.metrics is invoked (e.g., Xikolo.metrics.increment('event_name')), an instance of ::Telegraf::Agent is used to package and send this metric data. The data is transmitted as UDP packets to a predetermined endpoint: localhost:8094. This setup anticipates a local Telegraf agent running on the same host, which is responsible for receiving these UDP packets. This local agent then processes, aggregates, and forwards the custom metrics to a designated time-series database for long-term storage, visualization through dashboards, and integration into alerting systems, completing the metric data journey.
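To make the UDP hand-off concrete, the following sketch formats a metric and sends it as a single datagram. It does not use ::Telegraf::Agent or Xikolo.metrics; format_line and send_metric are hypothetical helpers, and the use of InfluxDB line protocol as the wire format is an assumption about how the local Telegraf listener is configured. Escaping of special characters is omitted for brevity.

```ruby
require 'socket'

# Builds one line in InfluxDB line protocol: measurement,tags fields.
# Integer fields get the "i" suffix; tags are sorted by key.
def format_line(measurement, tags: {}, values: {})
  tag_part = tags.sort_by { |key, _| key.to_s }.map { |key, val| ",#{key}=#{val}" }.join
  field_part = values.map { |key, val| "#{key}=#{val.is_a?(Integer) ? "#{val}i" : val}" }.join(',')
  "#{measurement}#{tag_part} #{field_part}"
end

# Sends the line as one UDP datagram. UDP is fire-and-forget: if nothing
# listens on the port, the metric is silently lost.
def send_metric(line, host: 'localhost', port: 8094)
  socket = UDPSocket.new
  socket.send(line, 0, host, port)
ensure
  socket&.close
end

puts format_line('event_name', tags: { app: 'web' }, values: { count: 1 })
```

Calling send_metric with the formatted line would deliver it to localhost:8094. Because delivery is not acknowledged, a missing or misconfigured Telegraf agent shows up only as absent data, which is why the co-located agent is an operational requirement.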
Implementation Details
The implementation of the monitoring and error management system within the Xikolo cluster involves specific technical considerations, dependencies, and configuration requirements to ensure its effectiveness and reliability.
Technical Considerations
- Sentry Configuration Details: The Sentry client is configured via config/initializers/sentry.rb. It uses a before_send callback with ActiveSupport::ParameterFilter to sanitize event data based on Rails.application.config.filter_parameters, ensuring sensitive information is not transmitted. app_dirs_pattern (%r{(api|app|config(?!/initializers/acfs.rb$)|lib(?!/acfs_rails_cache.rb$))}) is defined to accurately identify "in-app" code for clearer stack traces. To reduce noise, a list of excluded_exceptions (e.g., Acfs::BadGateway, Redis::CannotConnectError) is maintained. Performance tracing is enabled with configurable traces_sample_rate (defaulting to 1.0) and profiles_sample_rate (1.0), alongside dynamic sampling rules that ignore health check endpoints. Global tags are applied using Xikolo.site and Xikolo.brand for enhanced categorization.
- Mnemosyne AMQP Sink: Mnemosyne is configured to send all collected trace data to an AMQP sink, specifically RabbitMQ. The connection endpoint is determined by the XIKOLO_RABBITMQ_URL environment variable, defaulting to amqp://localhost if not explicitly set. This configuration ensures that distributed trace data from all application instances and microservices is centrally collected for subsequent analysis. Mnemosyne is explicitly disabled for test and integration environments to conserve resources.
- Telegraf UDP Metrics: Custom application metrics are transmitted via UDP packets to localhost:8094. This design implies a strong operational requirement: a Telegraf agent must be co-located and actively listening on this port on every host where the application is running. This local agent then acts as a forwarder to a centralized metrics store.
- Sentry-Mnemosyne Correlation: The custom AnnotateSentryErrorsForMnemosyne Rack middleware plays a critical role in bridging Sentry errors with Mnemosyne traces. It queries Mnemosyne::Instrumenter.current_trace to retrieve the current trace ID (UUID) and injects it into the Sentry.get_current_scope context as mnemosyne.trace_id. This explicit correlation is vital for debugging by providing immediate context for errors.
- Microservice Sentry Integration: The MsgrSentryIntegration module is prepended to Msgr::Consumer classes in several microservices. This ensures that any unhandled exceptions occurring during RabbitMQ message processing are caught by Sentry (Sentry.capture_exception) before being re-raised to maintain the original error flow. This mechanism is crucial for complete error visibility in asynchronous message-driven architectures.
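The capture-then-re-raise pattern used for message consumers can be sketched with Module#prepend. This is a minimal illustration, not the MsgrSentryIntegration source: ReportAndReraise, FakeConsumer, and CAPTURED are hypothetical names, and the injectable reporter stands in for Sentry.capture_exception so the example runs without the msgr or sentry-ruby gems.

```ruby
# Prepended module: wraps #dispatch, reports any exception, then re-raises
# so the consumer's normal error handling still sees it.
module ReportAndReraise
  class << self
    attr_accessor :reporter # callable that receives the exception
  end

  def dispatch(message)
    super
  rescue Exception => e # broad rescue is acceptable only because we re-raise
    ReportAndReraise.reporter&.call(e)
    raise
  end
end

# Hypothetical consumer standing in for a Msgr::Consumer subclass.
class FakeConsumer
  def dispatch(message)
    raise ArgumentError, 'bad payload' if message.nil?

    :ok
  end
end

FakeConsumer.prepend(ReportAndReraise)

# Collects reported errors in place of a real error-tracking client.
CAPTURED = []
ReportAndReraise.reporter = ->(error) { CAPTURED << error }
```

With this in place, FakeConsumer.new.dispatch(nil) still raises ArgumentError, but the error is recorded first. Prepending, rather than subclassing, lets the wrapper run before the consumer's own dispatch without changing the consumer class itself.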
Dependencies and Integrations
- Sentry and ActiveSupport::ParameterFilter: Sentry directly depends on ActiveSupport::ParameterFilter for the sanitization of sensitive data, integrating into its before_send callback mechanism.
- Sentry and Mnemosyne: The integration between Sentry and Mnemosyne is facilitated by the custom AnnotateSentryErrorsForMnemosyne middleware, which ensures trace IDs are passed from Mnemosyne to Sentry contexts.
- Mnemosyne and RabbitMQ: Mnemosyne's tracing capabilities are tightly integrated with RabbitMQ, which serves as its essential AMQP sink for distributed trace data collection.
- Custom Metrics and ::Telegraf::Agent: The custom metrics collection, exposed via the Xikolo.metrics interface (from the xikolo-common gem), directly utilizes ::Telegraf::Agent for transmitting metrics data.
- ErrorsController and Monitoring Tools: The ErrorsController serves as a central point for application-level error handling, interacting directly with both Sentry (::Sentry.capture_exception) and Mnemosyne (::Mnemosyne.attach_error) to log and report exceptions.
- Msgr::Consumer and Sentry: The MsgrSentryIntegration module explicitly integrates Sentry with Msgr::Consumer in microservices for robust error capture during message processing.
Configuration Requirements
- RabbitMQ URL (XIKOLO_RABBITMQ_URL): This environment variable is mandatory in production environments for Mnemosyne to correctly route trace data to the central RabbitMQ instance. Its default value is amqp://localhost if not specified.
- Sentry Sample Rate (SENTRY_TRACES_SAMPLE_RATE): This environment variable controls the default sampling rate for Sentry's performance transactions. It defaults to 1.0 (100%), which should be reviewed for cost optimization in high-volume environments.
- Parameter Filtering Configuration: Rails.application.config.filter_parameters must be accurately configured to specify all sensitive data points that require filtering by ActiveSupport::ParameterFilter before being sent to Sentry.
- Mnemosyne Environment Configuration: Mnemosyne's configuration file (config/mnemosyne.yml) explicitly defines that the tracing system is disabled for test and integration environments, optimizing resource usage.
- Local Telegraf Agent: A Telegraf agent must be manually deployed and configured to run locally on localhost:8094 on each application host to reliably receive custom metrics via UDP from the Xikolo.metrics interface. This operational dependency is critical, though its configuration specifics are not detailed in the cluster data.
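The requirements above can be summarized as a small settings resolver. This is a hypothetical helper for illustration only: monitoring_settings does not exist in the codebase, and parsing the sample rate with Float is an assumption about how the environment variable is consumed. The defaults and the disabled environments are taken from the list above.

```ruby
# Resolves monitoring settings from an environment-like Hash (ENV also works)
# and the current Rails environment name.
def monitoring_settings(env, rails_env)
  {
    rabbitmq_url: env.fetch('XIKOLO_RABBITMQ_URL', 'amqp://localhost'),
    traces_sample_rate: Float(env.fetch('SENTRY_TRACES_SAMPLE_RATE', '1.0')),
    mnemosyne_enabled: !%w[test integration].include?(rails_env),
    telegraf_endpoint: 'localhost:8094'
  }
end

puts monitoring_settings(ENV, 'production').inspect
```

Using Float instead of to_f makes a malformed sample rate fail loudly at boot rather than silently becoming 0.0.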
Technical Sources & References
Components
- ErrorsController: app/controllers/errors_controller.rb
- ErrorsController: RAG: docs/app/development/monitoring/index.md
Services
- Sentry Configuration (Microservice): services/account/config/initializers/sentry.rb
- Sentry Configuration (Microservice): services/course/config/initializers/sentry.rb
Configuration
- Sentry: config/initializers/sentry.rb
- Mnemosyne: app/controllers/errors_controller.rb
- Mnemosyne: config/mnemosyne.yml
- Telegraf::Agent (via Xikolo::Common::Metrics): gems/xikolo-common/lib/xikolo/common/metrics.rb
- Telegraf::Agent (via Xikolo::Common::Metrics): Gemfile
- AnnotateSentryErrorsForMnemosyne: config/initializers/sentry.rb
- ActiveSupport::ParameterFilter: config/initializers/sentry.rb
- RabbitMQ: config/mnemosyne.yml
- Sentry Configuration (Web): config/initializers/sentry.rb
- Sentry Configuration (Web): RAG: docs/app/development/monitoring/index.md
- Mnemosyne Configuration: config/mnemosyne.yml
- Mnemosyne Configuration: services/account/config/mnemosyne.yml
- MsgrSentryIntegration: services/course/config/initializers/sentry_msgr.rb
- MsgrSentryIntegration: services/news/config/initializers/sentry_msgr.rb
- Xikolo.site/Xikolo.brand: config/initializers/sentry.rb
- Configuration: config/application.rb, Gemfile, config/database.yml
- Process Management: Procfile, Procfile.web
- Build & Deploy: Rakefile, package.json
Documentation
- Sentry: RAG: docs/app/development/monitoring/index.md
- AnnotateSentryErrorsForMnemosyne: RAG: docs/app/development/monitoring/index.md
- RabbitMQ: RAG: foundational_context
This documentation is automatically generated from cluster analysis and should be validated against the actual codebase.