FRIDAY, NOVEMBER 22, 2024
Saturday, 13 September, 2014 11:33

Monitoring Oracle Exadata Storage Servers

by Brian Bream and Suzanne Zorn

This article describes how to use Oracle Enterprise Manager Cloud Control and command-line utilities to monitor Oracle Exadata Storage Servers.

Published September 2014



Proactive monitoring of the components in Oracle Exadata Database Machine (also called Oracle Exadata) can help ensure the highest levels of system availability and performance. This article provides a high-level overview of using Oracle Enterprise Manager Cloud Control 12c and command-line utilities to monitor Oracle Exadata Storage Servers.

More detailed coverage of monitoring Oracle Exadata, including hands-on exercises, is included in the Oracle University class Exadata Database Machine Administration Workshop.

Oracle Exadata Database Machine

Oracle Exadata Database Machine—an engineered system with preconfigured, pretuned, and pretested hardware and software components—is designed to be the highest performing and most available platform for running Oracle Database. Components include database servers (also called compute nodes), Oracle Exadata Storage Servers (also called storage cells), Oracle's Sun Datacenter InfiniBand Switch 36 switches, and Exadata Smart Flash Cache.

Monitoring Technologies

Oracle Exadata uses several technologies to enable the monitoring of its components. These technologies include Oracle Integrated Lights Out Manager (Oracle ILOM), Simple Network Management Protocol (SNMP), and Intelligent Platform Management Interface (IPMI).

  • Oracle ILOM. Oracle ILOM is integrated service processor hardware and software that is preinstalled on Oracle servers, including the storage and database servers in Oracle Exadata. The service processor runs its own embedded operating system and has a dedicated Ethernet port to provide out-of-band server monitoring and management capabilities. Oracle ILOM can be accessed via a browser-based web interface or a command-line interface, and it also provides an SNMP interface and IPMI support.
  • SNMP. SNMP is an open, industry-standard protocol used to monitor and manage devices on an IP network. Oracle Exadata components—including database and storage servers, switches, and power distribution units (PDUs)—use SNMP to raise alerts and report monitoring information. SNMP also enables active management of devices, such as modifying the device configuration remotely.

    Devices run SNMP agents; these agents send status and alerts to an SNMP management console (such as Cloud Control) on the network.

  • IPMI. IPMI is an open, industry-standard protocol used primarily for remote server configuration and management across a network. In Oracle Exadata, the database and storage servers contain built-in IPMI support in Oracle ILOM.

Monitoring Tools

There are two approaches for monitoring Oracle Exadata Storage Servers: using a command-line interface (CLI) or using the graphical interface provided by the Oracle Enterprise Manager Cloud Control 12c console.

  • Command-line interface. The cellcli command is used to manage and monitor individual Oracle Exadata storage cells. In addition, the dcli (distributed CLI) utility can be used to execute scripts and commands across multiple storage cells from a single interface.
  • Oracle Enterprise Manager Cloud Control 12c. This system management platform provides integrated hardware and software management (see Figure 1). Its hardware view includes a schematic of storage cells, compute nodes, and switches, as well as hardware component alerts. Its software view includes software alerts as well as information about performance, availability, and usage organized by databases, services, and clusters.

Figure 1. Oracle Enterprise Manager Cloud Control 12c screenshot showing the status of various hardware components in Oracle Exadata Database Machine.

No third-party software—including third-party monitoring agents—should be installed on Oracle Exadata Storage Servers. However, Oracle Exadata can be configured to send SNMP alerts to other SNMP managers on the network.

Monitoring Architecture of Oracle Enterprise Manager Cloud Control

Before using Oracle Enterprise Manager Cloud Control 12c with Oracle Exadata, an Oracle Management Agent and Oracle Exadata plug-in must be installed on every Oracle Exadata database server (see Figure 2). This agent monitors software targets, such as the database instances and Oracle Clusterware resources, on the database servers. The plug-in enables monitoring of other hardware components in Oracle Exadata, including the storage servers, switches, and power distribution units.

On the storage servers, the CELLSRV process provides the majority of Oracle Exadata storage services and is the primary storage software component. One of its functions is to process, collect, and store metrics. The Management Server (MS) process receives the metrics data from CELLSRV, keeps a subset of metrics in memory, and writes to an internal disk-based repository hourly. In addition, the MS process can generate alerts for important storage cell hardware or software events.

The Restart Server (RS) process is used to start up and shut down the CELLSRV and MS processes. It also monitors these services to check whether they need to be restarted.

The primary components of Oracle Enterprise Manager Cloud Control 12c are the Oracle Management Service, the Oracle Management Repository, and the Cloud Control Console. The Oracle Management Service communicates with the agents on the managed targets and stores information in the Oracle Management Repository. The Cloud Control Console provides a web-based interface for monitoring and management.

Figure 2. Oracle Enterprise Manager Cloud Control 12c monitoring architecture.

For more information on configuring Oracle Enterprise Manager Cloud Control 12c to monitor Oracle Exadata, please see the Oracle Enterprise Manager Exadata Management Getting Started Guide and the "Managing Oracle Exadata with Oracle Enterprise Manager 12c" white paper.

Note: This article focuses on using Oracle Enterprise Manager Cloud Control 12c to monitor the storage servers in Oracle Exadata. Oracle Enterprise Manager Cloud Control can also be used to monitor other Oracle Exadata hardware and software components.

Metrics, Thresholds, and Alerts

Metrics, thresholds, and alerts are key monitoring concepts. Metrics are runtime properties, such as I/O requests, throughput, or the current server temperature. Alerts are important events, such as hardware failures, software errors, or configuration issues. Thresholds are defined metric levels that, if exceeded, cause an alert to be automatically triggered.

When using Oracle Enterprise Manager Cloud Control 12c, quarantine objects are created when prescribed faults are detected, so that similar faults can be avoided in the future. This capability provides increased availability of the monitored system.

Monitoring Metrics Using the CLI

The cellcli command is run on the storage cells (not on the compute nodes) to display monitoring information. The general format of the command is:

<verb> <object> <modifier> <filter>

Where:

  • verb specifies an action (such as list or describe).
  • object specifies which object the action should be performed on (for example, a cell disk).
  • modifier (optional) specifies how the action should be modified (for example, to apply to all disks or to a specific disk).
  • filter (optional) is similar to a SQL WHERE predicate, and is used to filter the command output.

The following are some basic examples:

list physicaldisk (verb and object)

list cell detail (verb, object, and modifier)

list physicaldisk where diskType='Flashdisk' (verb, object, and filter)
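For scripting, cellcli can also run a single command non-interactively with its -e option instead of through the interactive CellCLI> prompt. The following Python sketch wraps that invocation; the function names cellcli_argv and run_cellcli are hypothetical helpers introduced here for illustration, and run_cellcli works only on a host where the cellcli executable is installed (that is, on a storage cell).

```python
import subprocess


def cellcli_argv(command):
    """Build the argument list for running one CellCLI command via 'cellcli -e'."""
    return ["cellcli", "-e", command]


def run_cellcli(command):
    """Run one CellCLI command on the local storage cell and return its text output.

    Hypothetical helper: it assumes cellcli is on the PATH and raises
    CalledProcessError if the command exits with a nonzero status.
    """
    result = subprocess.run(
        cellcli_argv(command), capture_output=True, text=True, check=True
    )
    return result.stdout


# The three basic examples from this section, as they would be passed to cellcli -e.
BASIC_EXAMPLES = [
    "list physicaldisk",
    "list cell detail",
    "list physicaldisk where diskType='Flashdisk'",
]

if __name__ == "__main__":
    # Print the commands rather than running them, so the sketch is safe anywhere.
    for example in BASIC_EXAMPLES:
        print(" ".join(cellcli_argv(example)))
```

Passing the arguments as a list avoids shell quoting problems with the single quotes that CellCLI filters use.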

By default, the user cellmonitor can execute read-only queries using the cellcli command. The user celladmin can execute cellcli commands that modify the configuration.

Metrics Terminology

Metrics are recorded measurements; for the storage cells, this includes measurements such as the number of I/O requests or the throughput.

The cellcli command refers to each metric using a composite of abbreviations, for example:

  • CD_IO_RQ_R_SM is the number of I/O requests (IO_RQ) to read (R) small blocks (SM) on a cell disk (CD).
  • GD_IO_BY_W_LG_SEC is the number of MB (IO_BY) of large block (LG) I/O writes (W) per second (SEC) on a grid disk (GD).

In addition, metrics

  • Are associated with a metricObjectName, which is the object being measured (for example, a specific cell disk)
  • Belong to an objectType group (IORM_DATABASE, CELLDISK, CELL_FILESYSTEM, and so on)
  • Have a metricType (Cumulative, Instantaneous, Rate, Transition)
  • Have a measurement unit (for example, milliseconds, microseconds, %, °F, °C)

For more details on Oracle Exadata cell metric attributes, see the Oracle Exadata Storage Server Software User's Guide.
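To make the naming scheme concrete, the following Python sketch decodes a metric name into its parts. It is a minimal illustration that knows only the abbreviations spelled out in the two examples above; the lookup tables and the function name decode_metric_name are assumptions made for this sketch, and the full metric vocabulary in the User's Guide is much larger. Names that do not fit this pattern return None rather than a guess.

```python
# Abbreviations taken only from the two examples above (CD_IO_RQ_R_SM and
# GD_IO_BY_W_LG_SEC). Anything else is reported as unknown.
OBJECTS = {"CD": "cell disk", "GD": "grid disk"}
MEASURES = {"IO_RQ": "number of I/O requests", "IO_BY": "number of MB of I/O"}
OPERATIONS = {"R": "read", "W": "write"}
SIZES = {"SM": "small blocks", "LG": "large blocks"}


def decode_metric_name(name):
    """Split a metric name into object, measure, operation, size, and rate flag.

    Returns a dict, or None if the name does not match the simple
    <object>_<measure>_<operation>_<size>[_SEC] pattern handled here.
    """
    tokens = name.split("_")
    per_second = tokens[-1] == "SEC"
    if per_second:
        tokens = tokens[:-1]
    if len(tokens) != 5:
        return None
    prefix = tokens[0]
    measure = "_".join(tokens[1:3])
    operation, size = tokens[3], tokens[4]
    if (prefix not in OBJECTS or measure not in MEASURES
            or operation not in OPERATIONS or size not in SIZES):
        return None
    return {
        "object": OBJECTS[prefix],
        "measure": MEASURES[measure],
        "operation": OPERATIONS[operation],
        "size": SIZES[size],
        "per_second": per_second,
    }


if __name__ == "__main__":
    for metric in ("CD_IO_RQ_R_SM", "GD_IO_BY_W_LG_SEC", "CL_CPUT"):
        print(metric, "->", decode_metric_name(metric))
```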

Example Commands

The following examples illustrate basic usage of the cellcli command to display metrics information for Oracle Exadata storage cells.

  • Example 1: Display the metric definitions for a cell. This command can be used to display detailed information about the metrics that are available for a storage cell. As this example shows, one such metric is named CL_CPUT. It is of metricType Instantaneous, it is associated with objectType CELL, and it has a measurement unit of percentage.

    CellCLI> LIST METRICDEFINITION WHERE objectType = 'CELL' DETAIL
    name: CL_CPUT
    description: "Cell CPU Utilization is the percentage of time over
    the previous minute that the system CPUs were not
    idle (from /proc/stat). "
    metricType: Instantaneous
    objectType: CELL
    unit: %
    ...
    
  • Example 2: Display the current metric values for the cell disks in a cell.

    CellCLI> LIST METRICCURRENT WHERE objectType = 'CELLDISK'
    CD_IO_TM_W_SM_RQ CD_1_cell03    205.5 us/request
    CD_IO_TM_W_SM_RQ CD_2_cell03    93.3  us/request
    CD_IO_TM_W_SM_RQ CD_3_cell03    0.0   us/request
    ...
    
  • Example 3: Display the metric history for a cell. This command can provide insights about the trends for the values of a metric.

    CellCLI> LIST METRICHISTORY WHERE name like 'CL_.*' -
    AND collectionTime > '2009-10-11T15:28:36-07:00'
    CL_RUNQ cell03_2    6.0       2009-10-11T15:28:37-07:00
    CL_CPUT cell03_2    47.6 %    2009-10-11T15:29:36-07:00
    CL_FANS cell03_2    1         2009-10-11T15:29:36-07:00
    CL_TEMP cell03_2    0.0 C     2009-10-11T15:29:36-07:00
    CL_RUNQ cell03_2    5.2       2009-10-11T15:29:37-07:00
    ...
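Because the metric output shown in these examples is plain text in whitespace-separated columns, it is straightforward to post-process in a script. The following Python sketch parses metric history lines like those in Example 3 and groups the values by metric name so that trends can be examined. The column layout (name, object, value, optional unit, timestamp) is inferred from the sample output above, and the names MetricSample, parse_metric_history, and values_by_metric are hypothetical; real output may contain variations that this sketch does not handle.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple


class MetricSample(NamedTuple):
    name: str
    metric_object: str
    value: float
    unit: str
    collected_at: datetime


def parse_metric_history(output):
    """Parse lines of the form: name object value [unit] timestamp."""
    samples = []
    for line in output.splitlines():
        tokens = line.split()
        if len(tokens) < 4:
            continue  # skip blank lines and the '...' placeholder
        try:
            # Assumption: large values may contain thousands separators.
            value = float(tokens[2].replace(",", ""))
            collected_at = datetime.fromisoformat(tokens[-1])
        except ValueError:
            continue  # not a data line in the layout assumed here
        unit = " ".join(tokens[3:-1])
        samples.append(MetricSample(tokens[0], tokens[1], value, unit, collected_at))
    return samples


def values_by_metric(samples):
    """Group values by metric name, ordered by collection time."""
    series = defaultdict(list)
    for sample in sorted(samples, key=lambda s: s.collected_at):
        series[sample.name].append(sample.value)
    return dict(series)


# Sample lines copied from Example 3.
SAMPLE_HISTORY = """\
CL_RUNQ cell03_2    6.0       2009-10-11T15:28:37-07:00
CL_CPUT cell03_2    47.6 %    2009-10-11T15:29:36-07:00
CL_FANS cell03_2    1         2009-10-11T15:29:36-07:00
CL_TEMP cell03_2    0.0 C     2009-10-11T15:29:36-07:00
CL_RUNQ cell03_2    5.2       2009-10-11T15:29:37-07:00
...
"""

if __name__ == "__main__":
    for name, values in values_by_metric(parse_metric_history(SAMPLE_HISTORY)).items():
        print(name, values)
```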
    

Monitoring Metrics Using the Oracle Enterprise Manager Cloud Control Console

Oracle Enterprise Manager Cloud Control provides an intuitive view of Oracle Exadata status, including the status of all hardware and software components. Each storage server is a separate target in Oracle Enterprise Manager Cloud Control, and the Oracle Exadata storage servers are grouped together for collective monitoring of all storage.

The Oracle Enterprise Manager Cloud Control console makes it easy to see the status at a glance, and provides an easy way to drill down to get more detailed information. Figure 3 shows a screenshot of the console.

Figure 3. Oracle Enterprise Manager Cloud Control 12c console.

Monitoring Alerts

Alerts for important events that occur within Oracle Exadata storage cells should be monitored and investigated to help ensure the continued uninterrupted operation of storage. Alerts are assigned a severity of warning, critical, clear, or info. Metrics can be used to signal warning alerts or critical alerts when defined threshold values are exceeded.

Similar to metrics monitoring, the Oracle Exadata CLI or Oracle Enterprise Manager Cloud Control 12c can be used to monitor alerts. The following examples illustrate using the cellcli command to monitor storage cell alerts and create thresholds.

  • Example 1: Display the definitions for all alerts that can be generated on the storage cell.

    CellCLI> LIST ALERTDEFINITION ATTRIBUTES name, metricName, description
    ADRAlert "CELL Incident Error"
    HardwareAlert "Hardware Alert"
    StatefulAlert_CG_IO_RQ_LG CG_IO_RQ_LG "Threshold Based Stateful Alert"
    StatefulAlert_CG_IO_RQ_LG_SEC CG_IO_RQ_LG_SEC "Threshold Based ...Alert"
    StatefulAlert_CG_IO_RQ_SM CG_IO_RQ_SM "Threshold Based Stateful Alert"
    ...
    
  • Example 2: Display the alert history for a storage cell.

    CellCLI> LIST ALERTHISTORY WHERE severity = 'critical' -
    AND examinedBy = '' DETAIL
    CellCLI>
    

    Note: This command lists only critical alerts that have not yet been examined by an administrator (that is, alerts whose examinedBy attribute is empty). No output means that there are no unreviewed critical alerts.

  • Example 3: Create a threshold to trigger an alert. This example uses the CT_IO_WT_LG_RQ metric, which specifies the average number of milliseconds that large I/O requests have waited to be scheduled. The alert is triggered by two consecutive measurements (occurrences=2) that exceed a threshold value. Values greater than 1,000 ms (one second) trigger a warning alert; values greater than 2,000 ms (two seconds) trigger a critical alert.

    CellCLI> CREATE THRESHOLD ct_io_wt_lg_rq.interactive -
             warning=1000, critical=2000, comparison='>', -
             occurrences=2, observation=5
    CellCLI>
    

    Note: The CREATE THRESHOLD command creates a threshold that specifies the conditions for the generation of a metric alert. The absence of an output indicates that the threshold was created successfully.
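The threshold behavior described in Example 3 can be modeled in a few lines of Python. This is a simplified sketch of the semantics as described in this article, not the storage server's actual evaluation logic: it assumes the comparison is '>' as in the example, it treats an alert level as reached when the most recent consecutive measurements all exceed that level, and it ignores the observation parameter. The function name evaluate_threshold is hypothetical.

```python
def evaluate_threshold(values, warning, critical, occurrences):
    """Return 'critical', 'warning', or None for a series of measurements.

    A level is reached when the last `occurrences` consecutive measurements
    are all strictly greater than that level's limit (comparison='>').
    Simplified model: the observation parameter is not represented.
    """
    if len(values) < occurrences:
        return None
    recent = values[-occurrences:]
    if all(v > critical for v in recent):
        return "critical"
    if all(v > warning for v in recent):
        return "warning"
    return None


if __name__ == "__main__":
    # Wait times in milliseconds, using the limits from Example 3.
    print(evaluate_threshold([500, 1200, 1500], 1000, 2000, 2))
    print(evaluate_threshold([1200, 2500, 2600], 1000, 2000, 2))
    print(evaluate_threshold([1200, 900], 1000, 2000, 2))
```

Requiring consecutive measurements over the limit keeps a single momentary spike from raising an alert.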

When alerts are triggered, they automatically appear in the Oracle Enterprise Manager Cloud Control console. Administrators can select any Oracle Exadata target, view alerts on that target, and drill down to display more details about each alert. In addition, the Cloud Control console can be used to set up rules for metric alerts. See the chapter on "Using Incident Management" in the Oracle Enterprise Manager Cloud Control Administrator's Guide for more information.

Comparison: Monitoring Storage Server Availability

Both the CLI and Oracle Enterprise Manager Cloud Control 12c can be used to monitor storage server availability. To use the command-line approach, administrators must explicitly execute the following cellcli command on an Oracle Exadata storage server, and then check the status in the command output:

CellCLI> list cell detail
...
    cellsrvStatus:      running
    msStatus:           running
    rsStatus:           running
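A script can perform the same check on captured output. The following Python sketch parses the attribute: value lines shown above and reports any of the three services that is not running. The helper names parse_detail and services_not_running are hypothetical, and the sketch assumes each attribute of interest appears on its own line in the form shown.

```python
SERVICE_ATTRIBUTES = ("cellsrvStatus", "msStatus", "rsStatus")


def parse_detail(output):
    """Turn 'attribute: value' lines into a dict, ignoring other lines."""
    attributes = {}
    for line in output.splitlines():
        key, separator, value = line.partition(":")
        if separator and key.strip():
            attributes[key.strip()] = value.strip()
    return attributes


def services_not_running(attributes):
    """Return the service attributes whose value is anything but 'running'."""
    return [
        name for name in SERVICE_ATTRIBUTES
        if attributes.get(name) != "running"
    ]


# Sample lines copied from the command output above.
SAMPLE_DETAIL = """\
...
    cellsrvStatus:      running
    msStatus:           running
    rsStatus:           running
"""

if __name__ == "__main__":
    problems = services_not_running(parse_detail(SAMPLE_DETAIL))
    print("all services running" if not problems else f"check: {problems}")
```

A missing attribute is treated the same as a stopped service, so truncated output is flagged rather than silently passed.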

Oracle Enterprise Manager Cloud Control 12c provides a visual overview of the availability of the storage cells, with color-coded green and red status symbols to indicate available and unavailable, respectively (see Figure 4). With Oracle Enterprise Manager Cloud Control, administrators can determine the status at a glance, and then drill down to the affected components for more information.

Figure 4. Oracle Enterprise Manager Cloud Control 12c provides status information at a glance.

Comparing Metrics Across Multiple Storage Servers

Oracle Enterprise Manager Cloud Control 12c makes it easy to compare metrics across multiple storage servers. Figure 5 contains a screenshot showing a comparison of the average read response times of Oracle Exadata cell disks. The built-in graphing capability easily shows the relative performance of multiple cell disks.

Figure 5. Using Oracle Enterprise Manager Cloud Control 12c to compare metrics across multiple servers.

The distributed CLI utility, dcli, can be used to execute commands across multiple servers on Oracle Exadata. However, it is much more complex to manually aggregate statistics reported in its command output and make comparisons across multiple storage servers.
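To give a sense of that manual effort, the following Python sketch averages one metric per storage cell from dcli-style output. It assumes that each output line is prefixed with the name of the cell that produced it, followed by a colon, and that the rest of the line has the same layout as the LIST METRICCURRENT output shown earlier. The cell03 values are taken from that earlier example; the cell04 lines are made-up values included only to show a second server, and the function name mean_by_cell is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_by_cell(dcli_output, metric_name):
    """Average the values of one metric per cell from 'cell: line' output."""
    values = defaultdict(list)
    for line in dcli_output.splitlines():
        cell, separator, rest = line.partition(":")
        tokens = rest.split()
        if not separator or len(tokens) < 3 or tokens[0] != metric_name:
            continue
        try:
            # Assumption: large values may contain thousands separators.
            values[cell.strip()].append(float(tokens[2].replace(",", "")))
        except ValueError:
            continue  # not a data line in the layout assumed here
    return {cell: mean(samples) for cell, samples in values.items()}


# cell03 values come from the earlier LIST METRICCURRENT example;
# cell04 values are invented purely for illustration.
SAMPLE_DCLI_OUTPUT = """\
cell03: CD_IO_TM_W_SM_RQ CD_1_cell03    205.5 us/request
cell03: CD_IO_TM_W_SM_RQ CD_2_cell03    93.3  us/request
cell03: CD_IO_TM_W_SM_RQ CD_3_cell03    0.0   us/request
cell04: CD_IO_TM_W_SM_RQ CD_1_cell04    100.0 us/request
cell04: CD_IO_TM_W_SM_RQ CD_2_cell04    200.0 us/request
"""

if __name__ == "__main__":
    averages = mean_by_cell(SAMPLE_DCLI_OUTPUT, "CD_IO_TM_W_SM_RQ")
    for cell, average in sorted(averages.items()):
        print(f"{cell}: {average:.1f} us/request")
```

Even this small comparison needs custom parsing and aggregation code, which is the work that the Cloud Control graphs do automatically.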

Final Thoughts

Oracle Enterprise Manager Cloud Control 12c provides easy-to-use, intuitive monitoring of Oracle Exadata Storage Servers. Status information is visually displayed, making it easy to pinpoint problems and then drill down for more detailed information. In addition, Oracle Enterprise Manager Cloud Control provides capabilities for easily comparing metrics across multiple storage servers.

The CLI (the cellcli command and the dcli utility) is useful for scripting and for automating tasks that need to be repeated.

About the Authors

Brian Bream has been involved in information technology since 1981. He currently serves as the Chief Technology Officer at Collier IT. Brian also functions as an Oracle University instructor delivering courses that focus on Oracle's engineered systems, Oracle Solaris, Oracle Linux, and Oracle's virtualization and storage solutions.

Collier IT is a full-service Platinum-level Oracle partner that provides Oracle solutions, including Oracle engineered systems, software, services, and Oracle University training. Collier IT provides its customers with complete, open, and integrated solutions, from business concept to complete implementation. Since 1991, Collier IT has specialized in creating and implementing robust infrastructure solutions for organizations of all sizes. Collier IT was a go-to partner for Sun Microsystems for ten years prior to the acquisition of Sun by Oracle in 2009. As a former Sun Executive Partner and now as a Platinum-level Oracle partner, Collier IT is aligned to provide customers with complete solutions that address their business needs.

Suzanne Zorn has over twenty years of experience as a writer and editor of technical documentation for the computer industry. Previously, Suzanne worked as a consultant for Sun Microsystems' Professional Services division specializing in system configuration and planning. Suzanne has a BS in computer science and engineering from Bucknell University and an MS in computer science from Rensselaer Polytechnic Institute.

Revision 1.0, 09/08/2014
