PCA X9-2 – How To: IPMItool Access to Management and Compute Nodes

In a current project, we are looking for ways to monitor the Oracle Private Cloud Appliance hardware stack. Some metrics, such as memory and CPU usage, are collected in the Grafana environment, but we cannot tell when a compute or management node has a hardware failure. Some hardware events are visible in the Service Enclave User Interface, but there is still no mechanism in place to raise an alert. And as there is still no EM13c plugin available for the Oracle Private Cloud Appliance X9-2 (no clue whether one will ever be released, although it worked for the X8), we need another solution to stay informed.

IPMItool

The ipmitool is a powerful command-line utility used to manage and retrieve information from the Integrated Lights Out Manager (ILOM) of Oracle servers. It allows administrators to perform tasks such as monitoring hardware status, managing power settings, and accessing system event logs. By leveraging IPMI (Intelligent Platform Management Interface) commands, ipmitool can interact directly with the ILOM to gather critical server data and perform remote management operations.

And this is what we need.

Node ILOM – Enable IPMI V2.0 Sessions

To use ipmitool from the command line, we first have to enable the IPMI v2.0 session protocol. The ILOM web interfaces of the compute and management nodes have PCA-internal IP addresses. To get access, we must tunnel the HTTPS protocol via the management node VIP.

If you don’t enable v2.0, you will get this error when trying to connect to an ILOM:

[root@pcamn01 ~]# ipmitool -I lanplus -H 100.96.0.64 -U root sdr elist full
Password:
Error: Unable to establish IPMI v2 / RMCP+ session

Commands to get the internal ILOM IP addresses of the nodes when connected by SSH to a management node:

-- for compute nodes
# pca-admin compute node list

-- for management nodes
# pca-admin management node list

As a result, you get a list of the internal IP addresses of the ILOMs. Such an IP address can be used to create a tunnel. In this example, I open a tunnel from the command shell of my Windows VDI workspace and access the machine's ILOM from my local browser. The tunnel forwards local port 443 to port 443 of 100.96.0.64 (which is the ILOM address of PCA compute node 01, by the way). Log in to the management node with the root password.

C:\Users\Homer> ssh root@<vip-ip-of-the-management-node> -L 443:100.96.0.64:443
root@<vip-ip-of-the-management-node>'s password:
Last login: Wed May 15 18:51:32 2024 from pcamn01
[root@pcamn01 ~]#

Then open a browser and set the URL to https://127.0.0.1; the ILOM login screen appears. Log in with the root password.

ILOM Administration :: Management Access :: IPMI – enable the checkbox for v2.0 Sessions and save the settings.

Query the ILOM with the IPMItool

Once enabled, you can use IPMItool from one of the management nodes to gather data from the hardware. Use the -I lanplus parameter, which selects the IPMI v2.0 (RMCP+) interface. The sample command below gathers chassis information; the root password is the same as for the ILOM login above:

[root@pcamn01 ~]# ipmitool -I lanplus -H 100.96.0.64 -U root chassis status -v
Password:
System Power         : on
Power Overload       : false
Power Interlock      : inactive
Main Power Fault     : false
Power Control Fault  : false
Power Restore Policy : always-off
Last Power Event     :
Chassis Intrusion    : inactive
Front-Panel Lockout  : inactive
Drive Fault          : false
Cooling/Fan Fault    : false
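
For alerting, the interesting part of this output is any line that reports a fault. Here is a minimal sketch of such a check, assuming the output keeps the "name : value" layout shown above. The function name chassis_faults is made up for this example and is not part of ipmitool.

```shell
# chassis_faults (hypothetical helper): read "ipmitool ... chassis status"
# output on stdin and print every "Fault" or "Overload" line whose value is "true".
# Prints nothing when no fault is reported.
chassis_faults() {
    awk -F':' '
        /Fault|Overload/ {
            value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim blanks around the value
            if (value == "true") print
        }
    '
}

# Possible usage for an unattended job (assumption: the ILOM password is
# exported as IPMI_PASSWORD, which ipmitool reads when -E is given):
#   ipmitool -I lanplus -H 100.96.0.64 -U root -E chassis status | chassis_faults
```

An empty result means that none of the fault flags is set; anything else could be handed over to whatever alerting channel is in use.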

To get the list of events, you can use the sunoem cli subcommand, which passes commands to the ILOM command-line interface:

[root@pcamn01 ~]# ipmitool -I lanplus -H 100.96.0.64 -U root sunoem cli  "show /SP/logs/event/list"
Password:
Connected. Use ^D to exit.
-> show /SP/logs/event/list

Event
ID     Date/Time                 Class     Type      Severity
-----  ------------------------  --------  --------  --------
289    Mon Apr 22 05:52:02 2024  IPMI      Log       minor
       ID =  116 : 04/22/2024 : 05:52:02 : System Firmware Progress : BIOS :
       System boot initiated : Asserted
288    Mon Apr 22 05:51:53 2024  IPMI      Log       minor
       ID =  115 : 04/22/2024 : 05:51:53 : System Firmware Progress : BIOS :
       Hard-disk initialization : Asserted
287    Mon Apr 22 05:51:52 2024  IPMI      Log       minor
       ID =  114 : 04/22/2024 : 05:51:52 : System Firmware Progress : BIOS :
       Keyboard controller initialization : Asserted
286    Mon Apr 22 05:51:52 2024  IPMI      Log       minor
       ID =  113 : 04/22/2024 : 05:51:52 : System Firmware Progress : BIOS :
       Video initialization : Asserted
285    Mon Apr 22 05:51:51 2024  IPMI      Log       minor
       ID =  112 : 04/22/2024 : 05:51:51 : System Firmware Progress : BIOS :
       USB resource configuration : Asserted
284    Mon Apr 22 05:51:48 2024  IPMI      Log       minor
       ID =  111 : 04/22/2024 : 05:51:48 : System Firmware Progress : BIOS :
       PCI resource configuration : Asserted
283    Mon Apr 22 05:51:42 2024  IPMI      Log       minor
Paused: press any key to continue, or 'q' to quitSession closed
Disconnected
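
Most of these entries are routine boot messages, so for monitoring it helps to keep only the more severe ones. The following sketch filters the listing by the Severity column. It assumes the layout shown above (a header line starting with the event ID and ending with the severity, followed by indented detail lines); events_by_severity is a hypothetical helper name.

```shell
# events_by_severity SEVERITY... (hypothetical helper): read the output of
# "show /SP/logs/event/list" on stdin and print only the events whose
# Severity column matches one of the given arguments.
events_by_severity() {
    awk -v wanted="$*" '
        BEGIN {
            n = split(wanted, list, " ")
            for (i = 1; i <= n; i++) want[list[i]] = 1
        }
        { sub(/\r$/, "") }                          # drop a trailing carriage return, if any
        /^[0-9]+[ \t]/ { keep = ($NF in want) }     # event header: last field is the severity
        /^$/ || /^[^0-9 \t]/ { keep = 0 }           # banners, table header, prompts
        keep { print }                              # header and indented detail lines
    '
}

# Possible usage:
#   ipmitool -I lanplus -H 100.96.0.64 -U root sunoem cli "show /SP/logs/event/list" \
#       | events_by_severity critical major
```

How the ILOM paging prompt behaves when the output is piped is not covered here and would need to be tested on the appliance.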

Show the temperature sensors of the node:

[root@pcamn01 ~]# ipmitool -I lanplus -H 100.96.0.64 -U root sdr type temperature
Password:
T_IN_ZONE01      | 36h | ok  |  7.0 | 43 degrees C
T_IN_ZONE1       | 37h | ok  |  7.0 | 41 degrees C
T_IN_ZONE23      | 38h | ok  |  7.0 | 44 degrees C
T_OUT_ZONE2      | 35h | ok  |  7.0 | 42 degrees C
PS0/T_IN         | D8h | ok  | 10.0 | 40 degrees C
PS0/T_OUT        | E0h | ok  | 10.0 | 44 degrees C
PS1/T_IN         | DFh | ok  | 10.1 | 43 degrees C
PS1/T_OUT        | E1h | ok  | 10.1 | 47 degrees C
T_AMB            | 39h | ok  | 23.0 | 28 degrees C
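
To turn these readings into something a check job can act on, the output can be compared against a limit. This is only a sketch, assuming the five-column layout shown above with the reading in the last column; hot_sensors and the limit of 45 degrees are illustrative choices, not recommended values.

```shell
# hot_sensors LIMIT (hypothetical helper): read "ipmitool ... sdr type temperature"
# output on stdin and print "name reading" for each sensor above LIMIT degrees C.
hot_sensors() {
    awk -F'|' -v limit="$1" '
        $5 ~ /degrees C/ {
            name = $1
            gsub(/[ \t]+$/, "", name)    # strip the padding after the sensor name
            temp = $5 + 0                # numeric prefix of e.g. " 43 degrees C"
            if (temp > limit + 0) print name, temp
        }
    '
}

# Possible usage:
#   ipmitool -I lanplus -H 100.96.0.64 -U root sdr type temperature | hot_sensors 45
```

With the sample output above and a limit of 45, only PS1/T_OUT (47 degrees C) would be reported.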

Other sunoem cli commands, such as show faulty or show /SP/faultmgmt, are very helpful for gathering hardware events.

Next Level

Now we know how to gather the ILOM information of a Private Cloud Appliance X9-2 compute or management node. The final goal is to run check jobs on a regular basis on a virtual machine and push the values to Grafana. I am thinking about a combination of Telegraf, InfluxDB and Grafana. And we need a user other than root. But this is a topic for a future blog post 🙂
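
As a rough first idea of what such a check job could emit, the temperature readings can be converted to InfluxDB line protocol, which Telegraf can read from a script. This is a sketch under assumptions: the measurement name ilom_temperature, the tag names host and sensor, the field name celsius and the helper name sdr_to_influx are all made up for illustration.

```shell
# sdr_to_influx HOST (hypothetical helper): convert "sdr type temperature"
# output on stdin to InfluxDB line protocol, one line per sensor.
# No timestamp is written, so the receiving side assigns one.
sdr_to_influx() {
    awk -F'|' -v host="$1" '
        $5 ~ /degrees C/ {
            name = $1
            gsub(/[ \t]+$/, "", name)    # strip the padding after the sensor name
            printf "ilom_temperature,host=%s,sensor=%s celsius=%d\n", host, name, $5 + 0
        }
    '
}

# Possible usage (the tag value "pcacn001" is just an example label):
#   ipmitool -I lanplus -H 100.96.0.64 -U root -E sdr type temperature | sdr_to_influx pcacn001
```

Sensor names containing spaces, commas or equal signs would need escaping before they can be used as tag values; the names in the sample output do not contain any.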
