SATURDAY, NOVEMBER 23, 2024

Category: Information Technology

Oracle Virtual Compute Appliance

I truly believed in the virtualization products brought to us by Sun Microsystems.

Included in the product lines were
VirtualBox
Solaris Zones/Containers
Sun Logical Domains (LDOMs)

I was quite pleased to see that Oracle kept these products and significantly enhanced them.
LDOMs are now OVM Server for SPARC, and Oracle has augmented these technologies with solutions like OVM for x86.

Their engineered systems work really well. Solutions include
Exadata
Exalogic
SPARC SuperCluster
Exalytics
Oracle Database Appliance

These engineered systems use a number of Oracle’s virtualization solutions, but I was a bit surprised that there wasn’t a true engineered system for OVM for x86. With their acquisition of Xsigo, they can bring the concept of virtualization to networking. Xsigo is now called Oracle Virtual Network (OVN), so the pieces were there, but customers had to build the solution themselves.

Today, Oracle announced their newest Engineered System, the Oracle Virtual Compute Appliance, to address this.

The FAQ is located here.

The datasheet is here.

Oracle introduced its Virtual Compute Appliance, an integrated, “wire once” stack for the data center that combines compute, network, and storage resources in a software-defined fabric. It is designed for rapid deployment of both infrastructure hardware and application software, and runs Linux, Oracle Solaris, or Microsoft Windows.

The Oracle Virtual Compute Appliance can be scaled linearly, one server at a time, from 2 to 25 compute nodes per rack. Oracle VM Templates enable application scalability across virtualized resources. The Oracle Virtual Compute Appliance controller orchestration software automatically powers up, installs, and configures the hardware and software environment. The appliance includes Oracle Software Defined Network (SDN) software for virtualizing network resources.

InfiniBand: Each Oracle Virtual Compute Appliance hardware configuration contains multiple redundant QDR InfiniBand switches and Oracle Fabric Interconnect systems that serve as gateways to the data center’s Ethernet network.

x86: Compute nodes are Oracle’s Sun Server X3-2 systems containing Intel Xeon CPUs, high-speed dual inline memory modules (DIMMs), redundant 40 Gb/sec InfiniBand host channel adapters (HCAs), and redundant disks.

Storage: Oracle Virtual Compute Appliance features a fully integrated, enterprise-grade Oracle ZFS Storage Appliance for centrally storing the management environment as well as providing data storage for VMs.

Some of the more interesting questions asked during the webcast:

  • Q: Will this new system be recognized by Ops Center 12c and OEM 12c, or will we have to wait for a feature update?
    A: EM 12c will be able to be used with it just as you would use it with standard Oracle VM. That certification is being finalized now and should be available in the next few weeks.
  • Q: Are there plans to move OVM to KVM-based virtualization (away from Xen)?
    A: No, no plans at all. While we continue to watch KVM closely, we believe that Xen-based hypervisors continue to provide the best overall mix of performance, reliability, security, and features for enterprise-class customers. By the way, this is why companies like Amazon Web Services continue to use Xen as well.
  • Q: What version of Oracle VM runs on this new appliance?
    A: It runs the latest Oracle VM 3.2 release.
  • Q: Do you need to bring down the machine to add nodes?
    A: No. You can add machines without an outage to existing servers or VMs. You can also start adding VMs as soon as the very first VM comes on-line, i.e. you can be adding VMs while the system is adding servers.
  • Q: Is Oracle VM certified for Exalogic today?
    A: Yes, Oracle VM is certified for Exalogic.
  • Q: What is the max number of nodes that this can expand to?
    A: The OVCA base rack can support a maximum of 25 compute nodes.
  • Q: Hot-pluggable compute nodes?
    A: The compute nodes are hot-pluggable.
  • Q: Can you run RAC across the appliance? Does it have to be all RAC like Exadata?
    A: You can deploy Oracle VM Templates for RAC on OVCA.
  • Q: What is the maximum number of external VLANs that can be attached to the OVCA?
    A: 4096 total.
  • Q: What should I do if I need more than 25 compute nodes for our WebLogic applications? Do I need to install another rack?
    A: We are planning to introduce the ability to expand to additional racks but at this time the OVCA rack is limited to 25 nodes.
  • Q: Can you use Oracle Traffic Director with this appliance?
    A: No, this appliance has its own controller software.
  • Q: Does this have OVN integrated?
    A: Yes, OVN is pre integrated.
  • Q: What’s the secret sauce? Why can’t I just build this from Oracle components?
    A: The secret sauce is the OVCA controller software, a new piece of software that orchestrates and automates the interaction between the servers, storage, and network that would otherwise all have to be done manually. Further, the manual cabling of such a system should not be underestimated. At the end of the day, you could do all of this manually, but it would take a lot of time and be significantly error-prone.
  • Q: Will a single instance of OVCA manage a maximum of a single rack, or can it manage multiple racks simultaneously?
    A: At this time the controller supports only the base rack, but it has been designed to scale up.
  • Q: Is external InfiniBand connectivity, for example to Exadata, available?
    A: No. Right now, all InfiniBand connectivity is completely within an OVCA unit itself.

I’ll dig a little (OK a lot) more and post more as I gain some familiarity with the product.

Links – Week ending 8/9

If you are on 11i and are planning to upgrade to R12, make sure you review the link below on the Consolidated Upgrade Patch 2 (CUP2). http://ow.ly/2yTvWl

Virtualization and Cloud Made Simple and Easy with Oracle’s Latest Engineered Systems – Webcast http://t.co/HFu9lzsbD8

Linux Container (LXC) Part 2: Working With Containers http://t.co/pDkVzHyYwk

e-book Engineered for Extreme Performance http://t.co/Yht6oLOQUA

Oracle Launches New Oracle Linux 6 Certifications; Oracle Linux 5 Exams To Retire http://t.co/rQNHGGrBG7

Oracle is Unveiling the Latest Engineered System for Enterprise Virtualization http://t.co/I46E2oi3dy

Ready for detailed info on Oracle Multitenant? Read this technical white paper http://t.co/VZso6WMRdH

The Case for Running Oracle Database 12c on Oracle Solaris http://t.co/0KEMnSocix

10 Things CIOs Should Know About The World’s First Cloud Database http://t.co/sm0KrQbMkj

Oracle VM Templates for Oracle Database http://t.co/nrO4OavkMi

Basic mdb walkthrough.

The Solaris Crash Analysis Tool (SCAT) is a fantastic tool, available through “My Oracle Support” (MOS), that helps those who don’t have a strong background in Solaris internals look at potential issues with a system that has panicked.

The built-in modular debugger (mdb) can also augment SCAT, or at times work faster than it.

Here is a very basic walkthrough that I provide to our Collier IT engineers to assist them in initial diagnostics.

There’s much more, and I’ll add some additional walk-throughs later.

1. The stack backtrace ($c) often contains useful keywords, such as function names, that you can search for in MOS. Sometimes you get lucky here.

> $c
vpanic(127def0, 2a100ed40c0, 0, 0, 3effffff8000000, 1869c00)
cpu_deferred_error+0x568(ecc1ecc100000000, 2, 1000060000003a, 600000000, 0, 30001622360)
ktl0+0x48(29fff982000, 2a100ed4d78, 30000, 16, 60, 30)
pp_load_tlb+0x1e4(29fff980000, 29fff9822c0, 1d00, 29fff980300, 1822f00, 2)
ppcopy_common+0x12c(70001d32500, 700030b2500, 1, 1, 29fff982000, 29fff980000)
ppcopy+0xc(70001d32500, 700030b2500, 0, 0, 1822348, 70001d32500)
do_page_relocate+0x228(2a100ed5120, 2a100ed5128, 700030b2500, 2a100ed53e0, 0, 2a100ed4fb0)
page_relocate+0x14(2a100ed5120, 2a100ed5128, 1, 1, 2a100ed53e0, 0)
page_lookup_create+0x244(60017811400, 6007c570000, 70001d32500, 0, 2a100ed53e0, 0)
swap_getconpage+0xb4(60017811400, 6007c570000, 2000, 0, 2a100ed53c8, 2000)
anon_map_getpages+0x474(60010c02008, 0, 200, 109a420, 2a100ed53e0, 1)
segvn_fault_anonpages+0x32c(0, 800000, 0, 1, 6001753c2a8, 3)
segvn_fault+0x530(300034bc3c0, 300012abc20, 1, 1, 892000, ffffffffff76e000)
as_fault+0x4c8(300012abc20, 6001766b9d0, 890000, 60016881390, 186c0b0, 0)
pagefault+0xac(890000, 0, 1, 0, 60016881318, 1)
trap+0xd50(2a100ed5b90, 8903bb, 0, 1, fea0ad6c, 0)
utl0+0x4c(1e, fe8f8104, 9e58, fe8fee34, 7aebd8, fe8fa524)
>
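To collect the search keywords without retyping them, a small script can pull the function names out of a saved backtrace. This is only a minimal Python sketch, assuming the `$c` output has been copied into a string; the names `FRAME_RE` and `frame_functions` are made up for illustration and are not part of mdb.

```python
import re

# One frame of `$c` output looks like: name+0xoffset(arg, arg, ...)
# The offset is absent on the first frame, e.g. "vpanic(...)".
FRAME_RE = re.compile(r"^\s*([A-Za-z_][\w.`$]*)(?:\+0x[0-9a-fA-F]+)?\(")


def frame_functions(backtrace):
    """Return the function names found in `$c` output, top of stack first."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line)
        if match:  # prompt lines such as "> $c" simply do not match
            names.append(match.group(1))
    return names


# First three frames of the backtrace shown above.
SAMPLE = """\
> $c
vpanic(127def0, 2a100ed40c0, 0, 0, 3effffff8000000, 1869c00)
cpu_deferred_error+0x568(ecc1ecc100000000, 2, 1000060000003a, 600000000, 0, 30001622360)
ktl0+0x48(29fff982000, 2a100ed4d78, 30000, 16, 60, 30)
>
"""

if __name__ == "__main__":
    print(" ".join(frame_functions(SAMPLE)))
```

The frames nearest the top (here `vpanic` and `cpu_deferred_error`) are usually the most useful search terms, together with the panic message.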

2. ::status can also give you things like the hostname and the kernel revision the system was running:

> ::status
debugging crash dump vmcore.0 (64-bit) from sunbkpsrv5
operating system: 5.10 Generic_142900-13 (sun4u)
panic message: UE CE Error(s)
dump content: kernel pages only
>
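If you triage many dumps, it can help to turn this output into something a script can compare. The following is a rough Python sketch that assumes the `::status` layout shown above (a "debugging crash dump ... from <hostname>" line followed by "key: value" lines); `parse_status` is a hypothetical helper, not an mdb feature.

```python
def parse_status(text):
    """Turn saved `::status` output into a dict of its fields."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and mdb prompt lines
        if line.startswith("debugging crash dump"):
            # The hostname follows the last " from " on this line.
            info["hostname"] = line.rsplit(" from ", 1)[-1]
        elif ":" in line:
            key, value = line.split(":", 1)
            info[key.strip()] = value.strip()
    return info


SAMPLE = """\
> ::status
debugging crash dump vmcore.0 (64-bit) from sunbkpsrv5
operating system: 5.10 Generic_142900-13 (sun4u)
panic message: UE CE Error(s)
dump content: kernel pages only
>
"""

if __name__ == "__main__":
    for key, value in parse_status(SAMPLE).items():
        print(f"{key}: {value}")
```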

3. ::cpuinfo -v also shows some good info on what was running on each CPU when the system panicked:

> ::cpuinfo -v
 ID ADDR        FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD      PROC
  0 0000183a620  1b    7    0  60   no    no t-0    3000371fb20 java
                  |    |
       RUNNING <--+    +-->  PRI THREAD      PROC
         READY                60 2a1000c7ca0 sched
        EXISTS                59 30001e121e0 java
        ENABLE                59 30001d293e0 in.mpathd
                              59 3000371d480 java
                              59 3000371ce00 java
                              59 3000371c440 java
                              59 3000371f4a0 java

 ID ADDR        FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD      PROC
  1 0000180c000  1d    6    0  59  yes    no t-0    30001dc01c0 syslogd
                  |    |
       RUNNING <--+    +-->  PRI THREAD      PROC
      QUIESCED                99 2a100237ca0 sched
        EXISTS                60 2a100a83ca0 sched
        ENABLE                53 3000371c100 java
                              53 3000371c780 java
                              51 3000371aaa0 java
                              50 300032a9940 savecore

>

4. ::ps gives good info on everything running at the time of the crash

> ::ps
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R      0      0      0      0      0 0x00000001 0000000001838150 sched
R      3      0      0      0      0 0x00020001 0000060012dab848 fsflush
R      2      0      0      0      0 0x00020001 0000060012dac468 pageout
R      1      0      0      0      0 0x4a004000 0000060012dad088 init
R    808      1    807    807      0 0x42000000 0000060016acf890 nbevtmgr
R    805      1      7      7  60002 0x4a304102 0000060016746038 java
R    764      1    764    764      0 0x42000000 0000060016acec70 dbsrv11
R    712      1    711    711      0 0x42000000 0000060016ad04b0 bpcd
R    709      1    708    708      0 0x42000000 00000600167fa040 vnetd
R    386      1    385    385      0 0x42000000 0000060016ad10d0 snmpd
R    382      1    382    382     25 0x52010000 00000600169a2048 sendmail
R    381      1    381    381      0 0x52010000 00000600169a2c68 sendmail
R    334      1    334    334      0 0x42000000 0000060016747878 syslogd
R    327      1    327    327      0 0x42000000 00000600161c0490 sshd
R    324      1    323    323      0 0x42010000 00000600167fb880 smcboot
R    326    324    323    323      0 0x42010000 0000060013fba018 smcboot
R    325    324    323    323      0 0x42010000 00000600167fac60 smcboot
R    275      1    275    275      0 0x42000000 0000060016748498 utmpd
R    267      1    266    266      0 0x42000000 00000600159bb860 pbx_exchange
R    263      1    263    263      0 0x42000000 00000600159bac40 inetd
R    257      1    257    257      0 0x42000000 0000060013e26c30 automountd
R    259    257    257    257      0 0x42000000 0000060015d02488 automountd
R    251      1    251    251      1 0x42000000 0000060013fbc478 rpcbind
R    234      1    234    234      0 0x42010000 00000600161c10b0 cron
R    208      1    208    208      0 0x42000000 0000060015d00c48 xntpd
R    185      1      7      7      0 0x42000000 0000060013fbd098 iscsid
R    155      1    154    154      0 0x42000000 0000060013e28470 in.mpathd
R    144      1    144    144      0 0x42000000 00000600159ba020 picld
R    139      1    139    139      1 0x42000000 00000600159bd0a0 kcfd
R    136      1    136    136      0 0x42000000 0000060012daac28 nscd
R    120      1    120    120      0 0x42000000 0000060015d030a8 syseventd
R     80      1     79     79      0 0x42020000 0000060013e26010 dhcpagent
R     61      1     61     61      0 0x42000000 0000060013fbb858 devfsadm
R      9      1      9      9      0 0x42000000 0000060013e29090 svc.configd
R      7      1      7      7      0 0x42000000 0000060012daa008 svc.startd
R    357      7      7      7      0 0x4a004000 0000060016746c58 rc2
R    702    357      7      7      0 0x4a004000 00000600167490b8 lsvcrun
R    703    702      7      7      0 0x4a004000 0000060013e27850 sh
R    809    703      7      7      0 0x4a004000 00000600169a3888 pdde
R    812    809      7      7      0 0x4a004000 0000060016ace050 pdde
R    813    812      7      7      0 0x4a004000 00000600169a44a8 sleep
R    342      7      7      7      0 0x4a004000 0000060015d00028 svc-webconsole
R    717    342      7      7      0 0x4a004000 00000600169a50c8 sjwcx
R    720    717      7      7      0 0x4a004000 00000600167fc4a0 java
R    304      7    304    304      0 0x4a004000 0000060013fbac38 ttymon
R    290      7      7      7      0 0x4a004000 00000600167fd0c0 svc-dumpadm
R    293    290      7      7      0 0x4a004000 00000600161bf870 savecore
R    269      7    269    269      0 0x4a014000 00000600161be030 sac
R    278    269    269    269      0 0x4a014000 0000060015d01868 ttymon
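The PID and PPID columns make it possible to see how a process was started. As an illustration only, here is a small Python sketch that reads saved `::ps` output and walks from a process up through its parents; it assumes the nine-column layout shown above, and the helper names `parse_ps` and `ancestry` are invented for this example.

```python
def parse_ps(text):
    """Map PID -> {ppid, addr, name} from saved `::ps` output."""
    procs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 9 columns: S PID PPID PGID SID UID FLAGS ADDR NAME
        if len(fields) != 9 or not fields[1].isdigit():
            continue  # header, prompt, or anything unexpected
        procs[int(fields[1])] = {
            "ppid": int(fields[2]),
            "addr": fields[7],
            "name": fields[8],
        }
    return procs


def ancestry(procs, pid):
    """Return process names from `pid` up to the root of the tree."""
    chain, seen = [], set()
    while pid in procs and pid not in seen:
        seen.add(pid)  # sched (PID 0) is its own parent, so guard the loop
        chain.append(procs[pid]["name"])
        pid = procs[pid]["ppid"]
    return chain


# A few rows copied from the ::ps output above.
SAMPLE = """\
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R      0      0      0      0      0 0x00000001 0000000001838150 sched
R      1      0      0      0      0 0x4a004000 0000060012dad088 init
R      7      1      7      7      0 0x42000000 0000060012daa008 svc.startd
R    290      7      7      7      0 0x4a004000 00000600167fd0c0 svc-dumpadm
R    293    290      7      7      0 0x4a004000 00000600161bf870 savecore
"""

if __name__ == "__main__":
    print(" <- ".join(ancestry(parse_ps(SAMPLE), 293)))
```

The ADDR value kept for each process is the proc structure address, which is the same kind of pointer used with the proc2u macro in step 8.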

5. ::panicinfo shows more info on the panic itself

> ::panicinfo
             cpu                0
          thread      3000371fb20
         message UE CE Error(s)
          tstate         80001606
              g1          1270ce4
              g2          127dc00
              g3  3effffff8000000
              g4         fbfffffe
              g5                1
              g6                0
              g7      3000371fb20
              o0          127def0
              o1      2a100ed4098
              o2                0
              o3                0
              o4 fc30ffffffffffff
              o5  3cf000000000000
              o6      2a100ed3761
              o7          11020dc
              pc          104982c
             npc          1049830
               y                0
>

6. Find the address of the thread that was executing when the system panicked.

> panic_thread/K
panic_thread:
panic_thread:   3003acf7020     
>

7. Run the thread macro against the pointer value from above. Look for the t_procp field, which points to the process structure.

> 3003acf7020$<thread
    t_link = 0
    t_stk = 0x2a108333ae0
    t_startpc = 0
    t_bound_cpu = 0x30004b42000
    t_affinitycnt = 0
    t_bind_cpu = 0xffff
    t_flag = 0x1800
    t_proc_flag = 0x104
...
    t_procp = 0x3005a6713e0    <== use the value here ...
>

8. Run the proc2u macro against the pointer from the t_procp field. Look for the value stored in p_user.u_psargs. This is the command line, including arguments, of the process that was running on the CPU at the time of the system panic.

> 0x3005a6713e0$<proc2u
    p_user.u_execsw = execsw+0x28
    p_user.u_auxv = [
        {
            a_type = 0x7d8
            a_un = {
                a_val = 0xffffffff7fffff90
                a_ptr = 0xffffffff7fffff90
                a_fcn = 0xffffffff7fffff90
            }
...
    p_user.u_start = {
        tv_sec = 2007 Jun 11 00:00:00
        tv_nsec = 0xcf77e0
    }
    p_user.u_ticks = 0x191b148
    p_user.u_comm = [ "bgscollect" ]
    p_user.u_psargs = [ "bgscollect -I noInstance -B /usr/adm/best1_7.3.00" ]    <== use the value here
    p_user.u_argc = 0x5
    p_user.u_argv = 0xffffffff7ffffc98 ...
>
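Steps 1 through 6 do not depend on each other, so they can be collected in one pass before you start following pointers by hand. Below is a minimal Python sketch of that idea. It assumes that mdb is on the PATH, that it reads commands from standard input when run non-interactively, and that the dump was saved as unix.0 and vmcore.0 (only vmcore.0 appears in the output above; unix.0 is an assumption). The names `TRIAGE_COMMANDS`, `build_script`, and `run_triage` are hypothetical.

```python
import subprocess

# The commands from steps 1 through 6 of the walkthrough.
TRIAGE_COMMANDS = (
    "$c",
    "::status",
    "::cpuinfo -v",
    "::ps",
    "::panicinfo",
    "panic_thread/K",
)


def build_script(commands=TRIAGE_COMMANDS):
    """Join mdb commands into the text that is fed to mdb on stdin."""
    return "\n".join(commands) + "\n"


def run_triage(unix_file="unix.0", core_file="vmcore.0"):
    """Run the triage commands against a crash dump and return the output.

    Assumes the file names above; adjust them to match your savecore
    directory.
    """
    result = subprocess.run(
        ["mdb", unix_file, core_file],
        input=build_script(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example use on a Solaris host that holds the dump:
#     print(run_triage("unix.0", "vmcore.0"))
```

Steps 7 and 8 are left out on purpose, since each one needs a pointer value read from the previous command's output.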