

2017-03-29 Wed

04:25 How to Patch an Exadata (Part 6) – Timing (9882 Bytes) » Official Pythian Blog

Quick links to Part 1 / Part 2 / Part 3 / Part 4 / Part 5 / Part 6

 
 

6: Timing

Now that we know how to patch every component and the different options available to do so (rolling, non-rolling), which one is the best? How much time does it take?

The answer is obviously “it depends”, but I will try to offer a few insights so you have a solid answer ready when your manager inevitably asks: “How long will that patch take? I need to negotiate the maintenance window with the business… they aren’t happy…” ;)
 
 
Here is a summary of the length of the patch application in a Rolling fashion and in a Non-Rolling fashion (as well as the downtime for each method). Please note that I put in green what I recommend.
 

Cells

  • Rolling: 1h30 x number of cells
  • Rolling downtime: 0 minutes
  • Non-rolling: 2h (1h30 to patch a cell + 30 minutes to stop and start everything before and after the patch)
  • Non-rolling downtime: 2h

Note : Refer to my notes at the end of this page about this choice
 
IB Switches

  • Rolling: 45 minutes per switch, so 1h30 in total
  • Rolling downtime: 0 minutes
  • Non-rolling : not available
  • Non-rolling downtime : not available

Note: There’s no non-rolling method for the IB Switches, so the choice here is an easy one!
 
Database Servers

Note: Refer to my notes at the end of this page about this choice
 
Grid

Note: No green color here? To patch the Grid, I recommend going for a mix like the following (a command sketch follows the list):

  • Rebalance the services away from node 1
  • Patch node 1
  • Verify that everything restarted properly on node 1
  • Move all the services to node 1 (if one node can handle the whole workload; we usually patch during a quiet period anyway)
  • Apply the patch in a non-rolling way on the remaining nodes (for the Grid this means launching the patch manually, in parallel, on each of them)
  • Once the Grid has been patched on all the nodes, restore all the services as they were before the patch
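
To make this mix more concrete, here is a minimal command sketch using the example names and the October 2016 Bundle path used elsewhere in this series (mydb / myservice are placeholders):

# as oracle: move the example service away from node 1 (instance mydb1) before patching it
srvctl relocate service -d mydb -s myservice -i mydb1 -t mydb2
# as root: patch the Grid home on node 1 only
/u01/app/12.1.0.2/grid/OPatch/opatchauto apply /patches/OCT2016_bundle_patch/24436624/Database/12.1.0.2.0/12.1.0.2.161018DBBP/24448103 -oh /u01/app/12.1.0.2/grid
# check that everything is back on node 1 before going any further
crsctl stat res -t
# then move the services onto node 1, launch the same opatchauto command manually (and in parallel) on the
# remaining nodes, and finally put the services back as they were with srvctl relocate service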

 
Databases Oracle homes

  • Rolling: 20 – 30 minutes per node + ~ 20 minutes per database for the post installation steps
  • Rolling downtime:

    – Can be 0 minutes if you rebalance the services before patching a node (as described above for the Grid patching; the same concept applies to the database patching as well), plus ~ 20 minutes per database for the post-installation steps.

    Please note that if you have 30 databases sharing the same ORACLE_HOME, you won’t be able to easily apply 30 post-install steps at the same time, so the 30th database will suffer a bigger outage than the 1st one you restart on the patched ORACLE_HOME. This is why I strongly recommend the quicker method.

    – Around 20 minutes of downtime per database, scheduled whenever you choose, when using the quicker way!

  • Non-rolling: 20 – 30 minutes
  • Non-rolling downtime: 20 – 30 minutes for all the databases running on the patched Oracle home + ~ 20 minutes per database for the post-installation steps. Note that if you have 30 databases sharing the same ORACLE_HOME, you won’t be able to apply 30 post-install steps at the same time, so the 30th database will suffer a bigger outage than the 1st one you restart on the patched ORACLE_HOME.

Note: In this instance, I would definitely go for the quicker way: clone the Oracle home you want to patch to another one, apply the patch there and move the databases one by one to the new patched Oracle home.
 
 

Notes on my recommendations

Yes, I always prefer the rolling method for the infrastructure components (Cells and Database Servers). This is because it lets me mitigate the outage, and I am also sure to avoid any outage created by the patch itself or by anything preventing a reboot, for example (we do not reboot those servers frequently).

Imagine if you go for a cell rolling upgrade and one cell does not reboot after the patch. You’ll have no issue here as the patch will stop automatically; everything will keep working as before with one cell down, no one will notice anything, and you are still supported, since running different versions across different servers is supported. You can then quietly check the troubleshooting section of this blog, or go to the pool while Oracle finds a solution for you.

It happened to us on production (it didn’t happen on the DEV or QA Exadatas before…); we warned the client and it took Oracle a few days to provide an action plan. Everything ran perfectly for a week with a cell down, we then applied the Oracle action plan during the next weekend and could properly finish the patch. The result here is that we applied the patch successfully. We had an issue that caused no outage nor performance degradation and we still fit in the maintenance window; a very good job from a client and process point of view!

But if you go for a non-rolling cell patching and all your cells (or a few of them) do not reboot after the patch, then you are in trouble, and you will lose ten times the time you thought you would save by going non-rolling. You will most likely have a failed patch outside of the maintenance window, a Root Cause Analysis to provide to the process guys, and you probably won’t patch this Exadata again for a while, as the client will be… hmmm… a bit chilly about that question in the future.

And this risk is the same for the database servers.
 
I am not saying that the Bundle will fail and create a big outage (I have applied a lot of them and they work pretty well); it is just all about risk mitigation. And remember: “highest level of patch = highest level of bug” :)
 
 
 
If you’ve reached this point, I hope that you enjoyed this Odyssey into the Exadata patching world as much as I enjoy working with it on a daily basis!
 
 


Quick links to Part 1 / Part 2 / Part 3 / Part 4 / Part 5 / Part 6

 
02:20 How to Patch an Exadata (Part 5) – Troubleshooting (8299 Bytes) » Official Pythian Blog

Quick links to Part 1 / Part 2 / Part 3 / Part 4 / Part 5 / Part 6

 
 

5: Troubleshooting

In this post I’ll be sharing a few issues that we faced and sorted out. From a ‘lessons learned’ perspective, they are worth sharing in order to help others. Please note that they all happened (and were fixed) in real life on X4 and/or X5 Exadatas.

 
 
 

5.1 – Cell patching issue

  • It happened when the patch failed on a cell:
myclustercel05 2016-05-31 03:46:42 -0500 Patch failed during wait for patch finalization and reboot.
2016-05-31 03:46:43 -0500 4 Done myclustercel05 :FAILED: Details in files .log /patches/April_bundle_patch/22738457/Infrastructure/12.1.2.3.1/ExadataStorageServer_InfiniBandSwitch/patch_12.1.2.3.1.160411/patchmgr.stdout, /patches/April_bundle_patch/22738457/Infrastructure/12.1.2.3.1/ExadataStorageServer_InfiniBandSwitch/patch_12.1.2.3.1.160411/patchmgr.stderr
2016-05-31 03:46:43 -0500 4 Done myclustercel05 :FAILED: Wait for cell to reboot and come online.
  • Checking this logfile on the cell, we can see that it failed due to a reduced redundancy:
/opt/oracle/cell12.1.2.1.2_LINUX.X64_150617.1/.install_log.txt
CELL-02862: Deactivation of grid disks failed due to reduced redundancy of the following grid disks: DATA_CD_00_myclustercel05, DATA_CD_01_myclustercel05, DATA_CD_02_myclustercel05, DATA_CD_03_myclustercel05, DATA_CD_04_myclustercel05, DATA_CD_05_myclustercel05, DATA_CD_06_myclustercel05, DATA_CD_07_myclustercel05, DATA_CD_08_myclustercel05, DATA_CD_09_myclustercel05, DATA_CD_10_myclustercel05, DATA_CD_11_myclustercel05....
  • This was due to the fact that the cell’s grid disks had not been brought back online after the reboot. In this case, we have to bring the disks online manually on that cell and then resume the patch on the remaining cells:
    • Bring disks online manually on the failed cell:
# ssh root@myclustercel05
# cellcli -e alter griddisk all active
# cellcli -e list griddisk attributes name, asmmodestatus # to check the status of the disks

… wait until all disks are “ONLINE” (a wait-loop sketch is shown after the restart commands below) …

    • Restart the patch on the remaining cells (cel06 and cel07)
# cd 
# cat ~/cell_group | grep [67] > cells_6_and_7
# ./patchmgr -cells cells_6_and_7 -cleanup
# ./patchmgr -cells cells_6_and_7 -patch_check_prereq -rolling
# ./patchmgr -cells cells_6_and_7 -patch -rolling
# ./patchmgr -cells ~/cell_group -cleanup
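
The wait for the disks can be scripted instead of checked by hand; here is a minimal sketch (run on the cell itself, assuming the cellcli output shown above):

# loop until every grid disk reports ONLINE for asmmodestatus before resuming the patch
while cellcli -e "list griddisk attributes name, asmmodestatus" | grep -qv ONLINE; do
  echo "some grid disks are still resyncing, waiting..."
  sleep 60
done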

 
 
 

5.2 – CRS does not restart Issue

It happened that, after a failed Grid patch, CRS was unable to restart. We opened an SR and Oracle came back with an action plan to restart the GI. Let’s say the issue happened on the server myclusterdb03 here:

  • Stop the clusterware
[root@myclusterdb03]# crsctl stop crs -f

 

  • Remove the network sockets
[root@myclusterdb03]# cd /var/tmp/.oracle
[root@myclusterdb03]# rm -f *

 

  • Remove the maps files
[root@myclusterdb03]# cd /etc/oracle/maps/
[root@myclusterdb03]# mv myclusterdb03_gipcd1318_cc0d4e3b8eedcf02bf179a98a71ce468-0000000000 X-myclusterdb03_gipcd1318_cc0d4e3b8eedcf02bf179a98a71ce468-0000000000

 

  • Start the clusterware
[root@myclusterdb03]# crsctl start crs

Upon starting, the Clusterware will recreate the network sockets and the maps files.

 
 
 

5.3 – A Procedure to Add Instances to A Database

The following is a procedure that I performed after a CRS patch failed on node 3. In this case, some databases were only running on nodes 3 and 4. As we had an issue with the node 3 CRS patching, we opted to move these databases to nodes 1 and 2 before the end of the maintenance window, so we could then work on the failed node 3 quietly, with no downtime. The patch on node 4 came next and was also completed with no downtime.

The goal was to add two instances on nodes 1 and 2 to the database mydb:

select tablespace_name, file_name from dba_data_files where tablespace_name like 'UNDO%' ;
create undo tablespace UNDOTBS1 datafile '+DATA' ;
create undo tablespace UNDOTBS2 datafile '+DATA' ;
alter system set undo_tablespace='UNDOTBS1' sid='mydb1' ;
alter system set undo_tablespace='UNDOTBS2' sid='mydb2' ;

show spparameter instance
alter system set instance_number=1 sid='mydb1' scope=spfile ;
alter system set instance_number=2 sid='mydb2' scope=spfile ;
alter system set instance_name='mydb1' sid='mydb1' scope=spfile ;
alter system set instance_name='mydb2' sid='mydb2' scope=spfile ;

show spparameter thread ;
alter system set thread=1 sid='mydb1' scope=spfile ;
alter system set thread=2 sid='mydb2' scope=spfile ;

set lines 200
set pages 999
select * from gv$log ;
alter database add logfile thread 1 group 11 ('+DATA', '+RECO') size 100M, group 12 ('+DATA', '+RECO') size 100M, group 13 ('+DATA', '+RECO') size 100M, group 14 ('+DATA', '+RECO') size 100M ;
alter database add logfile thread 2 group 21 ('+DATA', '+RECO') size 100M, group 22 ('+DATA', '+RECO') size 100M, group 23 ('+DATA', '+RECO') size 100M, group 24 ('+DATA', '+RECO') size 100M ;
select * from gv$log ;

alter database enable public thread 1 ;
alter database enable public thread 2 ;

srvctl add instance -db mydb -i mydb1 -n myclusterdb01
srvctl add instance -db mydb -i mydb2 -n myclusterdb02
srvctl status database -d mydb

sqlplus / as sysdba
select host_name, status from gv$instance ;

srvctl modify service -d mydb -s myservice -modifyconfig -preferred 'mydb1,mydb2,mydb3,mydb4'
srvctl start service -d mydb -s myservice -i mydb1
srvctl start service -d mydb -s myservice -i mydb2

 
 
 

5.4 – OPatch Resume

As a general piece of advice, if an opatch / opatchauto operation fails, first try to resume it:

[root@myclusterdb03]# cd /patches/OCT2016_bundle_patch/24436624/Database/12.1.0.2.0/12.1.0.2.161018DBBP/24448103
[root@myclusterdb03 24448103]# /u01/app/12.1.0.2/grid/OPatch/opatchauto resume -oh /u01/app/12.1.0.2/grid

 
 


Quick links to Part 1 / Part 2 / Part 3 / Part 4 / Part 5 / Part 6

 
02:19 How to Patch an Exadata (Part 4) – The Rollback Procedure (6937 Bytes) » Official Pythian Blog

Quick links to Part 1 / Part 2 / Part 3 / Part 4 / Part 5 / Part 6

 
 

4: The Rollback Procedure

You will most likely never need to roll back any patch on an Exadata, but it is interesting to know that it is possible, and where to find the information in case it happens one day; it is indeed possible to roll back any part of the patch. I list the procedure here, but note that I have never tested it.
 
 
Note that for the cells, it is only possible to roll back a successful update. If you face a failed cell update, read my first advice again, check the troubleshooting section of this blog and the known issues in the patch README, and if you find nothing there, open a Sev 1 SR with Oracle Support.

I will not cover the GI and Database OH rollback procedure here, as there is nothing Exadata-specific about it; it is the same as for non-Exadata GI and DB Oracle homes.

 
 

4.1 – Cell Rollback

We can read in the README that it is only possible to roll back successfully updated Exadata cells; cells with incomplete or failed updates cannot be rolled back.

If you really want to roll back a successfully patched cell, here is the procedure you will find in the README section “2.3 Rolling Back Successfully Updated Exadata Cells”:

  • Check the version the cells will be rolled back to, and the flashCacheMode setting, with the following commands:
[root@myclusterdb01 ~]# dcli -l root -g cell_group imageinfo -ver -inactive
[root@myclusterdb01 ~]# dcli -l root -g cell_group cellcli -e 'list cell attributes flashCacheMode'

 
Cells being rolled back to releases earlier than release 11.2.3.2.0 with write back flash cache enabled need to be converted manually back to write through flash cache before being rolled back. Disable write back flash cache using the script in My Oracle Support note 1500257.1.
Cells being rolled back to release 11.2.3.2.0 or later retain the flash cache mode that is currently set.
 

  • Check the prerequisites, then perform the rollback, using the following commands:
[root@myclusterdb01 ~]# ./patchmgr -cells cell_group -rollback_check_prereq -rolling
[root@myclusterdb01 ~]# ./patchmgr -cells cell_group -rollback -rolling
  • Clean up the cells using the -cleanup option to clean up all the temporary update or rollback files on the cells. This option cleans the stale update and rollback states as well as cleaning up to 1.5 GB of disk space on the cells. Use this option before retrying a halted or failed run of the patchmgr utility :
[root@myclusterdb01 ~]# ./patchmgr -cells cell_group -cleanup

 
 

4.2 – DB nodes Rollback

Here is the procedure that can be found in the readme to rollback a database node.

  • Check the version of each DB node before the rollback :
[root@myclusterdb01 ~]# dcli -g ~/dbs_group -l root imageinfo -ver
  • Umount the NFS :

You can generate all the umount commands with this command:

df -t nfs | awk '{if ($NF ~ /^\//){print "umount " $NF}}'
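
As mentioned in Part 2, the generated commands can also be executed directly by piping them to a shell (run as root on each DB node):

df -t nfs | awk '{if ($NF ~ /^\//){print "umount " $NF}}' | bash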
  • Rollback the patch (launch it from the cel01 server) :
[root@myclustercel01 ~]# cd /tmp/dbserver_patch_5.161110
[root@myclustercel01 dbserver_patch_5.161110]# ./patchmgr -dbnodes ~/dbs_group -rollback -iso_repo /tmp/p24669306_121233_Linux-x86-64.zip -target_version <previous_version_of_the_installed_patch> -rolling
  • Check the version of each DB node after the rollback:
[root@myclusterdb01 ~]# dcli -g ~/dbs_group -l root imageinfo -ver

 
 

4.3 – IB Switches Rollback

As per the documentation, it is only possible to roll back an IB Switch to version 2.1.6-2.
Be sure to be connected to the myclusterdb01 server (a server from which the root SSH keys have been deployed to the IB switches).

  • Check the current version installed on the IB switches :
[root@myclusterdb01 ~]# dcli -g ~/ib_group -l root version | grep -i version

  • Downgrade the version to the only possible version (2.1.6-2):

[root@myclusterdb01 ~]# cd /patches/OCT2016_bundle_patch/24436624/Infrastructure/12.1.2.3.3/ExadataStorageServer_InfiniBandSwitch/patch_12.1.2.3.3.161013/
[root@myclusterdb01 ~]# ./patchmgr -ibswitches ~/ib_group -downgrade -ibswitch_precheck
[root@myclusterdb01 ~]# ./patchmgr -ibswitches ~/ib_group -downgrade
  • Check the versions on the IB switches after the rollback:
[root@myclusterdb01 ~]# dcli -g ~/ib_group -l root version | grep -i version

 
 


Quick links to Part 1 / Part 2 / Part 3 / Part 4 / Part 5 / Part 6

 
02:18 How to Patch an Exadata (Part 3) – Grid and Database OH Patching (20350 Bytes) » Official Pythian Blog

Quick links to Part 1 / Part 2 / Part 3 / Part 4 / Part 5 / Part 6


 


 

3.4/ Patching the Grid Infrastructure

3.4.0 – Information

3.4.1 – Check lsinventory

It is a good idea to check and save the status of the current GI homes before applying the patch. Check the checksum of each home at the end of the opatch lsinventory report (it should be the same on every node).

[oracle@myclusterdb01]$ . oraenv <<< +ASM1
[oracle@myclusterdb01]$ $ORACLE_HOME/OPatch/opatch lsinventory -all_nodes

 
 

3.4.2 - Patch GI on a Node

 

[root@myclusterdb01 ~]# cd /patches/OCT2016_bundle_patch/24436624/Database/12.1.0.2.0/12.1.0.2.161018DBBP/24448103
[root@myclusterdb01 24448103]# /u01/app/12.1.0.2/grid/OPatch/opatchauto apply -oh /u01/app/12.1.0.2/grid

Opatch will most likely finish with some warnings:

[Jun 5, 2016 5:50:47 PM] --------------------------------------------------------------------------------
[Jun 5, 2016 5:50:47 PM] The following warnings have occurred during OPatch execution:
[Jun 5, 2016 5:50:47 PM] 1) OUI-67303:
 Patches [ 20831113 20299018 19872484 ] will be rolled back.
[Jun 5, 2016 5:50:47 PM] --------------------------------------------------------------------------------
[Jun 5, 2016 5:50:47 PM] OUI-67008:OPatch Session completed with warnings.

Checking the logfiles, you will find that this is probably due to superset patches:

Patch : 23006522 Bug Superset of 20831113

If you check the patch number, you will find that this is an old patch : Patch 20831113: OCW PATCH SET UPDATE 12.1.0.2.4

This is safely ignorable, as opatch rolls back the old patches after having applied the new ones.
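
If you want to double-check, a quick look at the inventory shows the superset patch in place of the rolled-back one (the patch numbers below are the October 2016 examples from the log above):

[oracle@myclusterdb01]$ $ORACLE_HOME/OPatch/opatch lsinventory | grep -E "20831113|23006522"     # 20831113 should be gone, its superset 23006522 present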
 
 

3.4.3 - Check lsinventory

Let's verify that the patch has been correctly installed on each node (check the checksum of each home at the end of the opatch lsinventory report):

[oracle@myclusterdb01]$ . oraenv <<< +ASM1
[oracle@myclusterdb01]$ $ORACLE_HOME/OPatch/opatch lsinventory -all_nodes

 
 
 

3.4.4 - How to Mitigate the Downtime

When you patch the Grid Infrastructure, all the databases running on the node whose Grid you are patching will be stopped for 30 - 45 minutes, which is quite a big outage.
 
A way to greatly mitigate this outage (knowing that these are most likely RAC databases, since they run on Exadata) is to use the power of Oracle services.
 

  • With load-balanced services:
  • Let's say you have a database running 4 instances on the 4 nodes of the Exadata, with a load-balanced APP service across the 4 nodes, and you're about to patch node 1. Just stop the APP service on the node you will patch (no new connection will land on this node), wait for the current connections to finish and you are done. You can patch node 1 with no outage for the applications / users!
     

  • With non-load-balanced services:
  • You have a non-load-balanced service? Not a problem. Just move this service away from the node you want to patch, wait for the current connections to finish and you can achieve the same goal (a minimal sketch of both cases follows this list).
     

  • You don't use services?
  • This is then the opportunity you were waiting for to deploy Oracle services! If you can't (or don't want to), you can always work around it by modifying the tnsnames.ora file on the application server to remove the node you want to patch, so that no new connection can reach this node any more. You can then wait for the current connections to finish and patch the node with no downtime.
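
Here is a minimal sketch of the first two cases (assuming a service named APP on a database mydb, and that node 1, which runs instance mydb1, is the node about to be patched):

# load-balanced service: stop only the service member running on instance mydb1, no new connection will land there
srvctl stop service -d mydb -s APP -i mydb1
# non-load-balanced service: relocate it from instance mydb1 to instance mydb2 instead
srvctl relocate service -d mydb -s APP -i mydb1 -t mydb2
# once node 1 has been patched and is back, restore the service
srvctl start service -d mydb -s APP -i mydb1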

 
 
 

3.5 Patching the Databases Oracle Homes

3.5.0 - Information

  • As we have upgraded opatch for the GI and the database OH in the pre-requisites section, we do not need to create and specify an ocm.rsp file
  • Double check that all the prechecks described in the database and OJVM prechecks section are done and returned no error
  • This patch has to be launched for every database OH you want to patch
  • There is no -rolling option for the database OH patch; you have to do it manually (for example, if you want to patch OH1 on node 1, move the services of the databases running on node 1 to another node, patch node 1, and continue with the other nodes following the same procedure)
  • The OJVM patch requires a complete downtime; everything that runs on the OH you want to patch has to be stopped (fortunately, this one is quick)
  • Verify in the patch README the numbers of the patches to apply; the ones used here are the October 2016 ones
  • You can use screen instead of nohup if it is installed on your system
  • I will not describe the steps to apply the patch to an 11g database; they are well documented everywhere and are basically almost the same as the 12c ones, except that catbundle.sql exa apply is used instead of datapatch. Please refer to the README for the exact 11g procedure

 
 

3.5.1 opatch lsinventory

Before starting to apply the patch, it is very important to have a clear picture of the current status; run and store the output of the command below:

$ORACLE_HOME/OPatch/opatch lsinventory -all_nodes

Have a look at the checksum report at the end of the output; it should be the same on each node:

Binary & Checksum Information
==============================

 Binary Location : /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle

 Node                   Size                    Checksum
 ----                   ----                    --------
 myclusterdb01         327642940               BD0547018B032A7D2FCB8209CC4F1E6C8B63E0FBFD8963AE18D50CDA7455602D
 myclusterdb02         327642940               BD0547018B032A7D2FCB8209CC4F1E6C8B63E0FBFD8963AE18D50CDA7455602D
 myclusterdb03         327642940               BD0547018B032A7D2FCB8209CC4F1E6C8B63E0FBFD8963AE18D50CDA7455602D
 myclusterdb04         327642940               BD0547018B032A7D2FCB8209CC4F1E6C8B63E0FBFD8963AE18D50CDA7455602D
--------------------------------------------------------------------------------

OPatch succeeded.

 
 
 

3.5.2 - Apply the Patch

The Bundle contains two different patches for the database ORACLE_HOMEs: one to patch the database OH and one specific to the OJVM.

3.5.2.1 - Apply the Database OH Patch

  • This patch has to be applied as root
  • You have to manually launch this patch on each node
  • Do NOT stop anything here, opatchauto will take care of it
  • This patch takes around 20 - 30 minutes to complete on each node
[root@myclusterdb01]# cd /u01/app/oracle/product/12.1.0.2/dbhome_1/OPatch
[root@myclusterdb01]# nohup ./opatchauto apply /patches/OCT2016_bundle_patch/24436624/Database/12.1.0.2.0/12.1.0.2.161018DBBP/24448103/24340679 -oh /u01/app/oracle/product/12.1.0.2/dbhome_1 &
[root@myclusterdb01]# nohup ./opatchauto apply /patches/OCT2016_bundle_patch/24436624/Database/12.1.0.2.0/12.1.0.2.161018DBBP/24448103/24846605 -oh /u01/app/oracle/product/12.1.0.2/dbhome_1 &

 
 

3.5.2.2 - Apply the OJVM Patch

  • The OJVM patch has to be applied as the oracle user
  • This patch will be automatically applied to all the nodes where the ORACLE_HOME is installed
  • This patch takes only a few minutes to apply

 
- Stop everything that is running on the OH you want to patch, on every node (a for loop would be cleaner, but this form is more convenient to copy and paste):

[oracle@myclusterdb01]$ . oraenv <<< A_DATABASE_WITH_THE_CORRECT_ORACLE_HOME
[oracle@myclusterdb01]$ srvctl stop home -o /u01/app/oracle/product/12.1.0.2/dbhome_1 -s /tmp/12c.statefile_n1 -n myclusterdb01
[oracle@myclusterdb01]$ srvctl stop home -o /u01/app/oracle/product/12.1.0.2/dbhome_1 -s /tmp/12c.statefile_n2 -n myclusterdb02
[oracle@myclusterdb01]$ srvctl stop home -o /u01/app/oracle/product/12.1.0.2/dbhome_1 -s /tmp/12c.statefile_n3 -n myclusterdb03
[oracle@myclusterdb01]$ srvctl stop home -o /u01/app/oracle/product/12.1.0.2/dbhome_1 -s /tmp/12c.statefile_n4 -n myclusterdb04
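
For reference, the same stop commands can also be generated with a simple loop (a sketch assuming the four node names used above):

for i in 1 2 3 4; do
  srvctl stop home -o /u01/app/oracle/product/12.1.0.2/dbhome_1 -s /tmp/12c.statefile_n${i} -n myclusterdb0${i}
done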

 
 
- Apply the patch from myclusterdb01 (any node can be used though)

[oracle@myclusterdb01]$ cd /patches/OCT2016_bundle_patch/24436624/Database/12.1.0.2.0/12.1.0.2.161018OJVMPSU/24315824
[oracle@myclusterdb01]$ $ORACLE_HOME/OPatch/opatch apply

 
 

3.5.3 - opatch lsinventory

To be sure that the patch has been successfully applied on all the nodes, run and store the output of the command below:

$ORACLE_HOME/OPatch/opatch lsinventory -all_nodes

 
Have a look at the checksum report at the end of the output; it should be the same on each node (an example of this output is shown in paragraph 3.5.1)

 
 
 

3.5.4 - Post-Install

Some post-install steps have to be performed for both the OJVM and the database OH patch; this has to be done for each database, on one node only.

    . oraenv <<< A_DATABASE
    sqlplus / as sysdba
    startup nomount           -- All DB should be down here, only start on one node, don't use srvctl here
    alter system set cluster_database=false scope=spfile;
    shut immediate
    startup upgrade
    cd /u01/app/oracle/product/12.1.0.2/dbhome_1/OPatch
    ./datapatch -verbose

    -- It has happened that datapatch hit issues with some patches; in that case we had to:
    ./datapatch -apply 22674709 -force -bundle_series PSU  -verbose
    ./datapatch -apply 22806133 -force -bundle_series PSU  -verbose

Note: the datapatch -force run is recommended by Oracle Support when ./datapatch -verbose fails (...). You can ignore the errors reported by the -force datapatch run.

    sqlplus / as sysdba
    alter system set cluster_database=true scope=spfile;
    shut immediate
    srvctl start database -d XXX

    - Verify that the patches are correctly installed
    set lines 200
    set pages 999
    SELECT patch_id, patch_uid, version, flags, action, action_time, description, status, bundle_id, bundle_series, logfile FROM dba_registry_sqlpatch ;

 
 

3.5.5 - Post Post-Install

Use the srvctl start home command with the statefiles we created in paragraph 3.5.2.2, in case something is missing:

srvctl start home -o /u01/app/oracle/product/12.1.0.2/dbhome_1 -s /tmp/12c.statefile_n1 -n myclusterdb01
srvctl start home -o /u01/app/oracle/product/12.1.0.2/dbhome_1 -s /tmp/12c.statefile_n2 -n myclusterdb02
srvctl start home -o /u01/app/oracle/product/12.1.0.2/dbhome_1 -s /tmp/12c.statefile_n3 -n myclusterdb03
srvctl start home -o /u01/app/oracle/product/12.1.0.2/dbhome_1 -s /tmp/12c.statefile_n4 -n myclusterdb04

 
 
 

3.5.6 - A Quicker Way

There's a quicker way to patch the database ORACLE_HOMEs than the one described above. This quicker way is also more flexible to manage and reduces the outage needed to apply the post-install steps.
 
With the above procedure, once the ORACLE_HOME is patched, you have no other choice than to apply the post-install steps right after, as you have no other way to restart the databases. When you have many databases sharing the same ORACLE_HOME, this can be quite long and/or painful. Note that if you have 30 databases sharing the same ORACLE_HOME, you won't be able to apply 30 post-install steps at the same time: the thirtieth database will then suffer a bigger outage than the first one you restart on the patched ORACLE_HOME. This is totally mitigated by the way of doing things I describe below.
 
This quicker way is to clone the current OH (let's say /u01/app/oracle/product/12.1.0.2/dbhome_1) to another one (let's say /u01/app/oracle/product/12.1.0.2/dbhome_2), apply the patch against the future OH (dbhome_2) and then move the databases one by one, whenever you want, from the old OH (dbhome_1) to the new patched OH (dbhome_2). Here is the plan:
 

  • Copy dbhome_1 to dbhome_2 on each node
cp -r /u01/app/oracle/product/12.1.0.2/dbhome_1 /u01/app/oracle/product/12.1.0.2/dbhome_2
  • Clone it on each node

- On node 1

[oracle@myclusterdb01]$ cd /u01/app/oracle/product/12.1.0.2/dbhome_2/oui/bin/
[oracle@myclusterdb01 bin]$ ./runInstaller -clone -waitForCompletion ORACLE_HOME="/u01/app/oracle/product/12.1.0.2/dbhome_2" ORACLE_HOME_NAME="OraDB12Home2" "ORACLE_BASE=/u01/app/oracle" "CLUSTER_NODES={myclusterdb01,myclusterdb02,myclusterdb03,myclusterdb04}" "LOCAL_NODE=myclusterdb01" -silent -noConfig -nowait

 

- On node 2

[oracle@myclusterdb02]$ cd /u01/app/oracle/product/12.1.0.2/dbhome_2/oui/bin/
[oracle@myclusterdb02 bin]$ ./runInstaller -clone -waitForCompletion ORACLE_HOME="/u01/app/oracle/product/12.1.0.2/dbhome_2" ORACLE_HOME_NAME="OraDB12Home2" "ORACLE_BASE=/u01/app/oracle" "CLUSTER_NODES={myclusterdb01,myclusterdb02,myclusterdb03,myclusterdb04}" "LOCAL_NODE=myclusterdb02" -silent -noConfig -nowait

 

And so on for all the other nodes.

Note here that what changes in the clone command line from node to node is the LOCAL_NODE parameter.

 

  • Stop the database you want, restart it with the new OH and apply the post-install steps following paragraph 3.5.4
  •     -- On each node, copy the init file, the password file and any other configuration file you use to the new home
        [oracle@myclusterdb01]$ cp /u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/init${ORACLE_SID}.ora /u01/app/oracle/product/12.1.0.2/dbhome_2/dbs/.
        [oracle@myclusterdb01]$ cp /u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/orapw${ORACLE_SID} /u01/app/oracle/product/12.1.0.2/dbhome_2/dbs/.
    
        -- On each node, check and update your LDAP configuration if you have one
    
        -- On each node, update /etc/oratab and/or the script you use to switch between the database environments
        #MYDB:/u01/app/oracle/product/12.1.0.2/dbhome_1:N
        MYDB:/u01/app/oracle/product/12.1.0.2/dbhome_2:N
    
        -- On one node, modify the ORACLE_HOME in the cluster configuration
        [oracle@myclusterdb01]$ srvctl modify database -d  -o /u01/app/oracle/product/12.1.0.2/dbhome_2                          -- 11g
        [oracle@myclusterdb01]$ srvctl modify database -db  -oraclehome /u01/app/oracle/product/12.1.0.2/dbhome_2                -- 12c
    
        -- On one node, modify the spfile location in the cluster configuration (if your spfile is not stored in ASM)
        [oracle@myclusterdb01]$ srvctl modify database -d  -p /path_to_your_shared_spfile/spfile${ORACLE_SID}.ora                -- 11g
        [oracle@myclusterdb01]$ srvctl modify database -db  -spfile /path_to_your_shared_spfile/spfile${ORACLE_SID}.ora          -- 12c
    
         -- Bounce the database
        [oracle@myclusterdb01]$ srvctl stop database -d  -o 'immediate'                                                          -- 11g
        [oracle@myclusterdb01]$ srvctl start database -d 
    
        [oracle@myclusterdb01]$ srvctl stop database -db  -stopoption 'immediate'                                                -- 12c
        [oracle@myclusterdb01]$ srvctl start database -db 
    
        -- If you use OEM, you will have to manually update the new OH in the target configuration
    

     
    Note that only the last step requires a downtime of 15 - 20 minutes (the time to bounce the database and run the post-install steps); all the previous steps can be done earlier, during a regular weekday. Another point is that you can choose which database to patch and when (which makes this way of working very flexible).
     
     

    If you have reached this point, it means that you are done with your Exadata patching!
     


    Quick links to Part 1 / Part 2 / Part 3 / Part 4 / Part 5 / Part 6

     
01:34 How to Patch an Exadata (Part 2) – Cells, IB and DB Servers (16844 Bytes) » Official Pythian Blog

Quick links to Part 1 / Part 2 / Part 3 / Part 4 / Part 5 / Part 6

3: The patching procedure

3.1 Patching the Cells (aka Storage Servers)

 
 

3.1.0 – Information

  • All actions must be done as root
  • Patching a cell takes around one hour and thirty minutes (it may take longer in the event of heavy I/O activity; we experienced some 3-hour-per-cell patching sessions on a heavily I/O-loaded Exadata)
  • You can connect to a cell console to check what is happening during a patch application; the procedure to connect to an ILOM console is shown below. Once connected, you will see everything that happens on the server console, like the reboot sequence, etc.:
[root@myclusterdb01 dbserver_patch_5.170131]# ssh root@myclustercel01-ilom
 Password:
 Oracle(R) Integrated Lights Out Manager
 Version 3.1.2.20.c r86871
 Copyright (c) 2014, Oracle and/or its affiliates. All rights reserved.
 > start /sp/console
 Are you sure you want to start /SP/console (y/n)? y
 Serial console started. To stop, type ESC (

 
 

3.1.1 – Check the Version of Each Cell Before Patching

All versions have to be the same on each cell at this point.
If you are not confident with the cell_group, dbs_group, *_group files, please find the procedure to create them in the SSH keys section of part 1.

[root@myclusterdb01 ~]# dcli -g ~/cell_group -l root imageinfo -ver
myclustercel01: 12.1.2.3.3.160119
myclustercel02: 12.1.2.3.3.160119
myclustercel03: 12.1.2.3.3.160119
myclustercel04: 12.1.2.3.3.160119
myclustercel05: 12.1.2.3.3.160119
myclustercel06: 12.1.2.3.3.160119
[root@myclusterdb01 ~]#

 
 

3.1.2 – Apply the Patch

A few notes:

  • You may use screen instead of nohup if it is installed
  • You can skip the -patch_check_prereq step as it should have already been done previously, but I personally like to run it right before the patch to be absolutely sure.
  • You can also use the -smtp_to and the -smtp_from options to receive email notifications: -smtp_from “dba@pythian.com” -smtp_to “myteam@pythian.com dba@myclient.com”
  • Ensure you are connected on the database server node 1 (myclusterdb01)
[root@myclusterdb01 ~]# cd /patches/OCT2016_bundle_patch/24436624/Infrastructure/12.1.2.3.3/ExadataStorageServer_InfiniBandSwitch/patch_12.1.2.3.3.161013/
[root@myclusterdb01 ~]# ./patchmgr -cells ~/cell_group -reset_force
[root@myclusterdb01 ~]# ./patchmgr -cells ~/cell_group -cleanup
[root@myclusterdb01 ~]# ./patchmgr -cells ~/cell_group -patch_check_prereq -rolling
[root@myclusterdb01 ~]# nohup ./patchmgr -cells ~/cell_group -patch -rolling &
[root@myclusterdb01 ~]# ./patchmgr -cells ~/cell_group -cleanup

You can then follow the patch in the nohup.out file (tail -f nohup.out). You can also check what is happening on the console or check in the patchmgr.out file.
 
 

Non-Rolling Manner

You may also want to apply this patch in a non-rolling manner. While this will be faster, it requires a complete downtime of all the databases running on this Exadata. To do so, you will have to stop the cluster and the cells and then remove the “-rolling” option from the previous patchmgr command line :
 

  • Stop the clusterware
  • [root@myclusterdb01 ~]#crsctl stop cluster -all
    [root@myclusterdb01 ~]#crsctl stop crs
    [root@myclusterdb01 ~]#crsctl check crs
    -- If the cluster is not stopped properly at this step, use the -f option : crsctl stop crs -f
    
  • Stop the cells
  • You can stop the cells services on the cells to be patched using the following command on each cell:

    [root@myclustercel01 ~]# cellcli -e 'alter cell shutdown services all'
    

    Or use the dcli command to launch it on all the cells

    [root@myclusterdb01 ~]# dcli -g ~/cell_group -l root "cellcli -e alter cell shutdown services all"
    
  • Apply the patch
  • [root@myclusterdb01 ~]# ./patchmgr -cells ~/cell_group -reset_force
    [root@myclusterdb01 ~]# ./patchmgr -cells ~/cell_group -cleanup
    [root@myclusterdb01 ~]# nohup ./patchmgr -cells ~/cell_group -patch &
    [root@myclusterdb01 ~]# ./patchmgr -cells ~/cell_group -cleanup
    

 
 

3.1.3 – Check the Version of Each Cell After the Patch

All versions have to be the same on each cell at this point.

[root@myclusterdb01 ~]# dcli -g ~/cell_group -l root imageinfo -ver
myclustercel01: 12.1.2.3.3.161013
myclustercel02: 12.1.2.3.3.161013
myclustercel03: 12.1.2.3.3.161013
myclustercel04: 12.1.2.3.3.161013
myclustercel05: 12.1.2.3.3.161013
myclustercel06: 12.1.2.3.3.161013
[root@myclusterdb01 ~]#

 
 
 

3.2 : Patching the IB Switches

3.2.0 – Information

  • Patching an IB Switch takes around 45 minutes
  • All steps have to be executed as root
  • It is a 100% online operation
  • I’ve become accustomed to using the database node 1 (myclusterdb01) to patch the IB Switches, which is why I have deployed the root SSH keys from the DB node 1 to the IB Switches in the pre-requisites section
  • Nothing tricky here, we have never faced any issue.
  • Please find a procedure to create the ib_group file.

 
 

3.2.1 – Check the Version of Each IB Switch Before the Patch

[root@myclusterdb01 ~]# dcli -g ~/ib_group -l root version | grep "version"
myclustersw-ib2: SUN DCS 36p version: 2.1.3-4
myclustersw-ib3: SUN DCS 36p version: 2.1.3-4
[root@myclusterdb01 ~]#

 
 

3.2.2 – Apply the Patch

A few notes:

  • You can use screen instead of nohup if it is installed on your system
  • Be sure to be connected to the myclusterdb01 server
[root@myclusterdb01 ~]# cd /patches/OCT2016_bundle_patch/24436624/Infrastructure/12.1.2.3.3/ExadataStorageServer_InfiniBandSwitch/patch_12.1.2.3.3.161013/
[root@myclusterdb01 ~]# nohup ./patchmgr -ibswitches ~/ib_group -upgrade &

 
 

3.2.3 – Check the Version of Each IB Switch After the Patch

[root@myclusterdb01 ~]# dcli -g ~/ib_group -l root version | grep "version"
myclustersw-ib2: SUN DCS 36p version: 2.1.8-1
myclustersw-ib3: SUN DCS 36p version: 2.1.8-1
[root@myclusterdb01 ~]#

 
 
 

3.3 Patching the DB nodes (aka Compute Nodes)

3.3.0 – Information

  • All actions must be done as root
  • Patching a database node takes around one hour
  • It is not possible to start the patch from a database node that will be patched (which makes sense). The official way to apply this patch in a rolling manner is to:
    • Start the patch from the database node 1 to patch all the other nodes
    • Once done, copy patchmgr and the ISO file to an already patched node and then start the patch to patch the remaining node (node 1 in my example)

    Since this doesn’t make sense to me (I still haven’t understood why Oracle recommends doing that), I use a cell to start the patch, so I can patch all the database servers in one patchmgr session, which seems cleverer to me.

  • I use /tmp to store patchmgr and the ISO on the cell node 1, as /tmp exists on 100% of Unix boxes and I am sure I can write to it. An important thing to know here is that /tmp on the cells is regularly purged, as described in this documentation. The dbnodeupdate.zip file could then be deleted by this purge mechanism if too much time passes between copying it and using it, and you would then not be able to launch patchmgr, as dbnodeupdate.zip is mandatory. There are a few workarounds to that, though:
    • Copy patchmgr and the ISO file just before you apply the patch (this is the solution I use)
    • Copy patchmgr and the ISO file outside of /tmp.
    • The directories with SAVE in the name are ignored by the purge, so you could create a /tmp/SAVE directory to put patchmgr and the ISO file in (a sketch follows this list)
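
For the third workaround, a minimal sketch (same Bundle paths as in the copy step further down):

[root@myclusterdb01 ~]# ssh root@myclustercel01 "mkdir -p /tmp/SAVE"
[root@myclusterdb01 ~]# scp /patches/OCT2016_bundle_patch/24436624/Infrastructure/SoftwareMaintenanceTools/DBServerPatch/5.161014/p21634633_121233_Linux-x86-64.zip root@myclustercel01:/tmp/SAVE/.
[root@myclusterdb01 ~]# scp /patches/OCT2016_bundle_patch/24436624/Infrastructure/12.1.2.3.3/ExadataDatabaseServer_OL6/p24669306_121233_Linux-x86-64.zip root@myclustercel01:/tmp/SAVE/.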

 
 

3.3.1 – Check the Image Versions Before the Patch

In this step, we should find the same version on each node.

[root@myclusterdb01 ~]# dcli -g ~/dbs_group -l root imageinfo -ver
myclusterdb01: 12.1.2.2.1.160119
myclusterdb02: 12.1.2.2.1.160119
myclusterdb03: 12.1.2.2.1.160119
myclusterdb04: 12.1.2.2.1.160119
[root@myclusterdb01 ~]#

 
 

3.3.2 – Check Which Instance is Up and On Which Node

This is an important step. You have to know exactly what is running before proceeding to be sure that you will find the same status after the patch. You will then be able to follow the patching procedure with this script. You will then see all instances in a yellow “instance shutdown” status on the server that is being patched.

I use this script to have a clear status on which instance is running on which node; it will produce this kind of output :
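
The script itself is linked in the original post; as a rough substitute (not the author’s script), a quick loop over the cluster-registered databases gives a similar view of which instance runs where (run as oracle with a database environment set):

[oracle@myclusterdb01]$ for db in $(srvctl config database); do echo "=== ${db} ==="; srvctl status database -d "${db}"; done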


 
 

3.3.3 – Apply the Patch

A few notes:

  • We first need to umount the NFS on each DB node; this is a pre-requisite of the patch.
    You can generate all the umount commands with this command: df -t nfs | awk '{if ($NF ~ /^\//){print "umount " $NF}}' ; you can also run them directly by adding "| bash" at the end, like this: df -t nfs | awk '{if ($NF ~ /^\//){print "umount " $NF}}' | bash
  • You may use screen instead of nohup if it is installed
  • You can skip the -precheck step as it should have already been done previously, but I personally like to run it right before the patch to be 100% sure.
  • Be sure to be connected to the cell node 1 (myclustercel01)

 
 

Copy patchmgr and the ISO

Whether you choose a rolling or a non-rolling manner, you first have to copy patchmgr and the ISO file to the cell node 1 (do not unzip the ISO file).

[root@myclusterdb01 ~]#  scp /patches/OCT2016_bundle_patch/24436624/Infrastructure/SoftwareMaintenanceTools/DBServerPatch/5.161014/p21634633_121233_Linux-x86-64.zip root@myclustercel01:/tmp/.                    # This is patchmgr
[root@myclusterdb01 ~]#  scp /patches/OCT2016_bundle_patch/24436624/Infrastructure/12.1.2.3.3/ExadataDatabaseServer_OL6/p24669306_121233_Linux-x86-64.zip root@myclustercel01:/tmp/.                               # This is the ISO file, do NOT unzip it
[root@myclusterdb01 ~]#  ssh root@myclustercel01
[root@myclustercel01 ~]#  cd /tmp
[root@myclustercel01 ~]#  nohup unzip p21634633_121233_Linux-x86-64.zip &

 
 

Rolling Manner

The rolling manner allows you to patch the nodes one by one: only one node is unavailable at a time, while all the other nodes remain up and running. This method of patching is almost online, and can be 100% online with good service rebalancing.

[root@myclustercel01 ~]# cd /tmp/dbserver_patch_5.161014
[root@myclustercel01 dbserver_patch_5.161014]# ./patchmgr -dbnodes ~/dbs_group -precheck -iso_repo /tmp/p24669306_121233_Linux-x86-64.zip -target_version 12.1.2.3.3.161013
[root@myclustercel01 dbserver_patch_5.161014]# nohup ./patchmgr -dbnodes ~/dbs_group -upgrade -iso_repo /tmp/p24669306_121233_Linux-x86-64.zip -target_version 12.1.2.3.3.161013 -rolling &

 
 

Non-Rolling Manner

In a non-rolling manner, patchmgr patches all the nodes at the same time, in parallel. It is therefore quicker, but a complete downtime is required.

[root@myclustercel01 ~]# cd /tmp/dbserver_patch_5.161014
[root@myclustercel01 dbserver_patch_5.161014]# ./patchmgr -dbnodes ~/dbs_group -precheck -iso_repo /tmp/p24669306_121233_Linux-x86-64.zip -target_version 12.1.2.3.3.161013
[root@myclustercel01 dbserver_patch_5.161014]# nohup ./patchmgr -dbnodes ~/dbs_group -upgrade -iso_repo /tmp/p24669306_121233_Linux-x86-64.zip -target_version 12.1.2.3.3.161013 &

 
 

3.3.4 – Check the Image Version on Each Node

Be sure that everything is working well after the patch and that the expected version has been installed correctly:

[root@myclustercel01 ~]# dcli -g ~/dbs_group -l root imageinfo -ver
myclusterdb01: 12.1.2.3.3.161013
myclusterdb02: 12.1.2.3.3.161013
myclusterdb03: 12.1.2.3.3.161013
myclusterdb04: 12.1.2.3.3.161013
[root@myclusterdb01 ~]#

3.3.5 – Check the Status of Each Instance on Each Node

Like in step 3.3.2, I use this script to get a clear status of which instance is running on which node. We need to have exactly the same status as before the patch was applied.
 
All the infrastructure components are patched (Cells, DB Nodes and IB Switches), so we can now continue with the software components (Grid and Databases ORACLE_HOME) patching in the Part 3 of this blog.
 


Quick links to Part 1 / Part 2 / Part 3 / Part 4 / Part 5 / Part 6

 
01:30 How to Patch an Exadata (Part 1) – Introduction and Prerequisites (21701 Bytes) » Official Pythian Blog

Once you have installed your new Exadata machine, a time will come when you’ll be asked:

“Shouldn’t we patch the Exadata?”

And the answer is “yes, definitely“.

 
Indeed, Oracle releases huge (~ 10 GB) “Quarterly Full Stack” patches (aka Bundles) every quarter (for example: Patch 24436624 – Quarterly Full Stack Download For Oracle Exadata (Oct 2016 – 12.1.0.2)); these Bundles contain all the patches for all the components that make up an Exadata. You will need (almost :)) nothing else to patch your whole Exadata.
 
Even if it looks like a tough operation at first sight, it is not that bad. This blog’s aim is to clearly describe every step to make it easier for all of us. Let’s start with a preview of this patching: the order in which we will proceed and the tools we will be using:
 
 


 
 
As it is quite a long odyssey, I will split this blog into different parts, which also follow a logical order for patching all the components:
 
0/ A word of advice

1/ General Information

2/ Some prerequisites it is worth doing before the maintenance

3/ The patching procedure

3.1/ Patching the cells (aka Storage servers)
3.2/ Patching the IB switches
3.3/ Patching the Database servers (aka Compute Nodes)
3.4/ Patching the Grid Infrastructure
3.5/ Patching the databases ORACLE_HOMEs

4/ The Rollback procedure

4.1/ Cell Rollback
4.2/ DB nodes Rollback
4.3/ IB Switches Rollback

5/ Troubleshooting

5.1/ Cell patching issue
5.2/ CRS does not restart issue
5.3/ A procedure to add instances to a database
5.4/ OPatch resume

6/ Timing

 


 
 
 

0/ A word of advice

First of all, please keep this piece of advice firmly in mind:

Do NOT continue to the next step before a failed step is properly resolved.

Indeed, everything that needs to be redundant is redundant and it is supported to run different versions between servers. In the MOS note “Exadata Patching Overview and Patch Testing Guidelines (Doc ID 1262380.1)“, we can read that :
 

It is supported to run different Exadata versions between servers. For example, some storage servers may run 11.2.2.4.2 while others run 11.2.3.1.1, or all storage servers may run 11.2.3.1.1 while database servers run 11.2.2.4.2. However, it is highly recommended that this be only a temporary configuration that exists for the purpose and duration of rolling upgrade.

 
So if, when patching your cells, one cell does not reboot, stop here: do not continue and do not force the patch onto the next one. Indeed, everything will still work fine and in a supported manner with one cell down (I have been through it on production and no user noticed anything); it will most likely not be the case with 2 cells down. If this kind of issue happens, have a look at the troubleshooting section of this blog and open a MOS Sev 1 SR.

 
 
 

1/ General Information

Some information you need to know before starting to patch your Exadata :

  • It is better to have a basic understanding of what an Exadata is before jumping into this patching procedure
  • This procedure does not apply to an ODA (Oracle Database Appliance)
  • I will use the /patches/OCT2016_bundle_patch FS to save the Bundle in the examples of this blog
  • I use the “DB node” term here; it means “database node”, aka “Compute node”: the nodes where the Grid Infrastructure and the databases are running. I will also use db01 for database node number 1, usually named db01
  • I use the “cell” word for the “storage servers”, the servers that manage your storage. I will also use cel01 for storage server number 1, usually named cel01
  • It is good to have the screen utility installed; if not, use nohup
  • Almost all the procedure will be executed as root
  • I will patch the IB Switches from the DB node 1 server
  • I will patch the cells from the DB node 1 server
  • I will patch the DB nodes from the cel01 server

 
 
 

2/ Some prerequisites it is worth doing before the maintenance

2.1/ Download and unzip the Bundle

Review the Exadata general note (Exadata Database Machine and Exadata Storage Server Supported Versions (Doc ID 888828.1)) to find the latest Bundle, then download and unzip it; be sure that every directory is owned by oracle:dba to avoid any issue in the future:

/patches/OCT2016_bundle_patch >
oracle@myclusterdb01) ls -ltr
total 9609228
-rw-r--r-- 1 oracle oinstall 560430690 Nov 16 18:24 p24436624_121020_Linux-x86-64_10of10.zip
-rw-r--r-- 1 oracle oinstall 1030496554 Nov 16 18:26 p24436624_121020_Linux-x86-64_1of10.zip
-rw-r--r-- 1 oracle oinstall 1032681260 Nov 16 18:27 p24436624_121020_Linux-x86-64_2of10.zip
-rw-r--r-- 1 oracle oinstall 1037111138 Nov 16 18:29 p24436624_121020_Linux-x86-64_3of10.zip
-rw-r--r-- 1 oracle oinstall 1037009057 Nov 16 18:31 p24436624_121020_Linux-x86-64_4of10.zip
-rw-r--r-- 1 oracle oinstall 1037185003 Nov 16 18:33 p24436624_121020_Linux-x86-64_5of10.zip
-rw-r--r-- 1 oracle oinstall 1026218494 Nov 16 18:35 p24436624_121020_Linux-x86-64_6of10.zip
-rw-r--r-- 1 oracle oinstall 1026514887 Nov 16 18:36 p24436624_121020_Linux-x86-64_7of10.zip
-rw-r--r-- 1 oracle oinstall 1026523343 Nov 16 18:39 p24436624_121020_Linux-x86-64_8of10.zip
-rw-r--r-- 1 oracle oinstall 1025677014 Nov 16 18:41 p24436624_121020_Linux-x86-64_9of10.zip

/patches/OCT2016_bundle_patch >
oracle@myclusterdb01) for I in `ls p24436624_121020_Linux-x86-64*f10.zip`
do
unzip $I
done
Archive: p24436624_121020_Linux-x86-64_10of10.zip
 inflating: 24436624.tar.splitaj
...
Archive: p24436624_121020_Linux-x86-64_9of10.zip
 inflating: 24436624.tar.splitai

/patches/OCT2016_bundle_patch >
oracle@myclusterdb01) cat *.tar.* | tar -xvf -
24436624/
24436624/automation/
24436624/automation/bp1-out-of-place-switchback.xml
24436624/automation/bp1-auto-inplace-rolling-automation.xml

...

 
 
 

2.2/ SSH keys

For this step, if you are not confident with the dbs_group, cell_group, etc. files, here is how to create them, as I have described in this post (look for “dbs_group” in the post).

[root@myclusterdb01 ~]# ibhosts | sed s'/"//' | grep db | awk '{print $6}' | sort > /root/dbs_group
[root@myclusterdb01 ~]# ibhosts | sed s'/"//' | grep cel | awk '{print $6}' | sort > /root/cell_group
[root@myclusterdb01 ~]# cat /root/dbs_group ~/cell_group > /root/all_group
[root@myclusterdb01 ~]# ibswitches | awk '{print $10}' | sort > /root/ib_group
[root@myclusterdb01 ~]#

We need a few SSH keys deployed in order to ease the patch application:

  • root ssh keys deployed from the db01 server to the IB Switches (you will have to enter the root password once for each IB Switch)
[root@myclusterdb01 ~]# cat ~/ib_group
myclustersw-ib2
myclustersw-ib3
[root@myclusterdb01 ~]# dcli -g ~/ib_group -l root -k -s '-o StrictHostKeyChecking=no'
root@myclustersw-ib3's password:
root@myclustersw-ib2's password:
myclustersw-ib2: ssh key added
myclustersw-ib3: ssh key added
[root@myclusterdb01 ~]#
  • root ssh keys deployed from the cel01 server to all the database nodes (you will have to enter the root password once for each database server)
[root@myclustercel01 ~]# cat ~/dbs_group
myclusterdb01
myclusterdb02
myclusterdb03
myclusterdb04
[root@myclustercel01 ~]# dcli -g ~/dbs_group -l root -k -s '-o StrictHostKeyChecking=no'
root@myclusterdb01's password:
root@myclusterdb03's password:
root@myclusterdb04's password:
root@myclusterdb02's password:
myclusterdb01: ssh key added
myclusterdb02: ssh key added
myclusterdb03: ssh key added
myclusterdb04: ssh key added
[root@myclustercel01 ~]#
  • root ssh keys deployed from the db01 server to all the cells (you will have to enter the root password once for each cell)
[root@myclusterdb01 ~]# dcli -g ~/cell_group -l root hostname
myclustercel01: myclustercel01.mydomain.com
myclustercel02: myclustercel02.mydomain.com
myclustercel03: myclustercel03.mydomain.com
myclustercel04: myclustercel04.mydomain.com
[root@myclusterdb01 ~]# dcli -g ~/cell_group -l root -k -s '-o StrictHostKeyChecking=no'
root@myclustercel04's password:
...
root@myclustercel03's password:
myclustercel01: ssh key added
...
myclustercel06: ssh key added
[root@myclusterdb01 ~]#

 
 
 

2.3/ Upgrade opatch

It is highly recommended to upgrade opatch before any patching activity and this Bundle is not an exception. Please find the procedure to quickly upgrade opatch with dcli in this post.

Please note that upgrading opatch will also allow you to be ocm.rsp-free !
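
The dcli-based procedure itself is in the linked post; purely as an illustration (not that procedure), refreshing OPatch on all the DB nodes could look like the sketch below, assuming the standard OPatch patch 6880880 (the zip name is an example) and the home paths used in this series:

[root@myclusterdb01 ~]# dcli -g ~/dbs_group -l root -f /patches/p6880880_121010_Linux-x86-64.zip -d /tmp           # push the OPatch zip to all DB nodes
[root@myclusterdb01 ~]# dcli -g ~/dbs_group -l root "cd /u01/app/oracle/product/12.1.0.2/dbhome_1 && mv OPatch OPatch.old && unzip -oq /tmp/p6880880_121010_Linux-x86-64.zip && chown -R oracle:oinstall OPatch"
[root@myclusterdb01 ~]# dcli -g ~/dbs_group -l root "cd /u01/app/12.1.0.2/grid && mv OPatch OPatch.old && unzip -oq /tmp/p6880880_121010_Linux-x86-64.zip && chown -R oracle:oinstall OPatch"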

 
 
 

2.4/ Run the prechecks

It is very important to run those prechecks and to take good care of their outputs. They have to be 100% successful to ensure a smooth application of the patches.

  • Cell patching prechecks (launch them from the DB Node 1 as you will patch them from here)
[root@myclusterdb01 ~]# cd /patches/OCT2016_bundle_patch/24436624/Infrastructure/12.1.2.3.3/ExadataStorageServer_InfiniBandSwitch/patch_12.1.2.3.3.161013/
[root@myclusterdb01 ~]# ./patchmgr -cells ~/cell_group -patch_check_prereq -rolling

 
 

  • DB Nodes prechecks (launch them from the cel01 server as you will patch them from here)

As we will use the cell node 1 server to patch the database servers, we first need to copy patchmgr and the ISO file to this server

[root@myclusterdb01 ~]#  scp /patches/OCT2016_bundle_patch/24436624/Infrastructure/SoftwareMaintenanceTools/DBServerPatch/5.161014/p21634633_121233_Linux-x86-64.zip root@myclustercel01:/tmp/.                    # This is patchmgr
[root@myclusterdb01 ~]#  scp /patches/OCT2016_bundle_patch/24436624/Infrastructure/12.1.2.3.3/ExadataDatabaseServer_OL6/p24669306_121233_Linux-x86-64.zip root@myclustercel01:/tmp/.                               # This is the ISO file, do NOT unzip it
[root@myclusterdb01 ~]#  ssh root@myclustercel01
[root@myclustercel01 ~]#  cd /tmp
[root@myclustercel01 ~]#  nohup unzip p21634633_121233_Linux-x86-64.zip &
[root@myclustercel01 ~]# cd /tmp/dbserver_patch_5.161014
[root@myclustercel01 dbserver_patch_5.161014]# ./patchmgr -dbnodes ~/dbs_group -precheck -iso_repo /tmp/p24669306_121233_Linux-x86-64.zip -target_version 12.1.2.3.3.161013
[root@myclustercel01 dbserver_patch_5.161014]#

Note: if you have NFS filesystems mounted, you will see some error messages; you can ignore them at this stage, as we will umount the NFS before patching the DB nodes
 
 

  • IB Switches prechecks (launch them from the DB Node 1 as you will patch them from here)
[root@myclusterdb01]# cd /patches/OCT2016_bundle_patch/24436624/Infrastructure/12.1.2.3.3/ExadataStorageServer_InfiniBandSwitch/patch_12.1.2.3.3.161013/
[root@myclusterdb01 patch_12.1.2.3.3.161013]# ./patchmgr -ibswitches ~/ib_group -upgrade -ibswitch_precheck

 
 

  • Grid Infrastructure prechecks
[root@myclusterdb01]# . oraenv <<< +ASM1
[root@myclusterdb01]# $ORACLE_HOME/OPatch/opatchauto apply /patches/OCT2016_bundle_patch/24436624/Database/12.1.0.2.0/12.1.0.2.161018DBBP/24448103 -oh /u01/app/12.1.0.2/grid -analyze
[root@myclusterdb01]#

Notes :

  • You will most likely see some warnings here; check the logfiles, and the warnings will probably be due to some patches being rolled back because they are not useful any more.

 
 

[root@myclusterdb01]# $ORACLE_HOME/OPatch/opatch prereq CheckSystemSpace -phBaseFile /tmp/patch_list_gihome.txt

With the following /tmp/patch_list_gihome.txt file (check the README as the patch numbers will change with the versions)

[root@myclusterdb01]#cat /tmp/patch_list_gihome.txt
/patches/OCT2016_bundle_patch/24436624/Database/12.1.0.2.0/12.1.0.2.161018DBBP/24448103/21436941
/patches/OCT2016_bundle_patch/24436624/Database/12.1.0.2.0/12.1.0.2.161018DBBP/24448103/24007012
/patches/OCT2016_bundle_patch/24436624/Database/12.1.0.2.0/12.1.0.2.161018DBBP/24448103/24846605
/patches/OCT2016_bundle_patch/24436624/Database/12.1.0.2.0/12.1.0.2.161018DBBP/24448103/24340679
[root@myclusterdb01]#

 
 

  • Database patch prechecks
[oracle@myclusterdb01]$ . oraenv <<< A_DATABASE_WITH_THE_ORACLE_HOME_YOU_WANT_TO_PATCH
[oracle@myclusterdb01]$ $ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /patches/OCT2016_bundle_patch/24436624/Database/12.1.0.2.0/12.1.0.2.161018DBBP/24448103/24340679
[oracle@myclusterdb01]$ $ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /patches/OCT2016_bundle_patch/24436624/Database/12.1.0.2.0/12.1.0.2.161018DBBP/24448103/24846605
[oracle@myclusterdb01]$ $ORACLE_HOME/OPatch/opatch prereq CheckSystemSpace -phBaseFile /tmp/patch_list_dbhome.txt

The file /tmp/patch_list_dbhome.txt contains the following (check the README; the patch numbers change depending on the version):

/patches/OCT2016_bundle_patch/24436624/Database/12.1.0.2.0/12.1.0.2.161018DBBP/24448103/24340679
/patches/OCT2016_bundle_patch/24436624/Database/12.1.0.2.0/12.1.0.2.161018DBBP/24448103/24846605

 
 

  • OJVM prechecks
[oracle@myclusterdb01]$ cd /patches/OCT2016_bundle_patch/24436624/Database/12.1.0.2.0/12.1.0.2.161018OJVMPSU/24315824
[oracle@myclusterdb01]$ $ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -ph ./
[oracle@myclusterdb01]$

- Do an opatch lsinventory -all_nodes before patching and save the output somewhere

[oracle@myclusterdb01]$ $ORACLE_HOME/OPatch/opatch lsinventory -all_nodes

 
 

  • Check disk_repair_time and set it to 24h

Oracle recommends setting this parameter to 8h. As we have had issues in the past with a very long cell patching session, we now set this parameter to 24h, as Oracle recommended to us.
Please note that this prerequisite is only needed for a rolling patch application.

SQL> select dg.name as diskgroup, a.name as attribute, a.value from v$asm_diskgroup dg, v$asm_attribute a where dg.group_number=a.group_number and (a.name like '%repair_time' or a.name = 'compatible.asm');

DISKGROUP ATTRIBUTE VALUE
-------------------- ---------------------------------------- ----------------------------------------
DATA disk_repair_time 3.6h
DATA compatible.asm 11.2.0.2.0
DBFS_DG disk_repair_time 3.6h
DBFS_DG compatible.asm 11.2.0.2.0
RECO disk_repair_time 3.6h
RECO compatible.asm 11.2.0.2.0

6 rows selected.

SQL> connect / as sysasm
Connected.
SQL> ALTER DISKGROUP DATA SET ATTRIBUTE 'disk_repair_time' = '24h' ;

Diskgroup altered.

SQL> ALTER DISKGROUP DBFS_DG SET ATTRIBUTE 'disk_repair_time' = '24h' ;

Diskgroup altered.

SQL> ALTER DISKGROUP RECO SET ATTRIBUTE 'disk_repair_time' = '24h' ;

Diskgroup altered.

SQL>

 
If one of these prechecks points out a problem, resolve it before heading to the next steps.
 
 
Now that everything is downloaded, unzipped and updated, we can safely jump to the patching procedure in part 2!
 
 


Quick links to Part 1 / Part 2 / Part 3 / Part 4 / Part 5 / Part 6

 

2017-03-27 Mon

17:04 High Log File Sync Waits Caused by ADAPTIVE LOG FILE SYNC (2987 Bytes) » Oracle Life

Author: eygle, published on eygle.com

Tuning the Log File Sync wait has always been a common topic with Oracle databases: as soon as the LOG FILE write performance fluctuates, this wait can become very prominent.

In Oracle 11.2.0.3, Oracle set the initial value of the hidden parameter _use_adaptive_log_file_sync to TRUE, which has led to many cases of abnormal Log File Sync waits. Although this issue has been around for a long time, many Oracle users are still unaware of it, so I am writing this entry in the hope that more people will see it.

  • After a foreground process commits a transaction, LGWR needs to perform the log write, and the foreground process therefore enters a Log File Sync wait cycle.
  • In earlier versions, LGWR would notify the foreground process once the write completed; this is the Post/Wait mode.

In 11gR2, to optimize this process, after notifying LGWR to write, the foreground process can instead query the write progress at regular intervals; this is called the Polling mode. In 11.2.0.3 this feature is enabled by default.

The meaning of this parameter is that the database can adaptively choose and switch between the post/wait and polling modes.

  • _use_adaptive_log_file_sync : Adaptively switch between post/wait and polling

It is because of this that many bugs appeared, making Log File Sync waits abnormally high instead. If you observe this symptom on 11.2.0.3, it is very likely related to this.

If so, setting the _use_adaptive_log_file_sync parameter to FALSE, reverting to the previous mode, will help resolve the problem.
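
A minimal sketch of that change (run as SYSDBA; underscore parameters should only be modified after confirming with Oracle Support):

sqlplus / as sysdba <<'EOF'
-- revert to the pre-11.2.0.3 behaviour; SID='*' applies it to all RAC instances
ALTER SYSTEM SET "_use_adaptive_log_file_sync" = FALSE SID='*';
EOF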

These MOS documents can be consulted for reference:

  • Document 1462942.1 Adaptive Switching Between Log Write Methods can Cause 'log file sync' Waits
  • Document 13707904.8 Bug 13707904 - LGWR sometimes uses polling, sometimes post/wait
  • Document 13074706.8 Bug 13074706 - Long "log file sync" waits in RAC not correlated with slow writes
