AWS – RDS – SQL Server Native backups

June 4, 2020, 3:22 pm

Introduction

RDS provides an automatic backup feature, which backs up the entire RDS instance. As a DBA, you sometimes need to back up an individual database, and there are many reasons for that (legal, migration, replication, …). Unfortunately, the automatic backups do not provide individual database backups.

This post explains how you can enable native database backups on RDS, as you are used to with an on-premises SQL Server instance.
To summarize, we will create an S3 bucket on AWS to store the backups, create an IAM role with the required permissions on the S3 bucket, and create an RDS option group associated with the role and containing the SQLSERVER_BACKUP_RESTORE option.

Of course, you need an existing RDS instance running. I have one with SQL Server 2017 EE.

Create an S3 bucket

If you do not already have one, first create an S3 bucket that will be the repository for your database backups.
Open your S3 management console and click [Create bucket]

Enter an S3 bucket name and select the region where your RDS instance is located.
When done, click [Create]

Create an IAM role

Open the AWS IAM management console, select [Policies] in the navigation pane and click [Create policy]

Select the JSON tab and paste the following policy to replace the existing one:

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "VisualEditor0",
			"Effect": "Allow",
			"Action": [
			"s3:ListBucket",
			"s3:GetBucketLocation"
			],
			"Resource": "arn:aws:s3:::dbi-sql-backup"
		},
		{
			"Sid": "VisualEditor1",
			"Effect": "Allow",
			"Action": [
			"s3:PutObject",
			"s3:GetObject",
			"s3:AbortMultipartUpload",
			"s3:ListMultipartUploadParts"
			],
			"Resource": "arn:aws:s3:::dbi-sql-backup/*"
		}
	]
}

In the policy, do not forget to replace the S3 bucket name (dbi-sql-backup) with the one you created previously, then click [Review policy]
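If you prefer to generate the policy rather than edit the bucket name by hand, the short Python sketch below builds the same document for any bucket. It is only an illustration: the function name backup_policy and the bucket name my-sql-backups are hypothetical, and the optional "Sid" labels of the console version are left out.

```python
import json


def backup_policy(bucket: str) -> dict:
    """Build the IAM permission policy for native backups to the given bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions: the resource is the bucket itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions: the resource is every key in the bucket
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:AbortMultipartUpload",
                    "s3:ListMultipartUploadParts",
                ],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "my-sql-backups" is a hypothetical bucket name: use your own
    print(json.dumps(backup_policy("my-sql-backups"), indent=4))
```

The printed JSON can then be pasted into the JSON tab. Note the two resources: bucket-level actions apply to the bucket ARN, object-level actions to "bucket/*".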

Set a name for your policy and complete the creation with [Create policy]

Now select [Roles] in the navigation pane and click [Create role]

Select the [AWS service] option, then [EC2], and finally [EC2] again in the use case list, and click [Next: Permissions]

Search for the policy you created and select it. Then click [Next: Tags]. I recommend adding some tags so that you can identify the role more easily later. Then click [Next: Review]

Enter your role name and finally click [Create role]

Once the role is created, edit it by selecting it in the role list

Select the [Trust relationships] tab and edit it.

Replace the line "Service": "ec2.amazonaws.com" with "Service": "rds.amazonaws.com", as we want RDS to be able to use this role.
Click [Update Trust Policy]
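For reference, here is a minimal sketch of what the edited trust policy is expected to contain, written as a Python dictionary and printed as JSON. It assumes the standard AWS trust-policy layout (the "sts:AssumeRole" action granted to a service principal) and only reflects the change described above; compare it with what the console shows rather than treating it as the exact console output.

```python
import json


def rds_trust_policy() -> dict:
    """Trust policy allowing the RDS service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # rds.amazonaws.com replaces the ec2.amazonaws.com principal
                "Principal": {"Service": "rds.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(rds_trust_policy(), indent=4))
```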

Create an Option Group

The next step is to create an option group.
To do so, open the RDS management console, select [Option groups] in the navigation pane and click [Create group]

Set the name of your option group, select the engine and the major engine version of your RDS instance, and create your option group.

Now select your option group in the list and add an option.

In the option details section, select SQLSERVER_BACKUP_RESTORE, and in the IAM section select the role you created previously. In the scheduling section, choose the option you want; in my case, I want it to be applied immediately. Then click [Add option].

Link your RDS instance with your option group

The last configuration step is to associate your RDS instance with the option group you created

In the RDS management console, select [Databases] in the navigation pane and select the RDS instance on which native backup must be activated. Also make sure that the version of the RDS instance matches the one set in the option group you created previously. Check that your instance is available and then click [Modify]

Scroll down to the Database options section. In the option group combo box, select the option group you created previously, then click [Continue] at the bottom of the page.

Choose when you want to apply the modification. Be aware that if you select “Apply immediately”, your RDS instance will restart and the service will be interrupted.

Test the Backup

Connect to your RDS SQL Server instance, for example with Microsoft SQL Server Management Studio.
The [msdb] database contains a stored procedure named [dbo].[rds_backup_database] that you must use to start a native database backup:

USE [msdb]

EXECUTE [dbo].[rds_backup_database]
@source_db_name = 'ProductDB'
,@s3_arn_to_backup_to = 'arn:aws:s3:::dbi-sql-backup/awsvdata_ProductDB.bak'
--,@kms_master_key_arn
--,@overwrite_s3_backup_file
--,@type
--,@number_of_files
GO

Adapt the script with your database name and the path in your S3 bucket, including the backup file name. Note that I did not use all the parameters of the stored procedure in this post.
The stored procedure returns a task_id associated with your command.
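The value of @s3_arn_to_backup_to is simply the bucket ARN followed by "/" and the backup file name. The Python sketch below assembles the same T-SQL command from its parts, which can help when scripting backups of several databases. The helper names backup_command and tsql_literal are hypothetical, and only the two parameters used above are covered.

```python
def tsql_literal(value: str) -> str:
    """Quote a T-SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def backup_command(db_name: str, bucket: str, file_name: str) -> str:
    """Return the T-SQL that starts a native backup of db_name to bucket/file_name."""
    # The S3 ARN is the bucket ARN followed by the object key (the file name)
    s3_arn = f"arn:aws:s3:::{bucket}/{file_name}"
    return "\n".join([
        "USE [msdb]",
        "EXECUTE [dbo].[rds_backup_database]",
        f"@source_db_name = {tsql_literal(db_name)}",
        f",@s3_arn_to_backup_to = {tsql_literal(s3_arn)}",
        "GO",
    ])


if __name__ == "__main__":
    print(backup_command("ProductDB", "dbi-sql-backup", "awsvdata_ProductDB.bak"))
```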

With the task_id, you can follow the status of the process with the following stored procedure:

USE [msdb]
EXECUTE [dbo].[rds_task_status] @task_id = 4
GO
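Because the backup runs asynchronously, a script typically polls rds_task_status until the task is finished. The sketch below shows such a loop in Python. It assumes that the lifecycle column returned by rds_task_status ends in SUCCESS, ERROR or CANCELLED (check the AWS documentation for the full list of values), and fetch_lifecycle is a hypothetical callback standing in for the database call, so the example runs without a SQL Server connection.

```python
import time
from typing import Callable

# Assumed final lifecycle values: the task does not change any more after these
TERMINAL_STATES = {"SUCCESS", "ERROR", "CANCELLED"}


def wait_for_task(fetch_lifecycle: Callable[[int], str], task_id: int,
                  interval_s: float = 10.0, timeout_s: float = 3600.0) -> str:
    """Poll until the task reaches a terminal lifecycle state and return that state.

    fetch_lifecycle is hypothetical: in practice it would run
    EXECUTE msdb.dbo.rds_task_status @task_id = ... and return the lifecycle column.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = fetch_lifecycle(task_id)
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {state} after {timeout_s}s")
        time.sleep(interval_s)


if __name__ == "__main__":
    # Simulated sequence of states, for demonstration only
    demo_states = iter(["CREATED", "IN_PROGRESS", "IN_PROGRESS", "SUCCESS"])
    print(wait_for_task(lambda _id: next(demo_states), task_id=4, interval_s=0))
```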

Conclusion

Enabling native database backups is indeed very practical. Unfortunately, there are some limitations.

For instance, there is no differential, transaction log or filegroup backup or restore, which could be very useful in many scenarios.

This article AWS – RDS – SQL Server Native backups first appeared on Blog dbi services.

DB-Upgrade hangs in SE2 waiting on Streams AQ while gathering statistics on LOGMNR-Tables

June 5, 2020, 12:51 pm

A couple of weeks ago I upgraded an Oracle Standard Edition 2 test database from 12.1.0.2 to 12.2.0.1 (with the April 2020 Patch Bundle) on Windows. Recently I upgraded the production database. Both upgrades were done with the Database Upgrade Assistant (DBUA). I didn’t use AUTOUPGRADE because I had to upgrade only one database, and DBUA handles everything for me (including changing the necessary Windows services and updating the time zone file).

Both upgrades hung in the finalizing phase of the components upgrade.

So I checked what the upgrade process was waiting for in the DB:


SQL> select sid, sql_id, event,p1,p2,p3 from v$session 
   2 where status='ACTIVE' and type='USER' and sid not in 
   3 (select sid from v$mystat);

       SID SQL_ID        EVENT                                                 P1         P2         P3
---------- ------------- -------------------------------------------------- ----- ---------- ----------
      1142 fgus25bx1md8q Streams AQ: waiting for messages in the queue      17409 1.4072E+14 2147483647

SQL> set long 400000 longchunksize 200
SQL> select sql_fulltext from v$sqlarea where sql_id='fgus25bx1md8q';

SQL_FULLTEXT
---------------------------------------------------------------------------------
DECLARE
        cursor table_name_cursor  is
                select  x.name table_name
                from sys.x$krvxdta x
                where bitand(x.flags, 12) != 0;
        filter_lst DBMS_STATS.OBJECTTAB := DBMS_STATS.OBJECTTAB();
        obj_lst    DBMS_STATS.OBJECTTAB := DBMS_STATS.OBJECTTAB();
        ind number := 1;
BEGIN
   for rec in table_name_cursor loop
      begin
        filter_lst.extend(1);
        filter_lst(ind).ownname := 'SYSTEM';
        filter_lst(ind).objname := 'LOGMNR_'|| rec.table_name||'';
        ind := ind + 1;
      end;
   end loop;
   DBMS_STATS.GATHER_SCHEMA_STATS(OWNNAME=>'SYSTEM', objlist=>obj_lst, obj_filter_list=>filter_lst);
END;

So the upgrade process obviously tried to gather statistics on the LOGMNR tables owned by SYSTEM and was waiting for messages in the scheduler queue SCHEDULER$_EVENT_QUEUE (object ID 17409, the P1 value of the wait event). This is similar to what is documented in MOS Note 1559487.1.

The upgrade was stuck at this point. So what to do?

Fortunately, I remembered a blog post by Mike Dietrich about DBUA being restartable in 12.2:

Restarting a failed Database Upgrade with DBUA 12.2

So I killed the waiting session:


SQL> select serial# from v$session where sid=1142;

   SERIAL#
----------
     59722

SQL> alter system kill session '1142,59722';

System altered.

Then I let DBUA run into tons of errors and let it finish its work. To restart it, I just clicked “Retry” in the GUI. After some time, DBUA ran into an error again. I quickly checked the log files and clicked “Retry” again. This time it went through without issues. Checking the log files and the result of the upgrade showed that all components had been upgraded correctly.

So, in summary: a failed upgrade (crashed or hanging) with DBUA is not as bad as it was before 12.2. You can just let DBUA (or AUTOUPGRADE) retry its work. Of course, you usually have to fix the cause of the failure before restarting/retrying.

REMARK: See Mike Dietrich’s blog post about the resumability and restartability of AutoUpgrade:

Troubleshooting, Restoring and Restarting AutoUpgrade

This article DB-Upgrade hangs in SE2 waiting on Streams AQ while gathering statistics on LOGMNR-Tables first appeared on Blog dbi services.

What is a serverless database?

June 5, 2020, 1:44 pm

By Franck Pachot

After reading the article https://cloudwars.co/oracle/oracle-deal-8x8-larry-ellison-picks-amazons-pocket-again/, I am writing down some thoughts about how a database can be serverless and elastic. Of course, a database needs a server to process its data: serverless doesn’t mean that there are no servers.

Serverless as not waiting for server provisioning

The first idea of “serverless” is about provisioning. In the past, when a developer required a new database to start a new project, she had to wait for a server to be installed. In 1996 my first development on Oracle Database started like this: we asked Sun for a server and OS and asked Oracle for the database software, all for free for a few months, in order to start our prototype. Today this would be Cloud Free Tier access. At that time we had to wait to receive, unbox, and install all this. I learned a lot there about installing an OS, configuring the network, setting up disk mirroring… This was an awesome experience for a junior starting in IT. Interestingly, I think that today a junior can learn the same concepts with a Cloud Foundation training and certification. Not much has changed, apart from the unboxing and cabling. The big difference is that today we do not have to wait weeks for it and can set up the same infrastructure in 10 minutes.

That was my first DevOps experience: we wanted to develop our application without waiting for the IT department. But it was not serverless at all.

A few years later I was starting a new data warehouse for a mobile telco in Africa. Again, weeks to months were required to order and install a server for it. And we didn’t wait: we started the first version of the data warehouse on a spare PC we had. This was maybe my first serverless experience: server provisioning was out of the critical path in the project planning. Of course, a PC is not a server, and reliability and performance were not there. But we were lucky, and when the server arrived we already had good feedback from this first version.

We need serverless, but we need real servers behind it. Today this is possible: you can provision a new database in the public or private cloud, or simply on a VM, without waiting, and with all the security, reliability and performance. With Oracle, it is a bit more difficult if you can’t do it in their public cloud, because licensing does not count vCPUs and you often need specific hardware for it, like in the old days. Appliances like ODA can help. Public Cloud or Cloud@Customer definitely helps.

Serverless as not taking responsibility for server administration

Serverless is not only about running on virtual servers with easy provisioning. If you are serverless, you don’t want to manage those virtual machines. You start and connect to a compute instance. You define its shape (CPU, RAM) but you don’t want to know where it runs physically. Of course, you want to define the region for legal, performance or cost reasons, but not which data center, which rack,… That’s the second step of serverless: you don’t manage the physical servers. In Oracle Cloud, you run a Compute Instance where you can install a database. In AWS this is an EC2 instance where you can install a database.

But even if you are not responsible for the servers, this is not yet “serverless”, because you pay for them. If your CFO still sees a bill for compute instances, you are not serverless.

Serverless as not paying for the server

AWS has a true serverless and elastic database offer: Amazon Aurora Serverless. You don’t have to start or stop the servers: this is done automatically when you connect. More activity adds more servers, and when there are no connections, it stops. You don’t pay for database servers running; you pay only for what the application is actually using.

Azure also has a serverless SQL Server offer: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-serverless

Those are, as far as I know, the only true serverless databases so far. If we need to stop and start the compute services ourselves, even with some level of auto-scaling, we can call that on-demand but not serverless.

All AWS RDS services including Aurora can be started and stopped on demand. They can scale up or down with minimal downtime, especially in Multi-AZ because the standby can be scaled and activated. Redshift cannot be stopped because it uses local storage. But you can take a snapshot and terminate the instance, and restore it later.

On the Oracle side, the Autonomous Database can be stopped and started. Then again, we can say that we don’t pay when we don’t use the database, but we cannot say that we don’t pay when we don’t use the application, because the database is up even when the application is not used. However, you can scale without having to stop and start. And there’s also some level of auto-scaling where the additional application usage is really billed on CPU usage metrics: you pay for n OCPUs while the ATP or ADB is up, and you can use up to n*3 sessions on CPU, with true serverless billing for what is above the provisioned OCPUs. Maybe the future will go further. The technology allows it: multitenant allows PDB-level CPU caging where the capacity can be changed online (setting CPU_COUNT), and AWR gathers the CPU load with many metrics that can be used for billing.
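To make the arithmetic concrete, here is a simplified model of the auto-scaling rule described above, as a Python sketch: n provisioned OCPUs are always billed while the database is up, and usage above n is billed as consumed, up to 3*n. This only illustrates the reasoning; it is not Oracle's actual billing formula, and it ignores billing granularity and any other price component.

```python
def billed_ocpus(provisioned: int, used_on_cpu: float) -> float:
    """Simplified model of the rule described in the text (not Oracle's formula).

    provisioned: OCPUs provisioned for the database (n)
    used_on_cpu: average OCPUs actually used during the period
    """
    ceiling = 3 * provisioned              # auto-scaling cannot go beyond 3*n
    consumed = min(used_on_cpu, ceiling)
    return max(provisioned, consumed)      # n is billed even when idle


if __name__ == "__main__":
    for used in (0.0, 1.0, 3.5, 10.0):
        print(used, billed_ocpus(2, used))
```

With n = 2, an idle or lightly used database is billed 2 OCPUs, a burst to 3.5 OCPUs is billed 3.5, and usage is capped at 6.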

Serverless

The name is funny because serverless programs run on servers. And the rush to run without servers is paradoxical. When I started programming, it was on very small computers (ZX-81, Apple //e, IBM PC-XT), and I was really proud when I started to do real stuff running on real servers, with a schema on *the* company database. Actually, what is called serverless today is, in my opinion, showing the full power of servers: you don’t need to buy a computer for a project, you just use some shared compute power.

The cloud wars use strange marketing terms, but really good technology and concepts are coming.

This article What is a serverless database? first appeared on Blog dbi services.

Oracle 12c – peak detection with MATCH_RECOGNIZE

June 8, 2020, 7:39 am

By Franck Pachot

This post is part of a series of small examples of recent features. I’m running this in the Oracle 20c preview in the Oracle Cloud. I’ll show a very basic example of “Row Pattern Recognition” (the MATCH_RECOGNIZE clause in a SELECT, documented by Oracle as the “row pattern matching in native SQL” feature). You may be afraid of those names. Of course, because SQL is a declarative language, there is a small learning curve to get beyond this abstraction. Understanding procedurally how it works may help. But once you understand its declarative nature, it is really powerful. This post starts simple, with a simple time-series table where I just want to detect peaks (the points where the value goes up and then down).

Historically, a SELECT statement operated on single rows (JOIN, WHERE, SELECT) within a set, or on an aggregation of rows (GROUP BY, HAVING) to provide a summary. Analytic functions can operate on windows of rows (PARTITION BY, ORDER BY, ROWS BETWEEN,…) where you keep the detailed level of rows and compare it to the aggregated values of the group. A row can then look at its neighbours and, when you need to go further, the SQL MODEL clause can build the equivalent of spreadsheet cells to reference other rows and columns. As in a spreadsheet, you can also PIVOT to move row detail to columns or vice versa. All that can be done in SQL, which means that you don’t code how to do it but just define the result you want. However, there’s something that is easy to do in a spreadsheet application like Excel but not easy to code with analytic functions: looking at a chart, such as a line graph, to detect some behaviour. That’s something we can code in SQL with MATCH_RECOGNIZE.

For example, from the “COVID” table I imported in the previous post, I want to see each peak of covid-19 cases in Switzerland:

I did this manually in Excel: showing all labels but keeping only those at a peak, whether it is a small peak or a high one. There’s one value per day in this time series, but I’m not interested in the intermediate values, only in the peaks. The data comes from the .csv file at http://opendata.ecdc.europa.eu/covid19/casedistribution/csv/, which I read through an external table and imported into an Oracle table in the previous post (Oracle 18c – select from a flat file).

OK, let’s show the result directly. Here is a small SQL statement that shows exactly those peaks, each match being numbered:


SQL> select countriesandterritories "Country","Peak date","Peak cases","match#"
  2  from covid
  3  match_recognize (
  4   partition by continentexp, countriesandterritories order by daterep
  5   measures
  6    match_number() as "match#",
  7    last(GoingUp.dateRep) as "Peak date",
  8    last(GoingUp.cases) as "Peak cases"
  9   one row per match
 10   pattern (GoingUp+ GoingDown+)
 11   define
 12    GoingUp as ( GoingUp.cases > prev(GoingUp.cases) ),
 13    GoingDown as ( GoingDown.cases < prev(GoingDown.cases))
 14  )
 15  where countriesandterritories='Switzerland';

       Country    Peak date    Peak cases    match#
______________ ____________ _____________ _________
Switzerland    26-FEB-20                1         1
Switzerland    28-FEB-20                7         2
Switzerland    07-MAR-20              122         3
Switzerland    09-MAR-20               68         4
Switzerland    14-MAR-20              267         5
Switzerland    16-MAR-20              841         6
Switzerland    18-MAR-20              450         7
Switzerland    22-MAR-20             1237         8
Switzerland    24-MAR-20             1044         9
Switzerland    28-MAR-20             1390        10
Switzerland    31-MAR-20             1138        11
Switzerland    03-APR-20             1124        12
Switzerland    08-APR-20              590        13
Switzerland    10-APR-20              785        14
Switzerland    16-APR-20              583        15
Switzerland    18-APR-20              346        16
Switzerland    20-APR-20              336        17
Switzerland    24-APR-20              228        18
Switzerland    26-APR-20              216        19
Switzerland    01-MAY-20              179        20
Switzerland    09-MAY-20               81        21
Switzerland    11-MAY-20               54        22
Switzerland    17-MAY-20               58        23
Switzerland    21-MAY-20               40        24
Switzerland    24-MAY-20               18        25
Switzerland    27-MAY-20               15        26
Switzerland    29-MAY-20               35        27
Switzerland    06-JUN-20               23        28


28 rows selected.

Doing that with analytic functions or the MODEL clause is possible, but not easy.

So let’s explain the clauses in this simple example.

Define

I need to define what a peak is. For that, I define two very basic primary patterns. The value I’m looking at, the one you see on the graph, is the column “CASES”, which is the number of covid-19 cases for the day and country. How do you detect peaks visually? Like when hiking in the mountains: it goes up and, when you continue, it goes down. Here are those two primary patterns:


 11   define
 12    GoingUp as ( GoingUp.cases > prev(GoingUp.cases) ),
 13    GoingDown as ( GoingDown.cases < prev(GoingDown.cases))

“GoingUp” matches a row where the “cases” value is higher than in the preceding row, and “GoingDown” matches a row where “cases” is lower than in the preceding one. The meaning of “preceding”, of course, depends on an order, as with analytic functions. We will see it below.
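The two DEFINE conditions can be mimicked procedurally to see how each row is labelled. The Python sketch below (a hypothetical helper named classify) compares each value with the one just before it. Because prev() returns NULL on the first row of a partition, and a comparison with NULL is never true, the first row gets no label, and neither does a row whose value equals the previous one. Evaluating the conditions row by row like this is equivalent here only because they look at the physically preceding row and do not depend on the rest of the match.

```python
from typing import List, Optional


def classify(cases: List[int]) -> List[Optional[str]]:
    """Apply the GoingUp / GoingDown conditions to each row of an ordered series."""
    labels: List[Optional[str]] = []
    for i, value in enumerate(cases):
        if i == 0:
            # prev() is NULL on the first row, so neither condition is true
            labels.append(None)
        elif value > cases[i - 1]:
            labels.append("GOINGUP")
        elif value < cases[i - 1]:
            labels.append("GOINGDOWN")
        else:
            # Equal values satisfy neither condition
            labels.append(None)
    return labels


if __name__ == "__main__":
    print(classify([774, 925, 1000, 1390, 1048]))
```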

Pattern

A peak is when a row matches GoingDown just after rows matching GoingUp. That’s simple, but you can imagine the crazy things a data scientist would want to recognize. MATCH_RECOGNIZE defines patterns in a way similar to regular expressions: the primary patterns are listed in a sequence, with some quantifiers. Mine is very simple:


 10   pattern (GoingUp+ GoingDown+)

This means: one or more GoingUp followed by one or more GoingDown. This is exactly what I did in the graph above: ignore the intermediate points. So each primary pattern compares a row with the preceding one only, and the consecutive classifications are walked through and matched against the pattern.
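To see how the pattern is walked through, here is a procedural Python sketch of PATTERN (GoingUp+ GoingDown+) with ONE ROW PER MATCH. It assumes the default AFTER MATCH SKIP PAST LAST ROW (the next search starts after the last row of a match) and scans greedily without backtracking, which is sufficient here only because a row can never satisfy both GoingUp and GoingDown. The function name find_peaks is hypothetical; the sample values are the Switzerland rows from 25-MAR-20 to 04-APR-20 shown in this post's debug output, so the match numbers restart at 1.

```python
from typing import List, Tuple


def find_peaks(rows: List[Tuple[str, int]]) -> List[Tuple[int, str, int]]:
    """Emulate PATTERN (GoingUp+ GoingDown+) with ONE ROW PER MATCH.

    rows: (daterep, cases) tuples of one partition, already ordered by date.
    Returns (match_number, peak_date, peak_cases) tuples.
    """
    cases = [c for _, c in rows]
    n = len(rows)

    def going_up(i: int) -> bool:
        return i > 0 and cases[i] > cases[i - 1]

    def going_down(i: int) -> bool:
        return i > 0 and cases[i] < cases[i - 1]

    matches = []
    start = 0
    while start < n:
        end_up = start
        while end_up < n and going_up(end_up):          # GoingUp+ (greedy)
            end_up += 1
        end_down = end_up
        while end_down < n and going_down(end_down):    # GoingDown+ (greedy)
            end_down += 1
        if end_up > start and end_down > end_up:
            peak_date, peak_cases = rows[end_up - 1]    # last(GoingUp.*)
            matches.append((len(matches) + 1, peak_date, peak_cases))
            start = end_down                            # skip past last row
        else:
            start += 1                                  # try the next row
    return matches


if __name__ == "__main__":
    sample = [("25-MAR-20", 774), ("26-MAR-20", 925), ("27-MAR-20", 1000),
              ("28-MAR-20", 1390), ("29-MAR-20", 1048), ("30-MAR-20", 1122),
              ("31-MAR-20", 1138), ("01-APR-20", 696), ("02-APR-20", 962),
              ("03-APR-20", 1124), ("04-APR-20", 1033)]
    for match in find_peaks(sample):
        print(match)
```

The three peaks found (28-MAR-20, 31-MAR-20 and 03-APR-20) are the ones reported as matches 10, 11 and 12 by the SQL query.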

Partition by

As mentioned, I follow the rows in order. For a time series this is simple: the key here is the country (I partition by continent and country), and the order (x-axis) is the date. I’m looking at the peaks per country when the value (“cases”) is ordered by date (“daterep”):


  2  from covid
...
  4   partition by continentexp, countriesandterritories order by daterep
...
 15* where countriesandterritories='Switzerland';

I selected only my country here with a standard WHERE clause, to keep things simple.
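In procedural terms, PARTITION BY and ORDER BY split the rows into independent series and sort each one before any pattern is searched. A minimal Python sketch of that step, assuming rows are dictionaries keyed by the column names used in the query (a hypothetical representation, with a hypothetical function name partitions), could look like this:

```python
from datetime import date
from itertools import groupby
from operator import itemgetter

PARTITION_KEY = itemgetter("continentexp", "countriesandterritories")


def partitions(rows):
    """Mimic PARTITION BY continentexp, countriesandterritories ORDER BY daterep.

    Yields ((continent, country), rows of that partition sorted by daterep).
    """
    # groupby only groups adjacent rows, so sort by the partition key first,
    # then by date to get the ORDER BY within each partition
    ordered = sorted(rows, key=lambda r: (PARTITION_KEY(r), r["daterep"]))
    for key, group in groupby(ordered, key=PARTITION_KEY):
        yield key, list(group)


if __name__ == "__main__":
    sample = [  # hypothetical rows, not taken from the real data set
        {"continentexp": "Europe", "countriesandterritories": "Switzerland",
         "daterep": date(2020, 3, 2), "cases": 5},
        {"continentexp": "Europe", "countriesandterritories": "Switzerland",
         "daterep": date(2020, 3, 1), "cases": 3},
    ]
    for key, part in partitions(sample):
        print(key, [r["cases"] for r in part])
```

Each yielded partition can then be searched for the pattern independently, which is why a peak can never span two countries.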

Measures

Each time a pattern is recognized, I want to display only one row (“ONE ROW PER MATCH”) with some measures for it. Of course, I must access the point I’m interested in: its x-axis date and its y-axis value. I can reference points within the matching window through the pattern variables. The peak is the last row matched by the “GoingUp” primary pattern, so last(GoingUp.dateRep) and last(GoingUp.cases) give me its coordinates:


  5   measures
  6    match_number() as "match#",
  7    last(GoingUp.dateRep) as "Peak date",
  8    last(GoingUp.cases) as "Peak cases"
  9   one row per match

Those measures are accessible in the SELECT clause of my SQL statement. I added match_number() to number the matches.

Here is the final query, with the partition, measures, pattern and define clauses within the MATCH_RECOGNIZE():


select countriesandterritories "Country","Peak date","Peak cases","match#"
from covid
match_recognize (
 partition by continentexp, countriesandterritories order by daterep
 measures
  match_number() as "match#",
  last(GoingUp.dateRep) as "Peak date",
  last(GoingUp.cases) as "Peak cases"
 one row per match
 pattern (GoingUp+ GoingDown+)
 define
  GoingUp as ( GoingUp.cases > prev(GoingUp.cases) ),
  GoingDown as ( GoingDown.cases < prev(GoingDown.cases))
)
where countriesandterritories='Switzerland';

The full syntax has more clauses and, of course, everything is documented: https://docs.oracle.com/database/121/DWHSG/pattern.htm#DWHSG8982

Debug mode

In order to understand how it works (and to debug it), we can display all rows (ALL ROWS PER MATCH instead of ONE ROW PER MATCH in line 9), add the row columns (DATEREP and CASES in line 1) and, in addition to match_number(), add the classifier() measure:


  1  select countriesandterritories "Country","Peak date","Peak cases","match#",daterep,cases,"classifier"
  2  from covid
  3  match_recognize (
  4   partition by continentexp, countriesandterritories order by daterep
  5   measures
  6    match_number() as "match#", classifier() as "classifier",
  7    last(GoingUp.dateRep) as "Peak date",
  8    last(GoingUp.cases) as "Peak cases"
  9   all rows per match
 10   pattern (GoingUp+ GoingDown+)
 11   define
 12    GoingUp as ( GoingUp.cases > prev(GoingUp.cases) ),
 13    GoingDown as ( GoingDown.cases < prev(GoingDown.cases))
 14  )
 15* where countriesandterritories='Switzerland';

“ALL ROWS PER MATCH” returns every row that is part of a match, and classifier() shows which primary pattern each row matched.

Here are the rows around the 10th match. Keep in mind that rows are processed in order and, for each row, the engine looks ahead to recognize a pattern.


       Country    Peak date    Peak cases    match#      DATEREP    CASES    classifier
______________ ____________ _____________ _________ ____________ ________ _____________
...
Switzerland    24-MAR-20             1044         9 24-MAR-20        1044 GOINGUP
Switzerland    24-MAR-20             1044         9 25-MAR-20         774 GOINGDOWN
Switzerland    26-MAR-20              925        10 26-MAR-20         925 GOINGUP
Switzerland    27-MAR-20             1000        10 27-MAR-20        1000 GOINGUP
Switzerland    28-MAR-20             1390        10 28-MAR-20        1390 GOINGUP
Switzerland    28-MAR-20             1390        10 29-MAR-20        1048 GOINGDOWN
Switzerland    30-MAR-20             1122        11 30-MAR-20        1122 GOINGUP
Switzerland    31-MAR-20             1138        11 31-MAR-20        1138 GOINGUP              
Switzerland    31-MAR-20             1138        11 01-APR-20         696 GOINGDOWN  
Switzerland    02-APR-20              962        12 02-APR-20         962 GOINGUP
Switzerland    03-APR-20             1124        12 03-APR-20        1124 GOINGUP
Switzerland    03-APR-20             1124        12 04-APR-20        1033 GOINGDOWN

You see here how we came to output the 10th match (28-MAR-20, 1390 cases). After the peak of 24-MAR-20, the value went down the next day, 25-MAR-20 (look at the graph). This row was included in the 9th match because of the “GoingDown+” part of the pattern. Then the value went up from 26-MAR-20 to 28-MAR-20, which matches GoingUp+, followed by a GoingDown on 29-MAR-20, which means that the 10th match has been recognized. It continues for as long as rows match “GoingDown+”, but there is only one such row here because the next value is higher: 1122 > 1048, so the 10th match is closed on 29-MAR-20. This is where the ONE ROW PER MATCH row is returned, when processing the row of 29-MAR-20, with the values from the last row classified as GOINGUP, as defined in the measures: 28-MAR-20 and 1390. Then the pattern matching continues from the next row (30-MAR-20), where a GoingUp is detected…
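The same trace can be reproduced with a small Python sketch that emulates ALL ROWS PER MATCH and classifier() for this pattern. As before, it is an illustration under assumptions (greedy scan, AFTER MATCH SKIP PAST LAST ROW, rows outside a match are not returned), and the helper name all_rows_per_match is hypothetical. Fed with the rows from 25-MAR-20 to 04-APR-20, it labels them as in the output above; the match numbers start at 1 instead of 10 because the sketch does not see the earlier rows, and 25-MAR-20, having no preceding row here, is not part of any match.

```python
from typing import List, Tuple


def all_rows_per_match(rows: List[Tuple[str, int]]) -> List[Tuple[int, str, int, str]]:
    """Emulate ALL ROWS PER MATCH with classifier() for PATTERN (GoingUp+ GoingDown+).

    rows: (daterep, cases) tuples of one partition, already ordered by date.
    Returns (match_number, daterep, cases, classifier) for each row inside a match.
    """
    cases = [c for _, c in rows]
    n = len(rows)

    def going_up(i: int) -> bool:
        return i > 0 and cases[i] > cases[i - 1]

    def going_down(i: int) -> bool:
        return i > 0 and cases[i] < cases[i - 1]

    result = []
    match_number = 0
    start = 0
    while start < n:
        end_up = start
        while end_up < n and going_up(end_up):
            end_up += 1
        end_down = end_up
        while end_down < n and going_down(end_down):
            end_down += 1
        if end_up > start and end_down > end_up:
            match_number += 1
            for i in range(start, end_down):
                label = "GOINGUP" if i < end_up else "GOINGDOWN"
                result.append((match_number, rows[i][0], rows[i][1], label))
            start = end_down        # AFTER MATCH SKIP PAST LAST ROW
        else:
            start += 1              # no match from this row: try the next one
    return result


if __name__ == "__main__":
    sample = [("25-MAR-20", 774), ("26-MAR-20", 925), ("27-MAR-20", 1000),
              ("28-MAR-20", 1390), ("29-MAR-20", 1048), ("30-MAR-20", 1122),
              ("31-MAR-20", 1138), ("01-APR-20", 696), ("02-APR-20", 962),
              ("03-APR-20", 1124), ("04-APR-20", 1033)]
    for row in all_rows_per_match(sample):
        print(row)
```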

If you want to go further, there are good examples from Lucas Jellema: https://technology.amis.nl/?s=match_recognize
For its implementation across SQL engines, read Markus Winand: https://modern-sql.com/feature/match_recognize

And I’ll probably have more blog posts here in this series about recent features interesting for BI and DWH…

This article Oracle 12c – peak detection with MATCH_RECOGNIZE first appeared on Blog dbi services.

Oracle 12c – reorg and split table with clustering

June 9, 2020, 9:49 pm

By Franck Pachot

In this series of small examples on recent features, I imported the covid-19 statistics per day and per country in a previous post. This is typical of data that comes as a time series ordered by date, because this is how it is generated day after day, but that you probably want to query along another dimension, such as per country.

If you want to ingest data faster, you keep it in the order of arrival and insert it into heap table blocks. If you want to optimize for future queries on the other dimension, you may load it into a table with a specialized organization where each row has its place: an Index Organized Table, a Hash Cluster, a partitioned table, or a combination of those. With Oracle, we are used to storing data without the need to reorganize it: it is a multi-purpose database. But 12c has many features that make this reorganization easier, like partitioning, online move and online split. We can then think about a two-phase lifecycle for some operational tables that are used later for analytics:

Fast ingest and query on a short time window: we insert data as it flows, with conventional inserts, into a conventional heap table. Queries on recent data are fast because the rows are colocated in their order of arrival.
Optimal query on history: we regularly reorganize the latest ingested rows physically, to cluster them on another dimension, because we will query a large time range on this other dimension.
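The benefit of the second phase can be illustrated with a back-of-the-envelope simulation. The Python sketch below counts how many blocks contain the rows of one country when rows are stored in arrival order (one row per country, day after day) versus clustered by country. The figures (200 countries, 160 days, 100 rows per block) and the country names are hypothetical and only meant to show the order of magnitude, not to model Oracle's actual block layout.

```python
def blocks_touched(row_keys: list, key: str, rows_per_block: int) -> int:
    """Number of distinct blocks holding rows with the given key.

    row_keys lists the key of each row in storage order; the row at position i
    is assumed to be stored in block i // rows_per_block.
    """
    return len({i // rows_per_block for i, k in enumerate(row_keys) if k == key})


# Hypothetical data set: 200 countries, one row per country and per day, 160 days
COUNTRIES = [f"C{n:03d}" for n in range(200)]
DAYS = 160
ROWS_PER_BLOCK = 100

# Phase 1: rows stored in arrival order (all countries for day 1, then day 2, ...)
arrival_order = [c for _day in range(DAYS) for c in COUNTRIES]
# Phase 2: the same rows reorganized so that each country's rows are contiguous
clustered_order = sorted(arrival_order)

if __name__ == "__main__":
    print("arrival order:", blocks_touched(arrival_order, "C000", ROWS_PER_BLOCK))
    print("clustered:    ", blocks_touched(clustered_order, "C000", ROWS_PER_BLOCK))
```

In arrival order, the 160 rows of one country are spread over 160 different blocks; once clustered, they fit in 2 or 3 blocks.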

Partitioning is the way to do those operations. We can have a weekly partition for the current week. When the week is over new rows will go to a new partition (11g PARTITION BY RANGE … INTERVAL) and we can optionally merge the old partition with the one containing old data, per month or year for example, to get larger time ranges for the past data. This merge is easy (18c MERGE PARTITIONS … ONLINE). And while doing that we can reorganize rows to be clustered together. This is what I’m doing in this post.

Partitioning

From the table I created in the previous post, I create an index on GEOID (as the goal is to query by country) and I partition the table by range on DATEREP:


SQL> create index covid_geoid on covid(geoid);

Index created.

SQL> alter table covid modify partition by range(daterep) interval (numToYMinterval(1,'year')) ( partition old values less than (date '2020-01-01') , partition new values less than (date '2021-01-01') ) online;

Table altered.

This is an online operation in 12cR2. So I have two partitions, one for “old” data and one for “new” data.
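With interval partitioning, each row goes to the partition whose exclusive upper bound is the first boundary above its date, and partitions beyond the transition point (the upper bound of the last range partition, here 2021-01-01) are created on demand, one per year with this interval. The Python sketch below models that routing for the two partitions defined above. It is a simplified illustration written for this specific definition, with a hypothetical function name, not a general implementation of Oracle's partitioning logic.

```python
from datetime import date

OLD_BOUND = date(2020, 1, 1)    # partition "old": daterep < 2020-01-01
TRANSITION = date(2021, 1, 1)   # partition "new": daterep < 2021-01-01 (transition point)


def partition_upper_bound(d: date) -> date:
    """Exclusive upper bound of the partition that receives a row with daterep = d."""
    if d < OLD_BOUND:
        return OLD_BOUND
    if d < TRANSITION:
        return TRANSITION
    # One-year interval partitions counted from the transition point; since the
    # transition point is a 1st of January, each one ends on the next 1st of January
    return date(d.year + 1, 1, 1)


if __name__ == "__main__":
    for d in (date(2019, 12, 31), date(2020, 6, 1), date(2021, 3, 15)):
        print(d, "->", partition_upper_bound(d))
```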

I query all dates for one specific country:


SQL> select trunc(daterep,'mon'), max(cases) from covid where geoid='US' group by trunc(daterep,'mon') order by 1
  2  /
   TRUNC(DATEREP,'MON')    MAX(CASES)
_______________________ _____________
01-DEC-19                           0
01-JAN-20                           3
01-FEB-20                          19
01-MAR-20                       21595
01-APR-20                       48529
01-MAY-20                       33955
01-JUN-20                       25178

This reads rows scattered through the whole table because they were inserted day after day.

This is visible in the execution plan: the optimizer does not use the index but a full table scan:


SQL> select * from dbms_xplan.display_cursor(format=>'+cost iostats last')
  2  /
                                                                                       PLAN_TABLE_OUTPUT
________________________________________________________________________________________________________
SQL_ID  2nyu7m59d7spv, child number 0
-------------------------------------
select trunc(daterep,'mon'), max(cases) from covid where geoid='US'
group by trunc(daterep,'mon') order by 1

Plan hash value: 4091160977

-----------------------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |      1 |        |    55 (100)|      7 |00:00:00.01 |     180 |
|   1 |  SORT ORDER BY       |       |      1 |     77 |    55   (4)|      7 |00:00:00.01 |     180 |
|   2 |   PARTITION RANGE ALL|       |      1 |     77 |    55   (4)|      7 |00:00:00.01 |     180 |
|   3 |    HASH GROUP BY     |       |      2 |     77 |    55   (4)|      7 |00:00:00.01 |     180 |
|*  4 |     TABLE ACCESS FULL| COVID |      2 |    105 |    53   (0)|    160 |00:00:00.01 |     180 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("GEOID"='US')

This has read 180 blocks, with multiblock reads.

I force the access by index in order to compare the cost:


SQL> select /*+ index(covid) */ trunc(daterep,'mon'), max(cases) from covid where geoid='US' group by trunc(daterep,'mon') order by 1
  2  /

   TRUNC(DATEREP,'MON')    MAX(CASES)
_______________________ _____________
01-DEC-19                           0
01-JAN-20                           3
01-FEB-20                          19
01-MAR-20                       21595
01-APR-20                       48529
01-MAY-20                       33955
01-JUN-20                       25178

SQL> select * from dbms_xplan.display_cursor(format=>'+cost iostats last')
  2  /
                                                                                                                     PLAN_TABLE_OUTPUT
______________________________________________________________________________________________________________________________________
SQL_ID  2whykac7cnjks, child number 0
-------------------------------------
select /*+ index(covid) */ trunc(daterep,'mon'), max(cases) from covid
where geoid='US' group by trunc(daterep,'mon') order by 1

Plan hash value: 2816502185

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |             |      1 |        |    95 (100)|      7 |00:00:00.01 |     125 |
|   1 |  SORT ORDER BY                               |             |      1 |     77 |    95   (3)|      7 |00:00:00.01 |     125 |
|   2 |   HASH GROUP BY                              |             |      1 |     77 |    95   (3)|      7 |00:00:00.01 |     125 |
|   3 |    TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| COVID       |      1 |    105 |    93   (0)|    160 |00:00:00.01 |     125 |
|*  4 |     INDEX RANGE SCAN                         | COVID_GEOID |      1 |    105 |     1   (0)|    160 |00:00:00.01 |       2 |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("GEOID"='US')

Even though fewer blocks are read (125), they are single-block reads, so the cost is higher: 95 for the index access versus 55 for the full table scan. Using hints and comparing the cost is how I often try to understand the optimizer's choice, and here the reason is clear: because rows are scattered, the clustering factor of the index is really bad.
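To see why the clustering factor matters, here is a small Python simulation (not Oracle code). It uses the usual definition: walk the index entries in key order and count how often the table block changes from one entry to the next. The six countries, the 160 days and the 30 rows per block are made-up values. The only point is the contrast between rows stored in arrival order and rows stored together per country.

```python
def clustering_factor(rows, key, rows_per_block):
    """Number of table-block switches when rows are visited in index key order.

    rows: list of dicts in physical (heap) order.
    key: column the index is built on.
    rows_per_block: assumed fixed number of rows stored per block.
    """
    # An index entry is (key value, block of the row). Sorting the tuples
    # orders entries by key, then by block, as rowids are ordered within a key.
    entries = sorted((row[key], pos // rows_per_block) for pos, row in enumerate(rows))
    factor, previous_block = 0, None
    for _, block in entries:
        if block != previous_block:
            factor += 1
            previous_block = block
    return factor


def blocks_visited(rows, key, value, rows_per_block):
    """Distinct table blocks holding the rows where row[key] == value."""
    return len({pos // rows_per_block for pos, row in enumerate(rows) if row[key] == value})


# Hypothetical data: 6 countries, one row per country per day, 160 days.
COUNTRIES = ["US", "FR", "DE", "IT", "ES", "CH"]
arrival_order = [{"day": d, "geoid": g} for d in range(160) for g in COUNTRIES]
clustered_order = sorted(arrival_order, key=lambda r: r["geoid"])

for label, rows in (("arrival order", arrival_order), ("clustered", clustered_order)):
    print(label,
          "clustering factor:", clustering_factor(rows, "geoid", 30),
          "blocks for US:", blocks_visited(rows, "geoid", "US", 30))
```

With rows in arrival order, the US rows are spread over all 32 blocks and the clustering factor is 192, six times the number of blocks. Once rows are grouped by country, the factor drops to 32 (the number of blocks) and one country touches only 6 blocks. This is the same kind of change as the COVID_GEOID clustering factor going from 19206 to 369 later in this post.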

I said that I want to merge the partitions, and maybe reorganize them with an online table move. For this second phase of the lifecycle, I want to cluster rows on the country dimension rather than on arrival date.

Attribute clustering

This preference can be declared on the table with 12c Attribute Clustering:


SQL> alter table covid add clustering by linear order (continentexp, countriesandterritories);

Table altered.

You see that I can mention multiple columns and I don’t need to use the GEOID column that I will use to query. This is not an index. This is just a preference to cluster rows and, if they are clustered on the country name, they will also be clustered on continent, country code, geoid,… I have chosen those columns for clarity when reading the DDL:


SQL> exec dbms_metadata.set_transform_param(DBMS_METADATA.SESSION_TRANSFORM,'SEGMENT_ATTRIBUTES',false);

PL/SQL procedure successfully completed.

SQL> ddl covid

  CREATE TABLE "COVID"
   (    "DATEREP" DATE,
        "N_DAY" NUMBER,
        "N_MONTH" NUMBER,
        "N_YEAR" NUMBER,
        "CASES" NUMBER,
        "DEATHS" NUMBER,
        "COUNTRIESANDTERRITORIES" VARCHAR2(50),
        "GEOID" VARCHAR2(10),
        "COUNTRYTERRITORYCODE" VARCHAR2(3),
        "POPDATA2018" NUMBER,
        "CONTINENTEXP" VARCHAR2(10)
   )
 CLUSTERING
 BY LINEAR ORDER ("COVID"."CONTINENTEXP",
  "COVID"."COUNTRIESANDTERRITORIES")
   YES ON LOAD  YES ON DATA MOVEMENT
 WITHOUT MATERIALIZED ZONEMAP
  PARTITION BY RANGE ("DATEREP") INTERVAL (NUMTOYMINTERVAL(1,'YEAR'))
 (PARTITION "OLD"  VALUES LESS THAN (TO_DATE(' 2020-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
 PARTITION "NEW"  VALUES LESS THAN (TO_DATE(' 2021-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ) ;

  CREATE INDEX "COVID_GEOID" ON "COVID" ("GEOID")
  ;

As you can see, the default is YES ON LOAD, which means that direct-path inserts will cluster rows, and ON DATA MOVEMENT is also YES, which is why merging partitions will also cluster rows.

I’ve done that afterward here, but this is something you can do at table creation. You specify which attributes you want to cluster on, and when: during direct-path inserts (YES ON LOAD) and/or table reorganizations (YES ON DATA MOVEMENT). This is defined at table level. Beyond those defaults, table reorganizations (ALTER TABLE … MOVE, ALTER TABLE … MERGE PARTITIONS) can explicitly DISALLOW CLUSTERING or ALLOW CLUSTERING.
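These rules can be summarized as a small decision function. The Python sketch below is a simplified reading of them, not Oracle code: the operation names are made up, and it assumes that ALLOW/DISALLOW CLUSTERING on a reorganization statement takes precedence over the table-level default, as described above. It also encodes that attribute clustering applies to direct-path loads and data movement, not to conventional DML.

```python
from typing import Optional

DATA_MOVEMENT = {"move", "merge_partitions"}


def clusters_rows(operation: str,
                  yes_on_load: bool = True,
                  yes_on_data_movement: bool = True,
                  override: Optional[str] = None) -> bool:
    """Would this operation apply the table's attribute clustering?

    operation: "conventional_insert", "direct_path_insert", "move" or
               "merge_partitions" (a simplified, hypothetical set of names).
    override:  None, "ALLOW" or "DISALLOW" for ALLOW/DISALLOW CLUSTERING
               written on the reorganization statement.
    """
    if operation == "conventional_insert":
        return False  # conventional DML does not cluster rows
    if operation == "direct_path_insert":
        return yes_on_load
    if operation in DATA_MOVEMENT:
        if override == "ALLOW":
            return True
        if override == "DISALLOW":
            return False
        return yes_on_data_movement  # table-level default
    raise ValueError(f"unknown operation: {operation}")


# The COVID table uses the defaults: YES ON LOAD, YES ON DATA MOVEMENT.
print(clusters_rows("direct_path_insert"))         # True
print(clusters_rows("move", override="DISALLOW"))  # False
```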

Move Partition

When I have ingested some data and think that it would be better to cluster them, maybe at the time this partition is completed and new inserts go to a higher interval, I can reorganize it with a simple ALTER TABLE … MOVE:


SQL> alter table covid move partition new online allow clustering;

Table altered.

This will cluster rows together on the clustering attributes. I mentioned ALLOW CLUSTERING to show the syntax but it is the default (YES ON DATA MOVEMENT) anyway here.

At that point, you may also want to compress the old partitions with basic compression (the compression that does not require an additional option but is possible only with bulk load or data movement). However, be careful: the combination of online operation and basic compression requires the Advanced Compression Option. More info in a previous post on “Segment Maintenance Online Compress” feature usage.

Merge Partition

As my goal is to cluster data on a different dimension than the time one, I may want to have larger partitions for the past ones. Something like the current partition holding a week of data at maximum, but the past partitions being on quarter or yearly ranges. That can be done with partition merging, which is an online operation in 18c (and note that I have a global index here and an online operation does not invalidate indexes):


SQL> alter table covid merge partitions old,new into partition oldmerged online allow clustering;

Table altered.

This is a row movement and clustering on data movement is enabled. Again I mentioned ALLOW CLUSTERING just to show the syntax.

Let’s see the number of buffers read now with index access. The statistics of the index (clustering factor) have not been updated, so the optimizer may not choose the index access yet (until dbms_stats runs on stale tables). I force it with a hint:


SQL> select /*+ index(covid) */ trunc(daterep,'mon'), max(cases) from covid where geoid='US' group by trunc(daterep,'mon') order by 1;

   TRUNC(DATEREP,'MON')    MAX(CASES)
_______________________ _____________
01-DEC-19                           0
01-JAN-20                           3
01-FEB-20                          19
01-MAR-20                       21595
01-APR-20                       48529
01-MAY-20                       33955
01-JUN-20                       25178

SQL> select * from dbms_xplan.display_cursor(format=>'+cost iostats last')
  2  /
                                                                                                                     PLAN_TABLE_OUTPUT
______________________________________________________________________________________________________________________________________
SQL_ID  2whykac7cnjks, child number 0
-------------------------------------
select /*+ index(covid) */ trunc(daterep,'mon'), max(cases) from covid
where geoid='US' group by trunc(daterep,'mon') order by 1

Plan hash value: 2816502185

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |             |      1 |        |    95 (100)|      7 |00:00:00.01 |       8 |
|   1 |  SORT ORDER BY                               |             |      1 |     77 |    95   (3)|      7 |00:00:00.01 |       8 |
|   2 |   HASH GROUP BY                              |             |      1 |     77 |    95   (3)|      7 |00:00:00.01 |       8 |
|   3 |    TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| COVID       |      1 |    105 |    93   (0)|    160 |00:00:00.01 |       8 |
|*  4 |     INDEX RANGE SCAN                         | COVID_GEOID |      1 |    105 |     1   (0)|    160 |00:00:00.01 |       5 |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("GEOID"='US')
       filter(TBL$OR$IDX$PART$NUM(,0,8,0,"COVID".ROWID)=1)

The cost has not changed (because of the statistics) but the number of buffers read is minimal: only the 8 buffers where all my rows for this country are clustered. Remember that I clustered on the country name but use the GEOID here in my predicate. That doesn’t matter as long as the rows are together.
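The reason is that each GEOID corresponds to exactly one country name, so ordering rows on (CONTINENTEXP, COUNTRIESANDTERRITORIES) also stores the rows of one GEOID side by side. The Python sketch below illustrates this with a linear ordering, which behaves like sorting on the clustering columns in the listed order. The sample countries are hypothetical, and the in-memory sort only stands in for what Oracle does during direct-path loads or data movement.

```python
# Hypothetical sample: (geoid, country name, continent).
COUNTRIES = [
    ("US", "United_States_of_America", "America"),
    ("FR", "France", "Europe"),
    ("JP", "Japan", "Asia"),
    ("DE", "Germany", "Europe"),
]


def arrival_order(days):
    """Rows as they are inserted: one row per country, day after day."""
    return [
        {"daterep": day, "geoid": g, "country": c, "continent": k}
        for day in range(days)
        for g, c, k in COUNTRIES
    ]


def cluster_linear(rows, columns):
    """Linear ordering: sort on the listed columns, the first one being most significant."""
    return sorted(rows, key=lambda r: tuple(r[col] for col in columns))


def is_contiguous(rows, column, value):
    """True if all rows having row[column] == value are stored next to each other."""
    positions = [i for i, r in enumerate(rows) if r[column] == value]
    return positions == list(range(positions[0], positions[-1] + 1))


heap = arrival_order(days=10)
reorganized = cluster_linear(heap, ["continent", "country"])
print(is_contiguous(heap, "geoid", "US"))  # False: one US row every 4 rows
print(all(is_contiguous(reorganized, "geoid", g) for g, _, _ in COUNTRIES))  # True
```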

Asynchronous global index maintenance

Note the strange predicate on TBL$OR$IDX$PART$NUM(,0,8,0,"COVID".ROWID)=1. It comes from another 12c feature: global indexes are kept usable during partition maintenance (which is required for an online operation), but the cleanup of their obsolete entries is deferred and done asynchronously later. This is visible from USER_INDEXES:


SQL> select index_name,to_char(last_analyzed,'hh24:mi:ss') last_analyzed,clustering_factor,orphaned_entries from user_indexes where table_name='COVID';

    INDEX_NAME    LAST_ANALYZED    CLUSTERING_FACTOR    ORPHANED_ENTRIES
______________ ________________ ____________________ ___________________
COVID_GEOID    08:33:34                        19206 YES

Orphaned entries mean that some entries in the global index may reference the dropped segment after my MOVE or MERGE and the query has to ignore them.

The ROWIDs to ignore are determined from the segments concerned, which are stored in the dictionary:


SQL> select * from sys.index_orphaned_entry$;
   INDEXOBJ#    TABPARTDOBJ#    HIDDEN
____________ _______________ _________
       79972           79970 O
       79972           79971 O
       79972           79980 O
       79972           79973 O

HIDDEN='O' means orphaned: the ROWIDs addressing these partitions are filtered out of the dirty index entries by the predicate filter(TBL$OR$IDX$PART$NUM(,0,8,0,"COVID".ROWID)=1) above.
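Conceptually, the filter discards index entries whose ROWID points to a segment listed as orphaned, and a cleanup removes those entries for good. The Python sketch below models only that effect with plain objects. It is not how TBL$OR$IDX$PART$NUM works internally (that function is undocumented); the orphaned ids come from the INDEX_ORPHANED_ENTRY$ output above, while 80001 for the new merged segment and the block/row values are hypothetical.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    key: str
    data_object_id: int  # table segment (partition) the rowid points to
    block: int
    row: int


# Orphaned segments, as listed in SYS.INDEX_ORPHANED_ENTRY$ above.
ORPHANED = {79970, 79971, 79980, 79973}
MERGED_SEGMENT = 80001  # hypothetical data object id of the new merged segment


def lookup(entries, key, orphaned):
    """Index range scan on `key` that ignores entries of orphaned segments."""
    return [e for e in entries if e.key == key and e.data_object_id not in orphaned]


def coalesce_cleanup(entries, orphaned):
    """Physically remove the orphaned entries, so that no filter is needed afterwards."""
    return [e for e in entries if e.data_object_id not in orphaned]


index = [
    IndexEntry("US", 79970, 120, 3),  # stale: points to a segment dropped by MERGE
    IndexEntry("US", 79971, 45, 7),   # stale
    IndexEntry("US", MERGED_SEGMENT, 12, 0),
    IndexEntry("FR", MERGED_SEGMENT, 10, 4),
]

print(lookup(index, "US", ORPHANED))
cleaned = coalesce_cleanup(index, ORPHANED)
print(lookup(cleaned, "US", set()))  # same result, without the orphan filter
```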

This maintenance of the dirty index will be done during the maintenance window but I can do it immediately to finish my reorganization correctly:


SQL> alter index COVID_GEOID coalesce cleanup;

Index altered.

SQL> select index_name,to_char(last_analyzed,'hh24:mi:ss') last_analyzed,clustering_factor,orphaned_entries from user_indexes where table_name='COVID';

    INDEX_NAME    LAST_ANALYZED    CLUSTERING_FACTOR    ORPHANED_ENTRIES
______________ ________________ ____________________ ___________________
COVID_GEOID    08:33:34                        19206 NO

No orphaned index entries anymore. Note that I could also have called the DBMS_PART.CLEANUP_GIDX procedure to do the same.

This is fine for the query, but as the statistics were not updated, the optimizer doesn’t know yet how clustered my table is. To complete my reorganization and have queries benefit from it immediately, I gather the statistics:


SQL> exec dbms_stats.gather_table_stats(user,'COVID',options=>'gather auto');

PL/SQL procedure successfully completed.

SQL> select index_name,to_char(last_analyzed,'hh24:mi:ss') last_analyzed,clustering_factor,orphaned_entries from user_indexes where table_name='COVID';

    INDEX_NAME    LAST_ANALYZED    CLUSTERING_FACTOR    ORPHANED_ENTRIES
______________ ________________ ____________________ ___________________
COVID_GEOID    08:38:40                          369 NO

GATHER AUTO gathers statistics only on stale objects and, as soon as I did my MOVE or MERGE, the index was marked as stale (note that ALTER INDEX … COALESCE does not mark it as stale by itself).

And now my query will use this optimal index without the need for any hint:


SQL_ID  2nyu7m59d7spv, child number 0
-------------------------------------
select trunc(daterep,'mon'), max(cases) from covid where geoid='US'
group by trunc(daterep,'mon') order by 1

Plan hash value: 2816502185

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |             |      1 |        |     7 (100)|      7 |00:00:00.01 |       5 |
|   1 |  SORT ORDER BY                               |             |      1 |    101 |     7  (29)|      7 |00:00:00.01 |       5 |
|   2 |   HASH GROUP BY                              |             |      1 |    101 |     7  (29)|      7 |00:00:00.01 |       5 |
|   3 |    TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| COVID       |      1 |    160 |     5   (0)|    160 |00:00:00.01 |       5 |
|*  4 |     INDEX RANGE SCAN                         | COVID_GEOID |      1 |    160 |     2   (0)|    160 |00:00:00.01 |       2 |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("GEOID"='US')

and, thanks to the coalesce cleanup, there’s no predicate on orphan ROWIDs anymore.

With this pattern, you may realize that my global index on countries is useful only for past data, not for recent data that has not been clustered yet. We can even avoid maintaining the index for this partition. This is called partial indexing, and we will see it in the next post.

With this pattern, we can even question the need to maintain an index for the old partitions. As all my rows for GEOID='US' were packed in a few contiguous blocks, why not just store the range of ROWIDs rather than the list of them? This is called Zone Maps. But this is only available on Exadata, and I like to think about Oracle as a multiplatform database.

Those many features came in the recent releases thanks to the development of the Autonomous Database. When the DBA is a cloud provider, whether it is automated or not, all maintenance must be done online without stopping the application. Those features are the bricks to build automatic lifecycle management and performance optimization.

The article Oracle 12c – reorg and split table with clustering first appeared on Blog dbi services.


Oracle 12c – global partial index

June 9, 2020, 10:02 pm


By Franck Pachot

We have an incredible number of possibilities with Oracle. Yes, an index can be global (indexing many partitions without having to be partitioned itself on the same key) and partial (skipping some of the table partitions where we don’t need indexing). In the previous post of this series of small examples on recent features, I partitioned a table with covid-19 cases per day and per country, partitioned on range of date by interval. The index on the country code (GEOID) was not very efficient for data ingested per day, because countries are scattered through the whole table. Then I reorganized the old partitions to cluster them on countries.

My global index on country code is defined as:


SQL> create index covid_geoid on covid(geoid);

Index created.

This is efficient, thanks to clustering, except for the new rows, which again arrive in time order. Those go to a new, small partition (the idea in the previous post was to have a short time range for the current partition and larger ones for the old data, using ALTER TABLE … MERGE PARTITIONS … ONLINE to merge the newly old partition into the others). For this current partition only, it is preferable to full scan it, and even to avoid maintaining index entries for it, as this will speed up data ingestion.

I think that partial indexing is well known for local indexes, as this is like marking some index partitions as unusable. But here I’m showing it on a global index.

Splitting partitions

To continue from the previous post, where I merged all partitions, I’ll split them again, and this can be an online operation in 12cR2:


SQL> alter table covid split partition oldmerged at (date '2020-04-01') into (partition old, partition new) online;

Table altered.

SQL> alter index COVID_GEOID coalesce cleanup;

Index altered.

I have two partitions, “old” and “new”, and a global index. I also cleaned up the orphaned index entries to get clean execution plans, which has to be done anyway.

Here is my query, using the index:


SQL> explain plan for select trunc(daterep,'mon'), max(cases) from covid where geoid='US' group by trunc(daterep,'mon') order by 1;

Explained.

SQL> select * from dbms_xplan.display();
                                                                                                              PLAN_TABLE_OUTPUT
_______________________________________________________________________________________________________________________________
Plan hash value: 2816502185

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |             |   101 |  1515 |     6  (34)| 00:00:01 |       |       |
|   1 |  SORT ORDER BY                               |             |   101 |  1515 |     6  (34)| 00:00:01 |       |       |
|   2 |   HASH GROUP BY                              |             |   101 |  1515 |     6  (34)| 00:00:01 |       |       |
|   3 |    TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| COVID       |   160 |  2400 |     4   (0)| 00:00:01 | ROWID | ROWID |
|*  4 |     INDEX RANGE SCAN                         | COVID_GEOID |   160 |       |     1   (0)| 00:00:01 |       |       |
----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("GEOID"='US')

This goes to all partitions, as the ROWID in a global index carries the partition information through the data object id. We see that with Pstart/Pstop=ROWID.
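The data object id is the first part of an extended ROWID. Extended ROWIDs are displayed as 18 characters in the format OOOOOOFFFBBBBBBRRR (data object number, relative file number, block number, row number), each part written in base 64 with the alphabet A-Z, a-z, 0-9, +, /. The Python sketch below encodes and decodes that format to show where the partition information lives. In the database you would use the DBMS_ROWID package (for example DBMS_ROWID.ROWID_OBJECT) rather than code like this, and the values in the example are made up.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
DIGIT = {ch: i for i, ch in enumerate(ALPHABET)}


def _decode(chars: str) -> int:
    value = 0
    for ch in chars:
        value = value * 64 + DIGIT[ch]
    return value


def _encode(value: int, width: int) -> str:
    chars = []
    for _ in range(width):
        value, digit = divmod(value, 64)
        chars.append(ALPHABET[digit])
    if value:
        raise ValueError("value does not fit in the requested width")
    return "".join(reversed(chars))


def decode_rowid(rowid: str) -> dict:
    """Split an extended ROWID (OOOOOOFFFBBBBBBRRR) into its four parts."""
    if len(rowid) != 18:
        raise ValueError("an extended ROWID has 18 characters")
    return {
        "data_object_id": _decode(rowid[0:6]),  # identifies the segment (partition)
        "relative_fno": _decode(rowid[6:9]),
        "block": _decode(rowid[9:15]),
        "row": _decode(rowid[15:18]),
    }


def encode_rowid(data_object_id: int, relative_fno: int, block: int, row: int) -> str:
    return (_encode(data_object_id, 6) + _encode(relative_fno, 3)
            + _encode(block, 6) + _encode(row, 3))


rid = encode_rowid(79980, 12, 131, 5)
print(rid)                # AAAThsAAMAAAACDAAF
print(decode_rowid(rid))  # {'data_object_id': 79980, 'relative_fno': 12, 'block': 131, 'row': 5}
```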

Partial indexing

Now I want to set my global index on countries to be a partial index:


SQL> alter index covid_geoid indexing partial;

Index altered.

This doesn’t change anything for the moment. The indexing of partitions depends on the partition attribute, which is INDEXING ON by default.

I set the “new” partition to not maintain indexes (INDEXING OFF), for this partition only.


SQL> alter table covid modify partition new indexing off;

Table altered.

This means that partial indexes will not reference the “new” partition: if they are local, there is no index partition for it; if they are global, there are no index entries for it.

And that’s all. Now there will be no overhead in maintaining this index when ingesting new data in this partition.
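The maintenance side can be pictured with a toy model: rows are routed to a range partition, and the partial global index is only maintained when that partition has INDEXING ON. This Python sketch is an illustration under simplifying assumptions (dates as 'YYYY-MM-DD' strings, a dict standing in for the B-tree, positions standing in for ROWIDs), not a description of Oracle internals.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    name: str
    upper_bound: str       # exclusive bound, 'YYYY-MM-DD' (compares correctly as text)
    indexing: bool = True  # partition attribute INDEXING ON / OFF
    rows: list = field(default_factory=list)


@dataclass
class PartialGlobalIndex:
    entries: dict = field(default_factory=dict)  # key -> [(partition name, row position)]
    writes: int = 0                              # index entries maintained so far

    def on_insert(self, partition, key, position):
        if not partition.indexing:
            return  # INDEXING OFF: no index maintenance for this partition
        self.entries.setdefault(key, []).append((partition.name, position))
        self.writes += 1


def insert(partitions, index, daterep, geoid):
    """Route a row to its range partition and maintain the partial global index."""
    for p in partitions:  # partitions ordered by upper bound
        if daterep < p.upper_bound:
            p.rows.append({"daterep": daterep, "geoid": geoid})
            index.on_insert(p, geoid, len(p.rows) - 1)
            return p.name
    raise ValueError(f"no partition for {daterep}")


partitions = [Partition("OLD", "2020-04-01"), Partition("NEW", "2021-01-01", indexing=False)]
index = PartialGlobalIndex()
for day, country in [("2020-03-30", "US"), ("2020-06-01", "US"), ("2020-06-01", "FR")]:
    insert(partitions, index, day, country)
print(index.entries, index.writes)  # {'US': [('OLD', 0)]} 1
```

Only the row that lands in the "OLD" partition costs an index write; the two rows ingested into "NEW" are stored without any index maintenance.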

Table Expansion

Then the optimizer has a transformation, table expansion, that splits the execution plan into two branches: one with the index access and one without. This transformation was introduced in 11g for unusable local index partitions and is now used even with global indexes:


SQL> explain plan for /*+ index(covid) */ select trunc(daterep,'mon'), max(cases) from covid where geoid='US' group by trunc(daterep,'mon') order by 1;

Explained.

SQL> select * from dbms_xplan.display();
                                                                                                                PLAN_TABLE_OUTPUT
_________________________________________________________________________________________________________________________________
Plan hash value: 1031592504

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                      | Name        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                               |             |   321 |  7062 |    37   (6)| 00:00:01 |       |       |
|   1 |  SORT ORDER BY                                 |             |   321 |  7062 |    37   (6)| 00:00:01 |       |       |
|   2 |   HASH GROUP BY                                |             |   321 |  7062 |    37   (6)| 00:00:01 |       |       |
|   3 |    VIEW                                        | VW_TE_2     |   321 |  7062 |    35   (0)| 00:00:01 |       |       |
|   4 |     UNION-ALL                                  |             |       |       |            |          |       |       |
|*  5 |      TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| COVID       |    93 |  1395 |     4   (0)| 00:00:01 |     1 |     1 |
|*  6 |       INDEX RANGE SCAN                         | COVID_GEOID |   160 |       |     1   (0)| 00:00:01 |       |       |
|   7 |      PARTITION RANGE SINGLE                    |             |    68 |  1020 |    27   (0)| 00:00:01 |     2 |     2 |
|*  8 |       TABLE ACCESS FULL                        | COVID       |    68 |  1020 |    27   (0)| 00:00:01 |     2 |     2 |
|   9 |      TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| COVID       |   160 |  4320 |     4   (0)| 00:00:01 | ROWID | ROWID |
|* 10 |       INDEX RANGE SCAN                         | COVID_GEOID |   160 |       |     1   (0)| 00:00:01 |       |       |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter("COVID"."DATEREP"=TO_DATE(' 2020-04-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "COVID"."DATEREP"<TO_DATE(' 2021-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
  10 - access("GEOID"='US')
       filter(TBL$OR$IDX$PART$NUM("COVID",0,8,0,ROWID)=1 AND TBL$OR$IDX$PART$NUM("COVID",0,0,65535,ROWID)<>1 AND
              TBL$OR$IDX$PART$NUM("COVID",0,0,65535,ROWID)<>2)

The TABLE ACCESS BY GLOBAL INDEX ROWID is for partition 1 as mentioned by Pstart/Pstop, which is the “old” one with INDEXING ON. The TABLE ACCESS FULL is for partition 2, the “new” one, that has INDEXING OFF. The optimizer uses predicates on the partition key to select the branch safely.

But this plan also has an additional branch, and this TBL$OR$IDX$PART$NUM again, because I have interval partitioning. With interval partitioning there is no known Pstop, so the plan has to handle the case where a new partition has been created (with indexing on). The third branch therefore accesses by index ROWID the partitions that are not hardcoded in this plan.
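The result of the expanded plan must be the same as a plain full scan. The Python sketch below models only that semantics (which rows each UNION ALL branch returns), not how the optimizer builds the plan. The rows are made up and SYS_P101 is a hypothetical partition created by interval partitioning after the statement was parsed, which is why the third branch is needed.

```python
# Each partition: name, INDEXING attribute, rows (daterep, geoid, cases).
PARTITIONS = [
    {"name": "OLD", "indexing": True,
     "rows": [("2020-01-10", "US", 3), ("2020-02-01", "FR", 5), ("2020-03-20", "US", 19)]},
    {"name": "NEW", "indexing": False,
     "rows": [("2020-04-02", "US", 100), ("2020-05-01", "FR", 7)]},
    # Hypothetical partition created by interval partitioning after parse time.
    {"name": "SYS_P101", "indexing": True, "rows": [("2021-02-01", "US", 42)]},
]
KNOWN_AT_PARSE = {"OLD", "NEW"}


def build_partial_index(partitions):
    """Global partial index on geoid: entries only for INDEXING ON partitions."""
    index = {}
    for p in partitions:
        if p["indexing"]:
            for pos, row in enumerate(p["rows"]):
                index.setdefault(row[1], []).append((p["name"], pos))
    return index


def full_scan(partitions, geoid):
    return [row for p in partitions for row in p["rows"] if row[1] == geoid]


def expanded_plan(partitions, index, geoid, known):
    """UNION ALL of the three branches of the expanded plan."""
    by_name = {p["name"]: p for p in partitions}
    entries = index.get(geoid, [])
    # Branch 1: index access, restricted to indexed partitions known at parse time.
    branch1 = [by_name[n]["rows"][pos] for n, pos in entries if n in known]
    # Branch 2: full scan of the known partitions that have INDEXING OFF.
    branch2 = full_scan([p for p in partitions if p["name"] in known and not p["indexing"]], geoid)
    # Branch 3: index access for partitions not hardcoded in the plan.
    branch3 = [by_name[n]["rows"][pos] for n, pos in entries if n not in known]
    return branch1 + branch2 + branch3


index = build_partial_index(PARTITIONS)
result = expanded_plan(PARTITIONS, index, "US", KNOWN_AT_PARSE)
print(sorted(result) == sorted(full_scan(PARTITIONS, "US")))  # True
```

Without the third branch, the US row of SYS_P101 would be missed, because that partition is neither the indexed partition nor the full-scanned one known when the plan was built.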

Let’s remove interval partitioning just to make the plan easier to read:


SQL> alter table covid set interval();

Table altered.


SQL> explain plan for select trunc(daterep,'mon'), max(cases) from covid where geoid='US' group by trunc(daterep,'mon') order by 1;

Explained.

SQL> select * from dbms_xplan.display();
                                                                                                                PLAN_TABLE_OUTPUT
_________________________________________________________________________________________________________________________________
Plan hash value: 3529087922

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                      | Name        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                               |             |   161 |  3542 |    35   (6)| 00:00:01 |       |       |
|   1 |  SORT ORDER BY                                 |             |   161 |  3542 |    35   (6)| 00:00:01 |       |       |
|   2 |   HASH GROUP BY                                |             |   161 |  3542 |    35   (6)| 00:00:01 |       |       |
|   3 |    VIEW                                        | VW_TE_2     |   161 |  3542 |    33   (0)| 00:00:01 |       |       |
|   4 |     UNION-ALL                                  |             |       |       |            |          |       |       |
|*  5 |      TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| COVID       |    93 |  1395 |     6   (0)| 00:00:01 |     1 |     1 |
|*  6 |       INDEX RANGE SCAN                         | COVID_GEOID |   160 |       |     1   (0)| 00:00:01 |       |       |
|   7 |      PARTITION RANGE SINGLE                    |             |    68 |  1020 |    27   (0)| 00:00:01 |     2 |     2 |
|*  8 |       TABLE ACCESS FULL                        | COVID       |    68 |  1020 |    27   (0)| 00:00:01 |     2 |     2 |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter("COVID"."DATEREP"<TO_DATE(' 2020-04-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   6 - access("GEOID"='US')
   8 - filter("GEOID"='US')

Here it is clear: access by index to the partition 1 and full table scan for partition 2. This is exactly what I wanted because I know the clustering factor on the new partition is not very good until I reorganize it (move or merge as I did in the previous post).

All these features help to manage the lifecycle of data. That’s a completely different approach from purpose-built databases, where you have one database service for fast ingest with simple queries on recent data (NoSQL folks may think about DynamoDB for that), then streaming data to a relational database for more OLTP queries (RDS to continue with the AWS analogy), and moving old data into a database dedicated to analytics (that could be Redshift then). With Oracle, which has always been a multi-purpose database, the goal is to avoid duplication and replication and manage data in place for all usages. Through the 40 years of this database engine, many approaches have been implemented to cluster data: CLUSTER and IOT can sort (or hash) data as soon as it is inserted, to put it at its optimal place for future queries. But the agility of heap tables finally wins. Now, with the ease of in-database data movement (partitioning and online operations) and improvements in full scans (multiblock reads, direct-path reads, storage indexes), we can get the best of both: heap tables with few indexes for fast ingest of current data, regularly reorganized to be clustered, with additional indexes.

I mentioned NoSQL and I mentioned fast ingest. Actually, there’s a feature called Fast Ingest for IoT (lowercase ‘o’ there) that goes further with this idea. Instead of inserting into a persistent segment and reorganizing later, rows are buffered in a ‘memoptimized rowstore’ before going to the heap segment in bulk. But that’s an Exadata feature, and I like to think about Oracle as a multiplatform database.

The article Oracle 12c – global partial index first appeared on Blog dbi services.


Control-M/EM: Cyclic job interval less than 1 min

June 11, 2020, 6:13 am


Introduction

By default, the minimum interval of a cyclic job is 1 minute (with a rerun interval of 0, it loops immediately).

But what if you need to rerun it at an interval between 0 and 1 minute?

Let’s use a little trick to achieve that:

1) Job configuration

In the Planning pane, create a job.
Select this job, go to the “Scheduling” tab and tick the “Cyclic” box:

Result:

As soon as you define your job as cyclic, it takes the default configured value and reruns every (1) minute(s) from the job’s start.

We change this by switching to 0 minute(s) from the job’s start (be careful when setting a cyclic job interval to 0).

We will also keep the maximum rerun at 0 for our test:

2) Using the sleep command and the Control-M/Agent _sleep utility

Depending on the OS where the agent is installed, you can use one of these two commands:

a) For Windows OS :

you have to use the Control-M/Agent _sleep utility (time in sec):

Definition from the Control-M Help (available in your Control-M Workload Automation client via File/Help/View Help):

b) For Unix/Linux OS:

use the sleep command with the time needed in seconds:

[ctmag900@CTMSRVCENTOS ~]$ sleep 30

[ctmag900@CTMSRVCENTOS ~]$

As we are on a Linux machine, we will use the UNIX sleep command.

We will add it to the “post-execution” or “pre-execution” command field:

(Note: in Control-M versions before 8.0.00, these fields were named pre-cmd and post-cmd.)

In our example we have scheduled the job to be executed every 30 sec by using the below syntax:

sleep 30

3) Order the job and check the log

Looking at the submission intervals, we get the expected result, with the job executing every 30 seconds:

Note:

You may see a slight delay, depending on the response time of the machine where the task runs.
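Where does this delay come from? Under the assumption that, with an interval of 0, the next cycle starts as soon as the previous run (including its pre/post command) has ended, each cycle lasts the sleep plus the job’s own runtime plus some scheduler and agent overhead. This is a reading of the behavior above, not something taken from BMC’s documentation, and the runtime and overhead figures in the Python arithmetic below are made up.

```python
def cycle_starts(runs, sleep_seconds, job_runtime, overhead=0.0):
    """Start times (in seconds) of successive cycles of a cyclic job.

    Assumes the next cycle starts as soon as the previous one ends, so each
    cycle lasts: sleep (pre/post command) + the job's own runtime + overhead.
    """
    starts, t = [], 0.0
    for _ in range(runs):
        starts.append(t)
        t += sleep_seconds + job_runtime + overhead
    return starts


def intervals(starts):
    """Time between two consecutive cycle starts."""
    return [b - a for a, b in zip(starts, starts[1:])]


# Made-up figures: 30 s sleep, 0.5 s of real work, 0.25 s of scheduler overhead.
starts = cycle_starts(runs=3, sleep_seconds=30, job_runtime=0.5, overhead=0.25)
print(starts)             # [0.0, 30.75, 61.5]
print(intervals(starts))  # [30.75, 30.75]
```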

Conclusion:

You are now able to run cyclic jobs at intervals of less than 1 minute by using this tip.

To get more information, you can consult BMC’s site and, of course, don’t forget to check the dbi bloggers for more tips and tricks!

The article Control-M/EM: Cyclic job interval less than 1 min first appeared on Blog dbi services.


Control-M/EM :Put a job already ordered in dummy job

June 11, 2020, 6:14 am

Hi everybody.

Today we will see how to put a job in dummy mode from the Monitoring part.

Introduction:

In some cases, you want (or are asked) to bypass a processing step. For that, you can put the job in dummy mode. But how can you proceed if the job is already ordered and ready to be executed?

Example:

You made a manual backup, and the same task is usually scheduled in a job.
This job is part of a workflow and depends on other jobs.
You want to keep this workflow processing.

If you want to prevent one or more jobs from executing, you have several choices:

1. Update the job definition (configure it in dummy mode in the Planning pane), then order it again in the Monitoring part. This implies that you have to replace the previous job and be careful to give the new one the same conditions so that it fits into the workflow.

This is quite a delicate action, as you must link the dummy job to the workflow (keeping all conditions) and delete the old job that it replaces.

2. Delete the job, which implies that you must review the conditions of the workflow.

3. Update the job in the Monitoring part with the “Run Now” options (which is much better, and the one we will choose).

4. Other tricky modifications and workarounds (such as substituting the command line, for example: echo “backup already done today”). The aim is only to get the job ending OK without any impact on the workflow.

Workshop:

We will use the third method: update the job in the Monitoring part with the “Run Now” options.

Note that this only applies to the current day.

Let’s take an example:

Below is a workflow containing a job that performs a backup.

When the workflow runs as usual, we can check that the backup script is executing, as you can see in the output:

Once finished, this job sends its conditions to the next job.

To bypass the backup performed by this job, we use the job options:

- Right-click the job and select “Run Now”, then “Select Bypass Options”.

- Tick the “Run as dummy jobs” box in the additional bypass options.

- Once done, you can verify that the job is set as a dummy job by checking its log:

Now let’s see whether the workflow proceeds and whether the backup job executes:

As we can see, the job behaved like a “native” dummy job (no execution: it just ends with an OK status and sends its conditions).

Consequently, when you try to check the log, you get this message explaining that no output is available:

Which is expected, as a dummy job has no output.

Note that in this case, there are some points to take into account when managing your workflow:

As you know, a dummy job ends OK almost immediately, so you must take into account that the usual running time of your job affects the end time of the workflow. Following our example, if the backup job has an average execution time of 3 hours and you put it in dummy mode, the next dependent job will send its report 3 hours earlier, as a dummy job completes instantly.
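The 3-hour shift can be reproduced with a small calculation. This is a simplified sketch, assuming a strictly sequential workflow where each job starts when its predecessor ends; the job names (BACKUP, REPORT), the start time and the 15-minute report duration are hypothetical.

```python
from datetime import datetime, timedelta


def end_times(start: datetime, durations: dict[str, timedelta], dummies: set[str]) -> dict[str, datetime]:
    """End time of each job in a strictly sequential workflow (simplified model).

    Jobs listed in `dummies` behave like dummy jobs: they end OK immediately
    (zero duration) and release their conditions to the next job.
    """
    ends = {}
    t = start
    for job, duration in durations.items():  # dicts preserve insertion order
        if job not in dummies:
            t += duration
        ends[job] = t
    return ends


# Hypothetical workflow: a 3-hour backup followed by a 15-minute report job.
workflow = {"BACKUP": timedelta(hours=3), "REPORT": timedelta(minutes=15)}
start = datetime(2020, 6, 11, 22, 0)

normal = end_times(start, workflow, dummies=set())
bypassed = end_times(start, workflow, dummies={"BACKUP"})
print(normal["REPORT"])                        # 2020-06-12 01:15:00
print(bypassed["REPORT"])                      # 2020-06-11 22:15:00
print(normal["REPORT"] - bypassed["REPORT"])   # 3:00:00
```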

So, everything depends on your strategy and on when you expect the workflow to finish.

Important:

You may have noticed that when you put a job in dummy mode this way, there is no ghost icon: you must check the job’s log to see whether the job was set to dummy during the day.

This differs from a dummy update through the Planning pane:

When the job is ordered after ticking the “Run as dummy” box, you will see the ghost icon in Monitoring (but be aware that this modification is not limited to the day: it is permanent, so it does not satisfy the original request, which was to put the job in dummy mode for the day only).

Conclusion:

Using the Run Now option on a job is a quick and safe way to put it in dummy mode for the current day. Keep in mind, however, that skipping its execution time may make your dependent jobs run earlier.

If you want to put the job in dummy mode for several days, you must update it in the Planning pane.

Once again, I invite you to consult my other posts and my colleagues’ blogs.

You can also get more tips and tricks on the BMC support site!

The post Control-M/EM: Put a job already ordered in dummy job first appeared on Blog dbi services.

No{Join,GroupBy}SQL – Analytic Views for BI

June 14, 2020, 11:36 am

By Franck Pachot

Advocates of NoSQL can query their structures without having to read a data model first, and without writing long table join clauses. They store and query a hierarchical structure without the need to follow relationships, and without the need to join tables on a foreign key in order to get a caption or description from a lookup table. The structure, like an XML or JSON document, provides the metadata needed to understand it and map it to business objects. The API is a simple ‘put’ and ‘get’, where you can retrieve a whole hierarchy, with aggregates at all levels, ready to drill down from summary to details, without the need to write sum() functions and group by clauses. For analytics, SQL has improved a lot with window functions and grouping sets but, despite being powerful, this makes the API more complex. And, at a time when the acceptable learning curve should reach its highest point after 42 seconds (like watching the first bits of a video or getting to the top-voted Stack Overflow answer), this complexity cannot be adopted easily.

Is SQL too complex? If it is, then something is wrong. SQL was invented for end-users: to query data almost like in plain English, without the need to know the internal implementation and the procedural algorithms that can make sense of it. If developers are moving to NoSQL because of the complexity of SQL, then SQL missed something from its initial goal. If they go to NoSQL because “joins are expensive”, it just means that joins should not be exposed to them, because optimizing access paths and expensive operations is the job of the database optimizer, with the help of the database designer, not of the front-end developer. However, this complexity is unfortunately there. Today, without a good understanding of the data model (entities, relationships, cardinalities), writing SQL queries is difficult. Joining over many-to-many relationships, or missing a group by clause, can give wrong results. When I see a select with a DISTINCT keyword, I immediately think that there’s an error in the query and that the developer, not being certain of the aggregation level he is working on, has masked it with a DISTINCT because understanding the data model was too time-consuming.

In data warehouses, where the database is queried by the end-user, we try to avoid this risk by building simple star schemas with only one fact table and many-to-one relationships to dimensions. And on top of that, we provide a reporting tool that generates the queries correctly, so that the end-user does not need to define the joins and aggregations. This requires a layer of metadata on top of the database to describe the possible joins, aggregation levels, functions to aggregate measures,… When I was a junior on databases, I was fascinated by those tools. On my first Data Warehouse, I built a BusinessObjects (v3) universe. It was so simple: define the “business objects”, which are the attributes mapped to the dimension columns. Define the fact measures, with the aggregation functions that can apply. And for the joins, it was like the aliases in the from clause, a dimension having multiple roles: think about an airport that can be the destination or the origin of a flight. Then we defined multiple objects: all the airport attributes in the destination role, and all the airport attributes in the origin role, were different objects for the end-user. Like “origin airport latitude”, rather than “airport latitude”, which makes sense only after a join on “origin airport ID”. That greatly simplifies the end-user view of our data: tables are still stored as relational tables to be joined at query time, in order to avoid redundancy, but the view on top of them shows the multiple hierarchies, like in a NoSQL structure, for the ease of simple queries.

But, as I mentioned, this is the main reason for SQL, and this should be done with SQL. All these descriptions I did in the BusinessObjects universe should belong to the database dictionary. And that’s finally possible with Analytic Views. Here is an example on the tables I’ve created in a previous post. I am running on the 20c cloud preview, but this can run on 18c or 19c. After importing the .csv of covid-19 cases per day and country, I’ve built one fact table and one snowflake dimension (two tables):


create table continents as select rownum continent_id, continentexp continent_name from (select distinct continentexp from covid where continentexp!='Other');
create table countries as select country_id,country_code,country_name,continent_id,popdata2018 from (select distinct geoid country_id,countryterritorycode country_code,countriesandterritories country_name,continentexp continent_name,popdata2018 from covid where continentexp!='Other') left join continents using(continent_name);
create table cases as select daterep, geoid country_id,cases from covid where continentexp!='Other';
alter table continents add primary key (continent_id);
alter table countries add foreign key (continent_id) references continents;
alter table countries add primary key (country_id);
alter table cases add foreign key (country_id) references countries;
alter table cases add primary key (country_id,daterep);

The dimension hierarchy is on country/continent. I should have created one for time (day/month/quarter/year) but the goal is to keep it simple to show the concept.

When looking at the syntax, it may seem complex. But, please, understand that the goal is to put more in the static definition so that runtime usage is easier.

Attribute Dimension

I’ll describe the Country/Continent dimension. It can be in one table (Star Schema) or multiple (Snowflake Schema). I opted for snowflake to show how it is supported since 18c. In 12c, we have to create a view on top of the tables, as the USING clause can only be a table or view identifier.


create or replace attribute dimension COUNTRIES_DIM_ATT
using COUNTRIES a ,CONTINENTS b join path country_continent on a.CONTINENT_ID=b.CONTINENT_ID
attributes ( a.COUNTRY_ID "Country ID", a.COUNTRY_CODE "Country", a.COUNTRY_NAME "Country name", a.CONTINENT_ID "Continent ID", b.CONTINENT_NAME "Continent")
level "Continent"
  key "Continent ID"
  member name         '#'||to_char("Continent ID")
  member caption      upper(substr("Continent",1,3))
  member description  "Continent"
  determines ("Continent")
level "Country"
  key "Country ID"
  member name         "Country ID"
  member caption      "Country"
  member description  "Country name"
  determines ("Country ID","Country", "Country name", "Continent ID", "Continent")
 all member name 'WORLD'
/

To keep it simple: I have an internal name for my dimension, COUNTRIES_DIM_ATT, and a USING clause which declares the dimension table, with an optional JOIN PATH for snowflake schemas. Then I’ve declared the attributes, which are the projection of those columns. For this example, I decided to use quoted identifiers for the ones that I add in this layer, to distinguish them from the table columns. But do as you want.

The most important part here is about levels and dependencies. In a star schema, we denormalize the dimension tables for simplification (which is not a problem, as there are no updates, and their size is not as large as the fact tables). The metadata we declare here describes the relationships. I have two levels, country and continent, and a many-to-one relationship from country to continent. This is what I declare with the LEVEL and DETERMINES keywords: among all the attributes declared, which ones are functional dependencies of others.
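Since DETERMINES declares a functional dependency, it can be worth checking that the data really respects it before declaring it. Here is a minimal Python sketch of such a check on a few rows shaped like the COUNTRIES/CONTINENTS attributes (the values come from the hierarchy output shown later in this post); the `determines` helper is hypothetical and is not an Oracle feature.

```python
from collections import defaultdict


def determines(rows: list[dict], key: str, dependents: list[str]) -> bool:
    """Hypothetical helper: True if `key` functionally determines all `dependents`,
    i.e. no key value appears with two different combinations of dependent values."""
    seen: dict = defaultdict(set)
    for row in rows:
        seen[row[key]].add(tuple(row[c] for c in dependents))
    return all(len(values) == 1 for values in seen.values())


rows = [
    {"Country ID": "AE", "Continent ID": 1, "Continent": "Asia"},
    {"Country ID": "AF", "Continent ID": 1, "Continent": "Asia"},
    {"Country ID": "AO", "Continent ID": 2, "Continent": "Africa"},
]
print(determines(rows, "Country ID", ["Continent ID", "Continent"]))  # True
# The reverse is not a functional dependency: continent 1 has several countries.
print(determines(rows, "Continent ID", ["Country ID"]))  # False
```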

The second important description here is standard naming. In the analytic view, I can query the attributes as columns from the USING clause. But for ease of querying by simple tools, they will also have standard column names. Each level has a MEMBER NAME (for countries, I used the 2-letter country code here, which is the COUNTRY_ID primary key in my COUNTRIES dimension table). Each level also has a MEMBER CAPTION as a short name and a MEMBER DESCRIPTION for a longer one. Those are standardized names for each object. The idea is to provide a view that can be used without reading the data model: for each level, the end-user can query the name, caption or description.
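To make the MEMBER expressions concrete, here is a Python transcription of the ones declared in COUNTRIES_DIM_ATT. It is only an illustration of what the expressions compute (the function names are hypothetical), not how Oracle evaluates them; note that Oracle’s substr is 1-based, so substr(x,1,3) corresponds to the Python slice x[:3].

```python
def continent_member(continent_id: int, continent: str) -> dict[str, str]:
    """Hypothetical transcription of the MEMBER expressions of the "Continent" level."""
    return {
        "MEMBER_NAME": "#" + str(continent_id),   # '#'||to_char("Continent ID")
        "MEMBER_CAPTION": continent[:3].upper(),  # upper(substr("Continent",1,3))
        "MEMBER_DESCRIPTION": continent,          # "Continent"
    }


def country_member(country_id: str, country: str, country_name: str) -> dict[str, str]:
    """Hypothetical transcription of the MEMBER expressions of the "Country" level."""
    return {
        "MEMBER_NAME": country_id,           # "Country ID"
        "MEMBER_CAPTION": country,           # "Country"
        "MEMBER_DESCRIPTION": country_name,  # "Country name"
    }


print(continent_member(1, "Asia"))
# {'MEMBER_NAME': '#1', 'MEMBER_CAPTION': 'ASI', 'MEMBER_DESCRIPTION': 'Asia'}
print(country_member("AE", "ARE", "United_Arab_Emirates")["MEMBER_CAPTION"])  # ARE
```

These values match the “#1”, “ASI” and “Asia” shown for the Asia row when querying the hierarchy.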

The idea is that those hierarchy levels will be selected in the WHERE clause by a LEVEL_NAME, instead of mentioning all columns in a GROUP BY clause or in the PARTITION BY windowing clause of an analytic function. Note that there’s also an ALL level for the top-most aggregation, and we can keep the ‘ALL’ name or set a specific one, like the ‘WORLD’ I’ve defined here for all countries.

This most important metadata is defined by the attribute dimension, but we don’t query attribute dimensions directly. We can only look at their definitions in the dictionary:


SQL> select * FROM user_attribute_dimensions;

      DIMENSION_NAME    DIMENSION_TYPE    CACHE_STAR    MAT_TABLE_OWNER    MAT_TABLE_NAME    ALL_MEMBER_NAME    ALL_MEMBER_CAPTION    ALL_MEMBER_DESCRIPTION    COMPILE_STATE    ORIGIN_CON_ID
____________________ _________________ _____________ __________________ _________________ __________________ _____________________ _________________________ ________________ ________________
COUNTRIES_DIM_ATT    STANDARD          NONE                                               'WORLD'                                                            VALID                           3
CALENDAR_DIM_ATT     STANDARD          NONE                                               'ALL'                                                              VALID                           3
DAYS_DIM_ATT         TIME              NONE                                               'ALL'                                                              VALID                           3

SQL> select * FROM user_attribute_dim_attrs;

      DIMENSION_NAME    ATTRIBUTE_NAME    TABLE_ALIAS       COLUMN_NAME    ORDER_NUM    ORIGIN_CON_ID
____________________ _________________ ______________ _________________ ____________ ________________
DAYS_DIM_ATT         Date              CASES          DATEREP                      0                3
COUNTRIES_DIM_ATT    Country ID        A              COUNTRY_ID                   0                3
COUNTRIES_DIM_ATT    Country           A              COUNTRY_CODE                 1                3
COUNTRIES_DIM_ATT    Country name      A              COUNTRY_NAME                 2                3
COUNTRIES_DIM_ATT    Continent ID      A              CONTINENT_ID                 3                3
COUNTRIES_DIM_ATT    Continent         B              CONTINENT_NAME               4                3
CALENDAR_DIM_ATT     Date              CASES          DATEREP                      0                3

SQL> select * FROM user_attribute_dim_levels;

      DIMENSION_NAME    LEVEL_NAME    SKIP_WHEN_NULL    LEVEL_TYPE                MEMBER_NAME_EXPR               MEMBER_CAPTION_EXPR    MEMBER_DESCRIPTION_EXPR    ORDER_NUM    ORIGIN_CON_ID
____________________ _____________ _________________ _____________ _______________________________ _________________________________ __________________________ ____________ ________________
COUNTRIES_DIM_ATT    Continent     N                 STANDARD      '#'||to_char("Continent ID")    upper(substr("Continent",1,3))    "Continent"                           0                3
DAYS_DIM_ATT         Day           N                 DAYS          TO_CHAR("Date")                                                                                         0                3
COUNTRIES_DIM_ATT    Country       N                 STANDARD      "Country ID"                    "Country"                         "Country name"                        1                3
CALENDAR_DIM_ATT     Day           N                 STANDARD      TO_CHAR("Date")                                                                                         0                3

There is more that we can define here. In the same way that levels simplify the PARTITION BY clause of analytic functions, we can avoid the ORDER BY clause by defining an ordering in each level. I keep it simple here.

For drill-down analytics, we query on hierarchies.

Hierarchy

This is a simple declaration of the parent-child relationship between levels:


SQL> 
create or replace hierarchy "Countries"
    using COUNTRIES_DIM_ATT
    ( "Country" child of "Continent")
 /

Hierarchy created.

This is actually a view that we can query, and the best way to understand it is to look at it.

The definition from the dictionary just reflects what we have created:


SQL> select * FROM user_hierarchies;

   HIER_NAME    DIMENSION_OWNER       DIMENSION_NAME    PARENT_ATTR    COMPILE_STATE    ORIGIN_CON_ID
____________ __________________ ____________________ ______________ ________________ ________________
Countries    DEMO               COUNTRIES_DIM_ATT                   VALID                           3

SQL> select * FROM user_hier_levels;

   HIER_NAME    LEVEL_NAME    ORDER_NUM    ORIGIN_CON_ID
____________ _____________ ____________ ________________
Countries    Continent                0                3
Countries    Country                  1                3

We can also query USER_HIER_COLUMNS to see what is exposed as a view, but a simple DESC will show them:


SQL> desc "Countries"

                 Name    Role            Type
_____________________ _______ _______________
Country ID            KEY     VARCHAR2(10)
Country               PROP    VARCHAR2(3)
Country name          PROP    VARCHAR2(50)
Continent ID          KEY     NUMBER
Continent             PROP    VARCHAR2(10)
MEMBER_NAME           HIER    VARCHAR2(41)
MEMBER_UNIQUE_NAME    HIER    VARCHAR2(95)
MEMBER_CAPTION        HIER    VARCHAR2(12)
MEMBER_DESCRIPTION    HIER    VARCHAR2(50)
LEVEL_NAME            HIER    VARCHAR2(9)
HIER_ORDER            HIER    NUMBER
DEPTH                 HIER    NUMBER(10)
IS_LEAF               HIER    NUMBER
PARENT_LEVEL_NAME     HIER    VARCHAR2(9)
PARENT_UNIQUE_NAME    HIER    VARCHAR2(95)

This is like a join of COUNTRIES and CONTINENTS (defined in the USING clause of the attribute dimension) with the attributes exposed. But there are also additional columns, with standard names in all hierarchies: member name/caption/description and level information. That is because all levels are here, as if we had done a UNION ALL over several GROUP BY queries.

Additional columns and additional rows for each level. Let’s query it:


SQL> select * from "Countries";

   Country ID    Country                         Country name    Continent ID    Continent    MEMBER_NAME    MEMBER_UNIQUE_NAME    MEMBER_CAPTION                   MEMBER_DESCRIPTION    LEVEL_NAME    HIER_ORDER    DEPTH    IS_LEAF    PARENT_LEVEL_NAME    PARENT_UNIQUE_NAME
_____________ __________ ____________________________________ _______________ ____________ ______________ _____________________ _________________ ____________________________________ _____________ _____________ ________ __________ ____________________ _____________________
                                                                                           WORLD          [ALL].[WORLD]                                                                ALL                       0        0          0
                                                                            1 Asia         #1             [Continent].&[1]      ASI               Asia                                 Continent                 1        1          0 ALL                  [ALL].[WORLD]
AE            ARE        United_Arab_Emirates                               1 Asia         AE             [Country].&[AE]       ARE               United_Arab_Emirates                 Country                   2        2          1 Continent            [Continent].&[1]
AF            AFG        Afghanistan                                        1 Asia         AF             [Country].&[AF]       AFG               Afghanistan                          Country                   3        2          1 Continent            [Continent].&[1]
BD            BGD        Bangladesh                                         1 Asia         BD             [Country].&[BD]       BGD               Bangladesh                           Country                   4        2          1 Continent            [Continent].&[1]
...
VN            VNM        Vietnam                                            1 Asia         VN             [Country].&[VN]       VNM               Vietnam                              Country                  43        2          1 Continent            [Continent].&[1]
YE            YEM        Yemen                                              1 Asia         YE             [Country].&[YE]       YEM               Yemen                                Country                  44        2          1 Continent            [Continent].&[1]
                                                                            2 Africa       #2             [Continent].&[2]      AFR               Africa                               Continent                45        1          0 ALL                  [ALL].[WORLD]
AO            AGO        Angola                                             2 Africa       AO             [Country].&[AO]       AGO               Angola                               Country                  46        2          1 Continent            [Continent].&[2]
BF            BFA        Burkina_Faso                                       2 Africa       BF             [Country].&[BF]       BFA               Burkina_Faso                         Country                  47        2          1 Continent            [Continent].&[2]
...

I’ve removed many rows for clarity, but there is one row for each country, the deepest level, plus one row for each continent, plus one row for the top summary (‘WORLD’). This is how we avoid GROUP BY in the end-user query: we just mention the level: LEVEL_NAME='ALL', LEVEL_NAME='Continent', LEVEL_NAME='Country'. Or we query the DEPTH: 0 for the global summary, 1 for continents, 2 for countries. The countries, being the most detailed level, can also be queried with IS_LEAF=1. The attributes may be NULL at non-leaf levels, like “Country name” at the ‘Continent’ level, or “Continent” at the ‘ALL’ level.
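The shape of this flattened hierarchy can be mimicked in a few lines of Python. This is a sketch under stated assumptions, not Oracle’s implementation: rows are emitted parent-before-children with children sorted by key (which is consistent with the HIER_ORDER values above, but is an assumption), and the unique-name format is copied from the output. The `build_hierarchy` function is hypothetical.

```python
def build_hierarchy(countries: list[tuple[str, int]], all_name: str = "WORLD") -> list[dict]:
    """Flatten a Country -> Continent -> ALL hierarchy into one row per member.

    `countries` holds (country_id, continent_id) pairs. Assumption: rows come in
    pre-order (each parent before its children), with children sorted by key.
    """
    top = f"[ALL].[{all_name}]"
    rows = [{"MEMBER_NAME": all_name, "MEMBER_UNIQUE_NAME": top, "LEVEL_NAME": "ALL",
             "DEPTH": 0, "IS_LEAF": 0, "PARENT_UNIQUE_NAME": None}]
    children: dict[int, list[str]] = {}
    for country_id, continent_id in countries:
        children.setdefault(continent_id, []).append(country_id)
    for continent_id in sorted(children):
        continent_unique = f"[Continent].&[{continent_id}]"
        rows.append({"MEMBER_NAME": f"#{continent_id}", "MEMBER_UNIQUE_NAME": continent_unique,
                     "LEVEL_NAME": "Continent", "DEPTH": 1, "IS_LEAF": 0, "PARENT_UNIQUE_NAME": top})
        for country_id in sorted(children[continent_id]):
            rows.append({"MEMBER_NAME": country_id, "MEMBER_UNIQUE_NAME": f"[Country].&[{country_id}]",
                         "LEVEL_NAME": "Country", "DEPTH": 2, "IS_LEAF": 1,
                         "PARENT_UNIQUE_NAME": continent_unique})
    for order, row in enumerate(rows):
        row["HIER_ORDER"] = order
    return rows


rows = build_hierarchy([("AF", 1), ("AE", 1), ("BD", 1), ("AO", 2)])
# Selecting a level is a simple filter on LEVEL_NAME, no GROUP BY needed:
print([r["MEMBER_NAME"] for r in rows if r["LEVEL_NAME"] == "Continent"])  # ['#1', '#2']
print([(r["MEMBER_NAME"], r["HIER_ORDER"]) for r in rows if r["IS_LEAF"] == 1])
# [('AE', 2), ('AF', 3), ('BD', 4), ('AO', 6)]
```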

In addition to the attributes, we have the standardized names, so that the user GUI can see the same column names for all dimensions. To keep it short here, I don’t show all countries, and I don’t query MEMBER_CAPTION and MEMBER_DESCRIPTION:


SQL>
select MEMBER_NAME,MEMBER_UNIQUE_NAME,LEVEL_NAME,PARENT_LEVEL_NAME,PARENT_UNIQUE_NAME,HIER_ORDER,DEPTH,IS_LEAF
 from "Countries" order by DEPTH,HIER_ORDER fetch first 10 rows only;

   MEMBER_NAME    MEMBER_UNIQUE_NAME    LEVEL_NAME    PARENT_LEVEL_NAME    PARENT_UNIQUE_NAME    HIER_ORDER    DEPTH    IS_LEAF
______________ _____________________ _____________ ____________________ _____________________ _____________ ________ __________
WORLD          [ALL].[WORLD]         ALL                                                                  0        0          0
#1             [Continent].&[1]      Continent     ALL                  [ALL].[WORLD]                     1        1          0
#2             [Continent].&[2]      Continent     ALL                  [ALL].[WORLD]                    45        1          0
#3             [Continent].&[3]      Continent     ALL                  [ALL].[WORLD]                   101        1          0
#4             [Continent].&[4]      Continent     ALL                  [ALL].[WORLD]                   156        1          0
#5             [Continent].&[5]      Continent     ALL                  [ALL].[WORLD]                   165        1          0
AE             [Country].&[AE]       Country       Continent            [Continent].&[1]                  2        2          1
AF             [Country].&[AF]       Country       Continent            [Continent].&[1]                  3        2          1
BD             [Country].&[BD]       Country       Continent            [Continent].&[1]                  4        2          1
BH             [Country].&[BH]       Country       Continent            [Continent].&[1]                  5        2          1

A row can be identified by its level (LEVEL_NAME or DEPTH) and its name, but a unique name is also generated here, combining the level and the key (in MDX style). This is MEMBER_UNIQUE_NAME, and we also have PARENT_UNIQUE_NAME if we want to follow the hierarchy.
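Following PARENT_UNIQUE_NAME from a member up to the top is a simple walk. The sketch below uses a few child-to-parent links copied from the output above; the `path_to_top` helper is hypothetical and only illustrates how a client could navigate these columns.

```python
from typing import Optional

# Child -> parent links, copied from the MEMBER_UNIQUE_NAME / PARENT_UNIQUE_NAME
# columns of a few rows of the "Countries" hierarchy shown above.
parents: dict[str, Optional[str]] = {
    "[ALL].[WORLD]": None,
    "[Continent].&[1]": "[ALL].[WORLD]",
    "[Country].&[AE]": "[Continent].&[1]",
    "[Country].&[AF]": "[Continent].&[1]",
}


def path_to_top(unique_name: str, parents: dict[str, Optional[str]]) -> list[str]:
    """Hypothetical helper: follow PARENT_UNIQUE_NAME links up to the ALL level."""
    path = [unique_name]
    while parents[path[-1]] is not None:
        path.append(parents[path[-1]])
    return path


print(path_to_top("[Country].&[AE]", parents))
# ['[Country].&[AE]', '[Continent].&[1]', '[ALL].[WORLD]']
```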

Analytic View

Now that I have a view on the hierarchy, I want to join it to the fact table, in order to display the measures at different levels of aggregation. Again, I don’t want the user to think about joins and aggregation functions, and this must be encapsulated in a view, an ANALYTIC VIEW:


create or replace analytic view "COVID cases"
using CASES
dimension by (
  COUNTRIES_DIM_ATT key COUNTRY_ID references "Country ID"
  hierarchies ( "Countries")
 )
measures (
  "Cases"          fact CASES aggregate by sum,
  "Highest cases"  fact CASES aggregate by max
)
/

The USING clause just mentions the fact table. The DIMENSION BY clause lists all the dimensions (I have only one here for the simplicity of the example, but you would list all dimensions here) and how the fact table joins to each of them (its foreign key REFERENCES the lowest-level key of the dimension). The MEASURES clause defines the fact columns and the aggregation function to apply to them. Getting this right can be complex, as you must make sure that each aggregation always makes sense. What is stored in one fact column can be exposed as multiple business object attributes depending on the aggregation: here, CASES is exposed both as “Cases” (sum) and as “Highest cases” (max).
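What the two measures return at each level can be sketched in Python. The daily figures below are hypothetical, and the sketch assumes that each aggregation is applied to all fact rows below a member (for sum and max, this gives the same result as aggregating the per-country values). The `aggregate` function is illustrative only.

```python
from collections import defaultdict

# Hypothetical daily fact rows (country_id, cases), plus the country -> continent link.
continent_of = {"AE": "Asia", "AF": "Asia", "AO": "Africa"}
facts = [("AE", 10), ("AE", 30), ("AF", 5), ("AO", 7), ("AO", 2)]


def aggregate(facts, continent_of):
    """Return {(level_name, member): (sum, max)} for the ALL, Continent and Country levels."""
    values = defaultdict(list)
    for country, cases in facts:
        # Each fact row contributes to one member at every level of the hierarchy.
        values[("ALL", "WORLD")].append(cases)
        values[("Continent", continent_of[country])].append(cases)
        values[("Country", country)].append(cases)
    # "Cases" = aggregate by sum, "Highest cases" = aggregate by max
    return {member: (sum(v), max(v)) for member, v in values.items()}


measures = aggregate(facts, continent_of)
print(measures[("ALL", "WORLD")])       # (54, 30)
print(measures[("Continent", "Asia")])  # (45, 30)
print(measures[("Country", "AO")])      # (9, 7)
```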

There are many functions for calculated measures. For example, in the screenshot at the end, I added the following to show each country’s covid cases as a ratio of its continent’s cases:


 "cases/continent" as 
  ( share_of("Cases" hierarchy COUNTRIES_DIM_ATT."Countries"  level "Continent") )
  caption 'Cases Share of Continent' description 'Cases Share of Continent'

But for the moment I keep it simple with only “Cases” and “Highest cases”.
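As I understand it, a measure like share_of(... level "Continent") divides the value at the current member by the value of its ancestor at the Continent level. Here is a minimal Python sketch of that ratio at the country level, with hypothetical country totals; it illustrates the arithmetic only, not Oracle’s implementation.

```python
# Hypothetical "Cases" totals per country, and each country's continent.
cases_by_country = {"AE": 40, "AF": 5, "AO": 9}
continent_of = {"AE": "Asia", "AF": "Asia", "AO": "Africa"}


def share_of_continent(cases_by_country, continent_of):
    """Ratio of each country's measure to the same measure at its "Continent" ancestor."""
    continent_totals = {}
    for country, cases in cases_by_country.items():
        continent = continent_of[country]
        continent_totals[continent] = continent_totals.get(continent, 0) + cases
    return {country: cases / continent_totals[continent_of[country]]
            for country, cases in cases_by_country.items()}


shares = share_of_continent(cases_by_country, continent_of)
print(round(shares["AE"], 3))  # 0.889
print(shares["AO"])            # 1.0
```

Within each continent, the country shares add up to 1.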

Here is the description:


SQL> desc "COVID cases"

            Dim Name    Hier Name                  Name    Role            Type
____________________ ____________ _____________________ _______ _______________
COUNTRIES_DIM_ATT    Countries    Country ID            KEY     VARCHAR2(10)
COUNTRIES_DIM_ATT    Countries    Country               PROP    VARCHAR2(3)
COUNTRIES_DIM_ATT    Countries    Country name          PROP    VARCHAR2(50)
COUNTRIES_DIM_ATT    Countries    Continent ID          KEY     NUMBER
COUNTRIES_DIM_ATT    Countries    Continent             PROP    VARCHAR2(10)
COUNTRIES_DIM_ATT    Countries    MEMBER_NAME           HIER    VARCHAR2(41)
COUNTRIES_DIM_ATT    Countries    MEMBER_UNIQUE_NAME    HIER    VARCHAR2(95)
COUNTRIES_DIM_ATT    Countries    MEMBER_CAPTION        HIER    VARCHAR2(12)
COUNTRIES_DIM_ATT    Countries    MEMBER_DESCRIPTION    HIER    VARCHAR2(50)
COUNTRIES_DIM_ATT    Countries    LEVEL_NAME            HIER    VARCHAR2(9)
COUNTRIES_DIM_ATT    Countries    HIER_ORDER            HIER    NUMBER
COUNTRIES_DIM_ATT    Countries    DEPTH                 HIER    NUMBER(10)
COUNTRIES_DIM_ATT    Countries    IS_LEAF               HIER    NUMBER
COUNTRIES_DIM_ATT    Countries    PARENT_LEVEL_NAME     HIER    VARCHAR2(9)
COUNTRIES_DIM_ATT    Countries    PARENT_UNIQUE_NAME    HIER    VARCHAR2(95)
                     MEASURES     Cases                 BASE    NUMBER
                     MEASURES     Highest cases         BASE    NUMBER

I have columns from all hierarchies, with KEY and PROPERTY attributes, the standardized names from the HIERARCHY, and the measures. You must remember that it is a virtual view: you will never query all columns and all rows. You SELECT the columns and filter (WHERE) the rows and levels, and you get the result you want without GROUP BY and JOIN. If you look at the execution plan, you will see the UNION ALL, JOIN and GROUP BY on the star or snowflake tables. But this is not the end-user’s concern. As a DBA, you can create some materialized views to pre-build some summaries, and query rewrite will use them.

We are fully within the initial SQL philosophy: a logical view provides an API that is independent of the physical design and easy to query, on a simple row/column table easy to visualize.

Analytic query

A query on the analytic view is then very simple. In the FROM clause, instead of tables with joins, I mention the analytic view, and instead of mentioning table aliases, I mention the hierarchy. I reference only the standard column names. Only the hierarchy names and the measures are specific. In the where clause, I can also reference the LEVEL_NAME:


SQL> 
select MEMBER_DESCRIPTION, "Cases"
 from "COVID cases" hierarchies ("Countries")
 where ( "Countries".level_name='Country' and "Countries".MEMBER_CAPTION in ('USA','CHN') )
    or ( "Countries".level_name in ('Continent','ALL') )
 order by "Cases";

         MEMBER_DESCRIPTION      Cases
___________________________ __________
Oceania                           8738
China                            84198
Africa                          203142
Asia                           1408945
United_States_of_America       1979850
Europe                         2100711
America                        3488230
                               7209766

Here I wanted to see the total covid-19 cases for all countries (‘ALL’), for each continent, and for only two countries at the country level: USA and China. And this was a simple SELECT … FROM … WHERE … ORDER BY, without joins and group by. Like a query on an OLAP cube.

If I had no analytic views, here is how I would have queried the tables:


SQL>
select coalesce(CONTINENT_NAME, COUNTRY_NAME,'ALL'), CASES from (
select CONTINENT_NAME, COUNTRY_NAME, sum(CASES) cases, COUNTRY_CODE, grouping(COUNTRY_CODE) g_country
from CASES join COUNTRIES using(COUNTRY_ID) join CONTINENTS using(CONTINENT_ID)
group by grouping sets ( () , (CONTINENT_NAME) , (COUNTRY_CODE,COUNTRY_NAME) )
)
where COUNTRY_CODE in ('USA','CHN') or g_country >0
order by cases
/

   COALESCE(CONTINENT_NAME,COUNTRY_NAME,'ALL')      CASES
______________________________________________ __________
Oceania                                              8738
China                                               84198
Africa                                             203142
Asia                                              1408945
United_States_of_America                          1979850
Europe                                            2100711
America                                           3488230
ALL                                               7209766

This used GROUPING SETS to add multiple levels and the GROUPING() function to detect the level. Without GROUPING SETS, I could have done it with several UNION ALL between GROUP BY subqueries.
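For comparison, here is a sketch of that UNION ALL variant. It runs on SQLite (bundled with Python) rather than Oracle, because SQLite has no GROUPING SETS; the table layout follows the CASES/COUNTRIES/CONTINENTS tables above, but the rows and numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table continents (continent_id integer primary key, continent_name text);
    create table countries  (country_id text primary key, country_code text, country_name text,
                             continent_id integer references continents);
    create table cases      (daterep text, country_id text references countries, cases integer,
                             primary key (country_id, daterep));
    -- Hypothetical data
    insert into continents values (1, 'Asia'), (2, 'America');
    insert into countries values ('CN', 'CHN', 'China', 1), ('JP', 'JPN', 'Japan', 1),
                                 ('US', 'USA', 'United_States_of_America', 2), ('CA', 'CAN', 'Canada', 2);
    insert into cases values ('2020-06-01', 'CN', 10), ('2020-06-02', 'CN', 20), ('2020-06-01', 'JP', 5),
                             ('2020-06-01', 'US', 100), ('2020-06-02', 'US', 50), ('2020-06-01', 'CA', 20);
""")

# One GROUP BY per level, glued together with UNION ALL.
query = """
    select 'ALL' as label, sum(cases) as total from cases
    union all
    select continent_name, sum(cases)
      from cases join countries using (country_id) join continents using (continent_id)
     group by continent_name
    union all
    select country_name, sum(cases)
      from cases join countries using (country_id)
     where country_code in ('USA', 'CHN')
     group by country_name
    order by 2  -- order the whole compound result by the second column (the total)
"""
rows = conn.execute(query).fetchall()
for label, total in rows:
    print(label, total)
```

Each level needs its own subquery and its own joins, which is exactly what the analytic view hides from the end-user.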

Back to roots of SQL

You may think that you don’t need Analytic Views because the same can be done by some BI reporting tools. But this should belong to the database. SQL was invented to provide a simple API to users. If you need an additional layer with a large repository of metadata and complex transformations between the user-defined query and the SQL to execute, then something is missing from the initial goal. One consequence is people going to NoSQL hierarchical databases with the idea that they are easier to visualize: a simple API (a key-value get) and embedded metadata (as JSON, for example). While SQL became more and more powerful at processing data in the database, the complexity went too far, and developers preferred to go back to their procedural code rather than learn something new. And the first step of many current developments is to move the data out of the database, to NoSQL, or to an OLAP cube in our case.

Analytic views bring back the power of SQL: the view exposes a Data Mart as one simple table with columns and rows, containing all dimensions and levels of aggregation. The metadata that describes the data model is back where it belongs: the data dictionary. My example here is a very simple one, but it can go further, with classifications that add metadata for self-documentation, more hierarchies (including a special one for the time dimension), and many calculated measures.
SQL on top of it is simplified, and some GUIs can also work on analytic views, such as APEX or SQL Developer.

And if SQL is still too complex, it seems that we can query Analytic Views with MDX (MultiDimensional eXpressions). The MEMBER_UNIQUE_NAME follows the MDX syntax, and MDX is also mentioned in the ?/mesg/oraus.msg list of error messages:


/============================================================================
/
/    18200 - 18699 Reserved for Analytic View Sql (HCS) error messages
/
/============================================================================
/
/// 18200 - 18219 reserved for MDX Parser
/

HCS is the initial name of this feature (Hierarchical Cubes). I have not seen any other mention of MDX in the Oracle Database documentation, so I have no idea whether it is already implemented.

The post No{Join,GroupBy}SQL – Analytic Views for BI first appeared on Blog dbi services.

Documentum Upgrade – Missing DARs after upgrade

June 15, 2020, 11:00 am

As part of the same migration & upgrade project I already talked about in previous blogs (corrupt lockbox, duplicate objects & wrong target_server), I have seen a very annoying and, this time, completely inconsistent behavior in some upgrades from Documentum 7.x to 16.x. The issue, or rather the issues, I had were that random DAR files were not installed properly. This makes them difficult to anticipate, since you basically don’t know what might fail until you actually do it for real. Performing a DryRun helps a lot in anticipating potential (recurring) problems, but if the issue itself is random, there isn’t much you can do without some special gifts (if you can see the future, please reach out to me!)…

In the past couple of months, I performed around a dozen {migration+upgrade} runs, and about half of them had issues with random DAR installations during the upgrade process. Even a DryRun and a real execution of the exact same procedure, using the exact same source system, ended up with two different results: one worked without issue (the real migration, fortunately) while the DryRun ended up with a missing DAR. The procedure does check whether there are any locks on repository objects, whether there are inconsistencies, whether there are any tasks in progress, and so on…

Issues were mostly linked to the following few DARs:

LDAP.dar
MessagingApp.dar
MailApp.dar

I. LDAP

First, the LDAP DAR file: this issue only happened once and was pretty easy to spot. As part of the migrations, I had to change the LDAP Server used. Since the target system ran on Kubernetes with a complete CI/CD pipeline, we automated the creation of the LDAP Config Object with all its parameters, but this step failed for one of the migrations. Replicating the issue gave the following outcome:

[dmadmin@stg_cs ~]$ iapi REPO1
Please enter a user (dmadmin):
Please enter password for dmadmin:

		OpenText Documentum iapi - Interactive API interface
		Copyright (c) 2018. OpenText Corporation
		All rights reserved.
		Client Library Release 16.4.0170.0080

Connecting to Server using docbase REPO1
[DM_SESSION_I_SESSION_START]info:  "Session 010f123450262d3b started for user dmadmin."

Connected to OpenText Documentum Server running Release 16.4.0170.0234  Linux64.Oracle
Session id is s0
API> ?,c,select r_object_id, object_name from dm_ldap_config
r_object_id       object_name
----------------  ------------------------
(0 rows affected)

API> create,c,dm_ldap_config
...
[DM_DFC_E_CLASS_NOT_FOUND]error:  "Unable to instantiate the necessary java class: com.documentum.ldap.impl.DfLdapConfig"

java.lang.ClassNotFoundException: com.documentum.ldap.impl.DfLdapConfig

com.documentum.thirdparty.javassist.NotFoundException: com.documentum.ldap.impl.DfLdapConfig


API> ?,c,SELECT r_object_id, r_modify_date, object_name FROM dmc_dar ORDER BY r_modify_date ASC;
r_object_id       r_modify_date              object_name
----------------  -------------------------  ------------------------
080f1234500007a5  12/1/2018 09:05:30         LDAP
080f12345086063d  2/12/2020 16:26:12         Smart Container
080f123450860780  2/12/2020 16:26:44         Webtop
080f1234508607a1  2/12/2020 16:26:59         Workflow
080f1234508607f9  2/12/2020 16:27:34         Presets
...

API> exit
Bye
[dmadmin@stg_cs ~]$

This kind of error ([DM_DFC_E_CLASS_NOT_FOUND]error: “Unable to instantiate the necessary java class: com.documentum.ldap.impl.DfLdapConfig”) can happen when the LDAP DAR isn’t installed properly. That is indeed what happened during this upgrade: the current DAR seemed to come from the source system before the upgrade (its r_modify_date is much older than the others). The DAR installation log file generated by the upgrade shows that the LDAP one was skipped:

[dmadmin@stg_cs ~]$ grep "\[ERR" $DOCUMENTUM/dba/config/REPO1/dars.log
[ERROR]  A module 'IDfLdapConfigModule' already exists under folder 'IDfLdapConfigModule'.
[dmadmin@stg_cs ~]$
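The grep above is enough for a single repository. When checking many repositories after an upgrade, a small Python sketch like the following does the same match and also extracts the module names reported as already existing. It assumes the `A module '...' already exists under folder '...'` message format seen in the logs of this post; the script and function names are hypothetical.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Message format as seen in the dars.log excerpts of this post (assumed, not documented).
MODULE_EXISTS = re.compile(r"A module '([^']+)' already exists under folder '([^']+)'")


def find_dar_errors(lines: Iterable[str]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return every [ERROR] line, plus (module, folder) pairs for 'already exists' errors."""
    errors: List[str] = []
    existing: List[Tuple[str, str]] = []
    for line in lines:
        if "[ERR" in line:  # same match as: grep "\[ERR" dars.log
            errors.append(line.rstrip("\n"))
            match = MODULE_EXISTS.search(line)
            if match:
                existing.append((match.group(1), match.group(2)))
    return errors, existing


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: find_dar_errors.py <path to dars.log>")
    else:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            errors, existing = find_dar_errors(log)
        for module, folder in existing:
            print(f"already exists: module '{module}' under folder '{folder}'")
        print(f"{len(errors)} [ERROR] line(s) in {sys.argv[1]}")
```

For example, `python3 find_dar_errors.py $DOCUMENTUM/dba/config/REPO1/dars.log`, looping over each repository’s config folder.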

After re-installing the LDAP DAR, the issue was resolved.
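Since the telltale sign here was an r_modify_date much older than those of the other DARs, this check can be scripted too. The Python sketch below parses the text output of the dmc_dar query shown earlier (as captured from iapi) and lists the DARs last modified before a given upgrade date. The column layout and the month/day/year date format are taken from the output above and are assumptions: other locales or date settings may print dates differently. The function name is hypothetical.

```python
import re
from datetime import datetime
from typing import List, Tuple

# Data rows of: SELECT r_object_id, r_modify_date, object_name FROM dmc_dar
ROW = re.compile(r"^([0-9a-f]{16})\s+(\d{1,2}/\d{1,2}/\d{4} \d{2}:\d{2}:\d{2})\s+(.+?)\s*$")


def stale_dars(output: str, upgrade_start: datetime) -> List[Tuple[str, datetime]]:
    """Return (object_name, r_modify_date) for DARs not modified since upgrade_start."""
    stale = []
    for line in output.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header, separator, "..." and summary lines
        modified = datetime.strptime(match.group(2), "%m/%d/%Y %H:%M:%S")
        if modified < upgrade_start:
            stale.append((match.group(3), modified))
    return stale


if __name__ == "__main__":
    sample = """r_object_id       r_modify_date              object_name
----------------  -------------------------  ------------------------
080f1234500007a5  12/1/2018 09:05:30         LDAP
080f12345086063d  2/12/2020 16:26:12         Smart Container
080f123450860780  2/12/2020 16:26:44         Webtop"""
    for name, modified in stale_dars(sample, datetime(2020, 2, 12)):
        print(f"{name}: last modified {modified}, before the upgrade")
```

A DAR flagged this way is not necessarily broken (some DARs may legitimately not be redeployed), but it is a good candidate for a closer look in dars.log.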

II. MessagingApp

Next, the MessagingApp DAR file: this one also happened only once, and it was very strange… While doing sanity checks at the end of the migration, everything was working except searches from client applications like DA or D2. From the repository itself, full-text searches were working properly:

API> ?,c,SELECT r_object_id, object_name FROM dm_document SEARCH document contains 'TestDocument';
r_object_id       object_name
----------------  --------------------
090f2345600731d6  TestDoc.pdf
(1 row affected)

However, the same kind of search on D2, for example, showed something completely different:

2020-03-03 10:30:55,750 UTC [INFO ] ([ACTIVE] ExecuteThread: '70' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.x3.server.services.RpcDoclistServiceImpl  : Context REPO2-1583231056848-dmadmin-2003987903 with terms = TestDocument
2020-03-03 10:30:55,751 UTC [DEBUG] ([ACTIVE] ExecuteThread: '70' for queue: 'weblogic.kernel.Default (self-tuning)') - c.emc.d2fs.dctm.aspects.InjectSessionAspect   : Call first service D2SearchService.getQuickSearchContentWithOption(..)
2020-03-03 10:30:55,751 UTC [DEBUG] ([ACTIVE] ExecuteThread: '70' for queue: 'weblogic.kernel.Default (self-tuning)') - c.emc.d2fs.dctm.aspects.InjectSessionAspect   : InjectSessionAspect::process method: com.emc.d2fs.dctm.web.services.search.D2SearchService.getQuickSearchContentWithOption
...
...
2020-03-03 10:31:01,289 UTC [DEBUG] ([ACTIVE] ExecuteThread: '76' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.common.dctm.queries.D2QueryBuilder    : Query History: IDfQueryEvent(INTERNAL, DEFAULT): [REPO2] returned [Start processing] at [2020-03-03 10:30:56:007 +0000]
IDfQueryEvent(ERROR, UNKNOWN): [REPO2] returned [[DM_VEL_INSTANTIATION_ERROR]error:  "Cannot instantiate Java class"] at [2020-03-03 10:31:01:280 +0000]
DfServiceInstantiationException:: THREAD: Search Broker:REPO2:processing started at Tue Mar 03 10:30:55 UTC 2020; MSG: [DM_VEL_INSTANTIATION_ERROR]error:  "Cannot instantiate Java class"; ERRORCODE: 1902; NEXT: null
        at com.documentum.fc.client.impl.bof.classmgmt.ModuleManager.loadModuleClass(ModuleManager.java:258)
        at com.documentum.fc.client.impl.bof.classmgmt.ModuleManager.getModuleClass(ModuleManager.java:203)
        at com.documentum.fc.client.impl.bof.classmgmt.ModuleManager.newModule(ModuleManager.java:154)
        at com.documentum.fc.client.impl.bof.classmgmt.ModuleManager.newModule(ModuleManager.java:86)
        at com.documentum.fc.client.impl.bof.classmgmt.ModuleManager.newModule(ModuleManager.java:60)
        at com.documentum.fc.client.DfClient$ClientImpl.newModule(DfClient.java:466)
        at com.documentum.fc.client.search.impl.generation.docbase.common.sco.definition.ComplexMappingDefinitionManager.getMappingModule(ComplexMappingDefinitionManager.java:352)
        at com.documentum.fc.client.search.impl.generation.docbase.common.sco.definition.ComplexMappingDefinitionManager.getComplexMappingDefinitionFromDocbase(ComplexMappingDefinitionManager.java:319)
        at com.documentum.fc.client.search.impl.generation.docbase.common.sco.definition.ComplexMappingDefinitionManager.loadComplexMappingDefinition(ComplexMappingDefinitionManager.java:149)
        at com.documentum.fc.client.search.impl.generation.docbase.common.sco.definition.ComplexMappingDefinitionManager.getComplexMappingDefinition(ComplexMappingDefinitionManager.java:75)
        at com.documentum.fc.client.search.impl.generation.docbase.common.sco.definition.loading.legacy.LegacyMappingLoader.loadSearchInterfaces(LegacyMappingLoader.java:42)
        at com.documentum.fc.client.search.impl.generation.docbase.common.sco.definition.EosMappingLoader.populateLegacyMapping(EosMappingLoader.java:199)
        at com.documentum.fc.client.search.impl.generation.docbase.common.sco.definition.EosMappingLoader.populateMappingCache(EosMappingLoader.java:112)
        at com.documentum.fc.client.search.impl.generation.docbase.common.sco.definition.EosMappingLoader.getInterface(EosMappingLoader.java:63)
        at com.documentum.fc.client.search.impl.generation.docbase.common.sco.mapping.SCOGenerator.isComplexQuery(SCOGenerator.java:38)
        at com.documentum.fc.client.search.impl.generation.docbase.TargetLanguageSelector.initByQueryBuilder(TargetLanguageSelector.java:85)
        at com.documentum.fc.client.search.impl.generation.docbase.TargetLanguageSelector.<init>(TargetLanguageSelector.java:39)
        at com.documentum.fc.client.search.impl.generation.docbase.DocbaseQueryGeneratorManager.generateQueryExecutor(DocbaseQueryGeneratorManager.java:248)
        at com.documentum.fc.client.search.impl.generation.docbase.DocbaseQueryGeneratorManager.generateQueryExecutor(DocbaseQueryGeneratorManager.java:96)
        at com.documentum.fc.client.search.impl.execution.adapter.docbase.DocbaseAdapter.execute(DocbaseAdapter.java:83)
        at com.documentum.fc.client.search.impl.execution.broker.SearchJob.handleProcessingState(SearchJob.java:382)
        at com.documentum.fc.client.search.impl.execution.broker.SearchJob.doRunLoop(SearchJob.java:477)
        at com.documentum.fc.client.search.impl.execution.broker.SearchJob.run(SearchJob.java:433)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.documentum.services.complexobjects.impl.ComplexObjectMappingDefImpl
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at com.documentum.fc.client.impl.bof.classmgmt.URLClassLoaderEx.findClass(URLClassLoaderEx.java:49)
        at com.documentum.fc.client.impl.bof.classmgmt.DelayedDelegationClassLoader.findClass(DelayedDelegationClassLoader.java:241)
        at com.documentum.fc.client.impl.bof.classmgmt.AbstractTransformingClassLoader.findClass(AbstractTransformingClassLoader.java:122)
        at com.documentum.fc.client.impl.bof.classmgmt.DelayedDelegationClassLoader.loadClass(DelayedDelegationClassLoader.java:147)
        at com.documentum.fc.client.impl.bof.classmgmt.AbstractTransformingClassLoader.loadClass(AbstractTransformingClassLoader.java:69)
        at com.documentum.fc.client.impl.bof.classmgmt.ModuleManager.loadModuleClass(ModuleManager.java:254)
        ... 25 more

IDfQueryEvent(ERROR, UNREACHABLE): [REPO2] returned [Unable to process query] at [2020-03-03 10:31:01:281 +0000]
, Query Status: 6
2020-03-03 10:31:01,291 UTC [ERROR] ([ACTIVE] ExecuteThread: '76' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.common.dctm.queries.D2QueryBuilder    : The search has failed. null
2020-03-03 10:31:01,307 UTC [DEBUG] ([ACTIVE] ExecuteThread: '76' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.content.D2cQueryContent     : Executing xPlore search ended : 6.722s
2020-03-03 10:31:01,307 UTC [DEBUG] ([ACTIVE] ExecuteThread: '76' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.content.D2cQueryContent     : Enter buildItems
2020-03-03 10:31:01,308 UTC [DEBUG] ([ACTIVE] ExecuteThread: '76' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.content.D2cQueryContent     : FACETS: ObjectID = 080f2345602f4a20
2020-03-03 10:31:01,310 UTC [DEBUG] ([ACTIVE] ExecuteThread: '76' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.content.D2cQueryContent     : listColNames=[object_name, score, title, a_status, r_modify_date, r_modifier]
2020-03-03 10:31:01,311 UTC [DEBUG] ([ACTIVE] ExecuteThread: '76' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.content.D2cQueryContent     : Exit buildItems
2020-03-03 10:31:01,311 UTC [DEBUG] ([ACTIVE] ExecuteThread: '76' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.content.D2cQueryContent     : FACETS: leaving getContent
2020-03-03 10:31:01,354 UTC [DEBUG] ([ACTIVE] ExecuteThread: '76' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.content.D2cQueryContent     : getSearchContent - start building facets
2020-03-03 10:31:01,355 UTC [DEBUG] ([ACTIVE] ExecuteThread: '76' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.content.D2cQueryContent     : Exit buildFacets
2020-03-03 10:31:01,355 UTC [DEBUG] ([ACTIVE] ExecuteThread: '76' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.content.D2cQueryContent     : Query name = lastSearch
2020-03-03 10:31:01,363 UTC [DEBUG] ([ACTIVE] ExecuteThread: '76' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.content.D2cQueryContent     : Enter getObjectName
2020-03-03 10:31:01,364 UTC [DEBUG] ([ACTIVE] ExecuteThread: '76' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.content.D2cQueryContent     : Exit getObjectName
2020-03-03 10:31:01,364 UTC [DEBUG] ([ACTIVE] ExecuteThread: '76' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.content.D2cQueryContent     : FACETS: attrNameList from query = []
2020-03-03 10:31:01,364 UTC [DEBUG] ([ACTIVE] ExecuteThread: '76' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.content.D2cQueryContent     : FACETS: attrValueList from query = []
2020-03-03 10:31:01,365 UTC [DEBUG] ([ACTIVE] ExecuteThread: '76' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.content.D2cQueryContent     : searchTypes = [dm_document]
2020-03-03 10:31:01,365 UTC [DEBUG] ([ACTIVE] ExecuteThread: '76' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.content.D2cQueryContent     : getSearchContent - done building facets
2020-03-03 10:31:01,365 UTC [DEBUG] ([ACTIVE] ExecuteThread: '76' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.content.D2cQueryContent     : Exit getSearchContent
2020-03-03 10:31:01,366 UTC [ERROR] ([ACTIVE] ExecuteThread: '76' for queue: 'weblogic.kernel.Default (self-tuning)') - c.emc.d2fs.dctm.aspects.InjectSessionAspect   : {}
com.documentum.fc.common.DfException: The search has failed.
[DM_VEL_INSTANTIATION_ERROR]
        at com.emc.d2fs.dctm.content.D2cQueryContent.getSearchContent(D2cQueryContent.java:598)
        at com.emc.d2fs.dctm.content.NodeLastSearchContent.getSearchContent(NodeLastSearchContent.java:217)
        at com.emc.d2fs.dctm.web.services.content.D2ContentService.getContent(D2ContentService.java:391)
        at com.emc.d2fs.dctm.web.services.content.D2ContentService.getSearchContent_aroundBody14(D2ContentService.java:425)
        at com.emc.d2fs.dctm.web.services.content.D2ContentService$AjcClosure15.run(D2ContentService.java:1)
        at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:229)
        at com.emc.d2fs.dctm.aspects.InjectSessionAspect.process(InjectSessionAspect.java:240)
        at com.emc.d2fs.dctm.web.services.content.D2ContentService.getSearchContent(D2ContentService.java:403)
        at com.emc.x3.client.services.search.RpcSearchManagerServiceImpl.getSearchResults(RpcSearchManagerServiceImpl.java:37)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.google.gwt.user.server.rpc.RPC.invokeAndEncodeResponse(RPC.java:587)
        at com.emc.x3.server.GuiceRemoteServiceServlet.processCall(GuiceRemoteServiceServlet.java:105)
        at com.google.gwt.user.server.rpc.RemoteServiceServlet.processPost(RemoteServiceServlet.java:373)
        at com.google.gwt.user.server.rpc.AbstractRemoteServiceServlet.doPost(AbstractRemoteServiceServlet.java:62)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
        at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
        at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
        at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
        at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:61)
        at org.apache.shiro.web.servlet.AdviceFilter.executeChain(AdviceFilter.java:108)
        at com.custom.d2.auth.filters.NonSSOAuthenticationFilter.executeChain(NonSSOAuthenticationFilter.java:33)
        at org.apache.shiro.web.servlet.AdviceFilter.doFilterInternal(AdviceFilter.java:137)
        at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
        at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:66)
        at org.apache.shiro.web.servlet.AbstractShiroFilter.executeChain(AbstractShiroFilter.java:449)
        at org.apache.shiro.web.servlet.AbstractShiroFilter$1.call(AbstractShiroFilter.java:365)
        at org.apache.shiro.subject.support.SubjectCallable.doCall(SubjectCallable.java:90)
        at org.apache.shiro.subject.support.SubjectCallable.call(SubjectCallable.java:83)
        at org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:387)
        at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:362)
        at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
        at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
        at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
        at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
        at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
        at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
        at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:78)
        at com.emc.x3.portal.server.filters.X3SessionTimeoutFilter.doFilter(X3SessionTimeoutFilter.java:40)
        at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:78)
        at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.wrapRun(WebAppServletContext.java:3706)
        at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3672)
        at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:328)
        at weblogic.security.service.SecurityManager.runAsForUserCode(SecurityManager.java:197)
        at weblogic.servlet.provider.WlsSecurityProvider.runAsForUserCode(WlsSecurityProvider.java:203)
        at weblogic.servlet.provider.WlsSubjectHandle.run(WlsSubjectHandle.java:71)
        at weblogic.servlet.internal.WebAppServletContext.doSecuredExecute(WebAppServletContext.java:2443)
        at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2291)
        at weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2269)
        at weblogic.servlet.internal.ServletRequestImpl.runInternal(ServletRequestImpl.java:1705)
        at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1665)
        at weblogic.servlet.provider.ContainerSupportProviderImpl$WlsRequestExecutor.run(ContainerSupportProviderImpl.java:272)
        at weblogic.invocation.ComponentInvocationContextManager._runAs(ComponentInvocationContextManager.java:352)
        at weblogic.invocation.ComponentInvocationContextManager.runAs(ComponentInvocationContextManager.java:337)
        at weblogic.work.LivePartitionUtility.doRunWorkUnderContext(LivePartitionUtility.java:57)
        at weblogic.work.PartitionUtility.runWorkUnderContext(PartitionUtility.java:41)
        at weblogic.work.SelfTuningWorkManagerImpl.runWorkUnderContext(SelfTuningWorkManagerImpl.java:652)
        at weblogic.work.ExecuteThread.execute(ExecuteThread.java:420)
        at weblogic.work.ExecuteThread.run(ExecuteThread.java:360)
2020-03-03 10:31:01,366 UTC [DEBUG] ([ACTIVE] ExecuteThread: '76' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.web.services.D2fsContext    : Release session : s1

As you can see above, D2 complains about the instantiation of a specific class (“com.documentum.services.complexobjects.impl.ComplexObjectMappingDefImpl”). This class is part of an SBO bundled in MessagingApp.dar, as mentioned in KB6289567 & KB6296577.

Therefore, the DAR installation must have failed, right? Well, it didn’t, and that’s the strange thing I was talking about… I have evidence that MessagingApp.dar was properly installed in the repository the day before:

[INFO]  ******************************************************
[INFO]  * Headless Composer
[INFO]  * Version:        16.4.000.0042
[INFO]  * Java version:   1.8.0_152 (64bit)
[INFO]  * Java home:      $JAVA_HOME/jre
[INFO]  * Set storage type: false
[INFO]  *
[INFO]  * DAR file:       $DOCUMENTUM/product/16.4/install/DARsInternal/MessagingApp.dar
[INFO]  * Project name:   MessagingApp
[INFO]  * Built by Composer: 7.1.0000.0186
[INFO]  *
[INFO]  * Repository:     REPO2
[INFO]  * Server version: 16.4.0170.0234  Linux64.Oracle
[INFO]  * User name:      dmadmin
[INFO]  ******************************************************
[INFO]  Install started...  Mon Mar 02 22:18:00 UTC 2020
[INFO]  Executing pre-install script
[INFO]  Pre-install script executed successfully Mon Mar 02 22:18:00 UTC 2020
...
[INFO]  Done Overwriting object : 'com.documentum.services.complexobjects.impl.ComplexObjectMappingDefImpl'(dmc_module 0b0f2345600008f9)
...
[INFO]  Done Versioning object : 'MessagingApp'(dmc_dar 080f2345608608e9)
...
[INFO]  Finished executing post-install actions Mon Mar 02 22:18:30 UTC 2020
[INFO]  Finished executing post-install script Mon Mar 02 22:18:32 UTC 2020
[INFO]  Project 'MessagingApp' was successfully installed.

There are absolutely no errors, and the log shows that the missing class “com.documentum.services.complexobjects.impl.ComplexObjectMappingDefImpl” was upgraded properly, yet on D2 it didn’t work (the DAR was properly installed on both the Global Registry and the Repository used for the search). Re-installing the DAR file produced exactly the same log file: 100% identical except for the dates, obviously. Yet after the re-installation, the issue was magically gone. Honestly, I’m still amazed that this is possible, and I’m pretty sure I will never find the reason.
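Comparing two Headless Composer logs by eye is tedious. The minimal sketch below masks the timestamps (assuming the `Mon Mar 02 22:18:00 UTC 2020` format seen above) and then diffs the two runs with Python’s standard difflib. An empty result means the two installations logged the same thing apart from their dates. The script and function names are hypothetical.

```python
import difflib
import re
import sys
from typing import List

# Timestamp format used by Headless Composer in the logs above, e.g. "Mon Mar 02 22:18:00 UTC 2020".
TIMESTAMP = re.compile(r"\b[A-Z][a-z]{2} [A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} [A-Z]{3,4} \d{4}\b")


def normalize(lines: List[str]) -> List[str]:
    """Replace every timestamp with a fixed placeholder so only real differences remain."""
    return [TIMESTAMP.sub("<TIMESTAMP>", line.rstrip("\n")) for line in lines]


def diff_install_logs(first: List[str], second: List[str]) -> List[str]:
    """Return unified-diff lines between two install logs, ignoring timestamps."""
    return list(difflib.unified_diff(normalize(first), normalize(second),
                                     fromfile="first", tofile="second", lineterm=""))


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: diff_install_logs.py <first log> <second log>")
    else:
        with open(sys.argv[1]) as a, open(sys.argv[2]) as b:
            diff = diff_install_logs(a.readlines(), b.readlines())
        print("\n".join(diff) if diff else "identical apart from timestamps")
```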

III. MailApp

Finally, the last issue concerns the MailApp DAR file, which had the most occurrences as far as I could see. During an upgrade from 7.3 to 16.4 P17, the DAR installation failed and the following was shown in the “dars.log” file:

[INFO]  ******************************************************
[INFO]  * Headless Composer
[INFO]  * Version:        16.4.000.0042
[INFO]  * Java version:   1.8.0_152 (64bit)
[INFO]  * Java home:      $JAVA_HOME/jre
[INFO]  * Set storage type: false
[INFO]  *
[INFO]  * DAR file:       $DOCUMENTUM/product/16.4/install/DARsInternal/MailApp.dar
[INFO]  * Project name:   MailApp
[INFO]  * Built by Composer: 7.1.0000.0186
[INFO]  *
[INFO]  * Repository:     REPO3
[INFO]  * Server version: 16.4.0170.0234  Linux64.Oracle
[INFO]  * User name:      dmadmin
[INFO]  ******************************************************
[INFO]  Install started...  Thu Mar 12 10:08:27 UTC 2020
[INFO]  Executing pre-install script
[INFO]  dmbasic.exe output : connecting docbase...REPO3
[INFO]  dmbasic.exe output : dm_attachment_folder type exists
[INFO]  dmbasic.exe output : Relation type 'dm_attachments_relation' already exists
[INFO]  dmbasic.exe output : Disconnect from the docbase.
[INFO]  Pre-install script executed successfully Thu Mar 12 10:08:31 UTC 2020
[WARN]  Cannot retrieve object by Object Id. This may happen if an object previously installed by Composer was deleted. Object reference will be returned as null. OID: 0b0f345670000df1, URN: urnd:com.emc.ide.artifact.moduledef/com.documentum.mailapp.operations.DfPreProcessMessageObject?artifactURI=file:/C:/Source/.../com.documentum.mailapp.operations.dfpreprocessmessageobject.module#//@dataModel/@externalInterfaces
[WARN]  Cannot retrieve object by Object Id. This may happen if an object previously installed by Composer was deleted. Object reference will be returned as null. OID: 0b0f345670000fe9, URN: urnd:com.emc.ide.artifact.aspectmoduledef/dm_attachmentfolder_aspect?artifactURI=file:/C:/Source/.../dm_attachmentfolder_aspect.module#//@dataModel/@miscellaneous
[WARN]  Cannot retrieve object by Object Id. This may happen if an object previously installed by Composer was deleted. Object reference will be returned as null. OID: 090f345670000e37, URN: urnd:com.emc.ide.artifact.jardef.jardef/attachmentfolderaspect.jar?artifactURI=file:/C:/source/.../attachmentfolderaspect.jar%5B1%5D.jardef#//@dataModel
[WARN]  Cannot retrieve object by Object Id. This may happen if an object previously installed by Composer was deleted. Object reference will be returned as null. OID: 080f345670000e41, URN: urnd:com.emc.ide.artifact.moduledef/com.message.aspose?artifactURI=file:/C:/Users/.../com.message.aspose.module#//@dataModel/@runtimeEnvironmentXML
[ERROR]  A module 'dm_attachmentfolder_aspect' already exists under folder 'Aspect'.
[ERROR]  A module 'mdmo_message_aspect' already exists under folder 'Aspect'.
[WARN]  superTypeName is null. This might happen if the dependent project is not Installed in the same ANT build invocation
[ERROR]  A module 'com.documentum.mailapp.operations.DfPreProcessMessageObject' already exists under folder 'Operations'.
[ERROR]  A module 'com.documentum.mailapp.operations.inbound.DfCleanUpLocalMailAppFiles' already exists under folder 'Operations'.
[ERROR]  A module 'com.documentum.mailapp.operations.inbound.DfFixUpAttachments' already exists under folder 'Operations'.
[ERROR]  A module 'com.documentum.mailapp.operations.inbound.DfImportMailObject' already exists under folder 'Operations'.
[ERROR]  A module 'com.documentum.mailapp.operations.inbound.DfSeparateAttachments' already exists under folder 'Operations'.
[ERROR]  A module 'aspose' already exists under folder 'Modules'.
[ERROR]  A module 'mailappconfig' already exists under folder 'Modules'.
[INFO]  MailApp install aborted by user.
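Because dars.log contains one Headless Composer block per DAR, a quick way to see which DARs did not make it is to split the log on the `* DAR file:` header and read each block’s final status line. The Python sketch below only knows the three outcomes visible in this post (`Project '...' was successfully installed.`, `... install aborted by user.` and `... install failed.`); any block without one of them is reported as unknown. The function name is hypothetical.

```python
import re
from typing import Dict, Iterable, Optional

DAR_HEADER = re.compile(r"\* DAR file:\s+(\S+)")
# Final status lines as seen in the dars.log excerpts of this post.
OUTCOMES = [
    (re.compile(r"Project '([^']+)' was successfully installed\."), "installed"),
    (re.compile(r"(\S+) install aborted by user\."), "aborted"),
    (re.compile(r"(\S+) install failed\."), "failed"),
]


def dar_statuses(lines: Iterable[str]) -> Dict[str, str]:
    """Map each DAR file path found in dars.log to installed/aborted/failed/unknown."""
    statuses: Dict[str, str] = {}
    current: Optional[str] = None
    for line in lines:
        header = DAR_HEADER.search(line)
        if header:
            current = header.group(1)
            statuses[current] = "unknown"  # until a final status line is seen
            continue
        if current is None:
            continue  # lines before the first DAR header
        for pattern, outcome in OUTCOMES:
            if pattern.search(line):
                statuses[current] = outcome
                break
    return statuses


if __name__ == "__main__":
    import sys
    if len(sys.argv) < 2:
        print("usage: dar_statuses.py <path to dars.log>")
    else:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            for dar, status in dar_statuses(log).items():
                print(f"{status:9} {dar}")
```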

On another migration, with a 7.2 source this time and a 16.4 P20 target, we had another batch of issues. On 7.2, MailApp didn’t exist (as far as I know), so the upgrade is supposed to install this DAR for the first time, but it fails because some of the pieces already exist. In the logs above, the same type already existed as well, but there the “Pre-install” script simply continued without any problem (“dm_attachment_folder type exists” followed by “Pre-install script executed successfully”). In the logs below, it fails on the already existing type (DM_QUERY_E_CREATE_FAILED followed by DM_TYPE_MGR_E_EXISTING_TABLE). In both cases (7.3 above and 7.2 below), the “preserve_existing_types” flag is set to “T” (True) in the server.ini of all repositories, so the difference in behavior doesn’t make much sense… However, that’s how it is, so if you have any explanation, feel free to share! I asked OpenText to look into it, but nothing has come out of it so far. Anyway, here are the logs from the 7.2 repository:

[INFO]  ******************************************************
[INFO]  * Headless Composer
[INFO]  * Version:        16.4.000.0042
[INFO]  * Java version:   1.8.0_152 (64bit)
[INFO]  * Java home:      $JAVA_HOME/jre
[INFO]  * Set storage type: false
[INFO]  *
[INFO]  * DAR file:       $DOCUMENTUM/product/16.4/install/DARsInternal/MailApp.dar
[INFO]  * Project name:   MailApp
[INFO]  * Built by Composer: 7.1.0000.0186
[INFO]  *
[INFO]  * Repository:     REPO4
[INFO]  * Server version: 16.4.0200.0256  Linux64.Oracle
[INFO]  * User name:      dmadmin
[INFO]  ******************************************************
[INFO]  Install started...  Fri Apr 03 08:32:45 UTC 2020
[INFO]  Executing pre-install script
[INFO]  dmbasic.exe output : connecting docbase...REPO4
[INFO]  dmbasic.exe output : Create dm_state_extension type.
[INFO]  dmbasic.exe output : [DM_QUERY_E_CREATE_FAILED]error:  "CREATE TYPE statement failed for type: dm_attachment_folder."
[INFO]  dmbasic.exe output :
[INFO]  dmbasic.exe output : [DM_TYPE_MGR_E_EXISTING_TABLE]error:  "Cannot create type dm_attachment_folder because the table dm_attachment_folder_s unexpectedly already exists in the database and the server 'preserve_existing_types' flag is enabled.  To complete this operation the table must first be manually dropped or the server flag disabled."
[INFO]  dmbasic.exe output :
[INFO]  dmbasic.exe output :
[INFO]  dmbasic.exe output : Failed to create dm_attachment_folder type
[ERROR]  Procedure execution failed with dmbasic.exe exit value : 255
[INFO]  MailApp install failed.
[ERROR]  Unable to install dar file $DOCUMENTUM/product/16.4/install/DARsInternal/MailApp.dar
com.emc.ide.installer.PreInstallException: Error running pre-install procedure "presetup". Please contact the procedure owner to verify if it is functioning properly. Please also check if the JAVA_HOME is pointing to the correct JDK. In case of multiple installed JDK's, please provide -vm <JDK>bin flag in the composer.ini/dardeployer.ini files
        at internal.com.emc.ide.installer.DarInstaller.preInstall(DarInstaller.java:1085)
        at internal.com.emc.ide.installer.DarInstaller.doInstall(DarInstaller.java:495)
        at internal.com.emc.ide.installer.DarInstaller.doInstall(DarInstaller.java:334)
        at internal.com.emc.ide.installer.DarInstaller.doInstall(DarInstaller.java:303)
        at com.emc.ide.installer.util.IDarInstallerHelper.doInPlaceInstall(IDarInstallerHelper.java:127)
        at com.emc.ant.installer.api.InstallerAntTask.installDar(InstallerAntTask.java:258)
        at com.emc.ant.installer.api.InstallerAntTask.execute(InstallerAntTask.java:135)
        at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
        at org.apache.tools.ant.Task.perform(Task.java:348)
        at org.apache.tools.ant.Target.execute(Target.java:392)
        at org.apache.tools.ant.Target.performTasks(Target.java:413)
        at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1399)
        at org.apache.tools.ant.Project.executeTarget(Project.java:1368)
        at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
        at org.eclipse.ant.internal.core.ant.EclipseDefaultExecutor.executeTargets(EclipseDefaultExecutor.java:32)
        at org.apache.tools.ant.Project.executeTargets(Project.java:1251)
        at org.eclipse.ant.internal.core.ant.InternalAntRunner.run(InternalAntRunner.java:672)
        at org.eclipse.ant.internal.core.ant.InternalAntRunner.run(InternalAntRunner.java:537)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.eclipse.ant.core.AntRunner.run(AntRunner.java:513)
        at org.eclipse.ant.core.AntRunner.start(AntRunner.java:600)
        at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:196)
        at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110)
        at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79)
        at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:353)
        at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:180)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:629)
        at org.eclipse.equinox.launcher.Main.basicRun(Main.java:584)
        at org.eclipse.equinox.launcher.Main.run(Main.java:1438)
        at org.eclipse.equinox.launcher.Main.main(Main.java:1414)
        at org.eclipse.core.launcher.Main.main(Main.java:34)
Caused by: com.emc.ide.external.dfc.procedurerunner.ProcedureRunnerException: Procedure execution failed with dmbasic.exe exit value : 255
        at com.emc.ide.external.dfc.procedurerunner.ProcedureRunnerUtils.executeDmBasic(ProcedureRunnerUtils.java:283)
        at com.emc.ide.external.dfc.procedurerunner.ProcedureRunner.execute(ProcedureRunner.java:55)
        at internal.com.emc.ide.installer.DarInstaller.preInstall(DarInstaller.java:1080)
        ... 42 more
[ERROR]  Failed to install DAR
Unable to install dar file $DOCUMENTUM/product/16.4/install/DARsInternal/MailApp.dar
        at com.emc.ant.installer.api.InstallerAntTask.installDar(InstallerAntTask.java:273)
        at com.emc.ant.installer.api.InstallerAntTask.execute(InstallerAntTask.java:135)
        at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
        at org.apache.tools.ant.Task.perform(Task.java:348)
        at org.apache.tools.ant.Target.execute(Target.java:392)
        at org.apache.tools.ant.Target.performTasks(Target.java:413)
        at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1399)
        at org.apache.tools.ant.Project.executeTarget(Project.java:1368)
        at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
        at org.eclipse.ant.internal.core.ant.EclipseDefaultExecutor.executeTargets(EclipseDefaultExecutor.java:32)
        at org.apache.tools.ant.Project.executeTargets(Project.java:1251)
        at org.eclipse.ant.internal.core.ant.InternalAntRunner.run(InternalAntRunner.java:672)
        at org.eclipse.ant.internal.core.ant.InternalAntRunner.run(InternalAntRunner.java:537)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.eclipse.ant.core.AntRunner.run(AntRunner.java:513)
        at org.eclipse.ant.core.AntRunner.start(AntRunner.java:600)
        at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:196)
        at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110)
        at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79)
        at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:353)
        at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:180)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:629)
        at org.eclipse.equinox.launcher.Main.basicRun(Main.java:584)
        at org.eclipse.equinox.launcher.Main.run(Main.java:1438)
        at org.eclipse.equinox.launcher.Main.main(Main.java:1414)
        at org.eclipse.core.launcher.Main.main(Main.java:34)
Caused by: com.emc.ide.installer.PreInstallException: Error running pre-install procedure "presetup". Please contact the procedure owner to verify if it is functioning properly. Please also check if the JAVA_HOME is pointing to the correct JDK. In case of multiple installed JDK's, please provide -vm <JDK>bin flag in the composer.ini/dardeployer.ini files
        at internal.com.emc.ide.installer.DarInstaller.preInstall(DarInstaller.java:1085)
        at internal.com.emc.ide.installer.DarInstaller.doInstall(DarInstaller.java:495)
        at internal.com.emc.ide.installer.DarInstaller.doInstall(DarInstaller.java:334)
        at internal.com.emc.ide.installer.DarInstaller.doInstall(DarInstaller.java:303)
        at com.emc.ide.installer.util.IDarInstallerHelper.doInPlaceInstall(IDarInstallerHelper.java:127)
        at com.emc.ant.installer.api.InstallerAntTask.installDar(InstallerAntTask.java:258)
        ... 37 more
Caused by: com.emc.ide.external.dfc.procedurerunner.ProcedureRunnerException: Procedure execution failed with dmbasic.exe exit value : 255
        at com.emc.ide.external.dfc.procedurerunner.ProcedureRunnerUtils.executeDmBasic(ProcedureRunnerUtils.java:283)
        at com.emc.ide.external.dfc.procedurerunner.ProcedureRunner.execute(ProcedureRunner.java:55)
        at internal.com.emc.ide.installer.DarInstaller.preInstall(DarInstaller.java:1080)
        ... 42 more

If you get the above errors, you can simply set the “preserve_existing_types” flag to “F” (False), then restart the DAR installation, and it should install properly this time. Please take care with this flag! If you are copying the repository, it must be set to “T” (True), otherwise it will most likely cause you big trouble… But for an in-place upgrade, you can and should set it to “F” (False) before starting the repository upgrade, and switch it back to “T” (True) once the upgrade is completed and all DARs have been installed. Make sure you do that, and the number of issues during DAR installations should decrease drastically.

Anyway, all that to say that there are some best practices to apply when upgrading, even if they aren’t documented anywhere. In addition, you should pay close attention to the DAR installation logs and really test your application, because even when everything seems to have gone well, you might not be completely safe… Where would the fun be if you could rely on deterministic systems?
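Since the logs are where such problems show up, here is a minimal Python sketch for scanning a DAR installation log. It assumes the log lines look like the excerpt above. It collects the “[ERROR]” lines and extracts the type and table names from DM_TYPE_MGR_E_EXISTING_TABLE messages. The function scan_dar_log and its regular expression are illustrative only and are not part of any Documentum tooling.

```python
import re

# Matches the type-manager error quoted in the log above and captures the
# type that could not be created and the table that already exists.
EXISTING_TABLE = re.compile(
    r'DM_TYPE_MGR_E_EXISTING_TABLE\]error:\s+"Cannot create type (\S+) '
    r'because the table (\S+) unexpectedly already exists'
)


def scan_dar_log(lines):
    """Return (error_lines, conflicts) for an iterable of DAR install log lines.

    error_lines: lines that start with the "[ERROR]" level marker.
    conflicts:   (type_name, table_name) pairs from DM_TYPE_MGR_E_EXISTING_TABLE.
    """
    error_lines = []
    conflicts = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("[ERROR]"):
            error_lines.append(line)
        match = EXISTING_TABLE.search(line)
        if match:
            conflicts.append((match.group(1), match.group(2)))
    return error_lines, conflicts
```

For example, passing an open log file to scan_dar_log returns the error lines together with the (type, table) pairs. Where the DAR installation logs are written depends on your installation.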

This article Documentum Upgrade – Missing DARs after upgrade first appeared on Blog dbi services.

Publishing a PowerShell script to AWS Lambda

June 16, 2020, 6:20 am

I’ve written some Lambda functions in Python in the past, and publishing them to Lambda was quite easy (just upload a zip file with all the code and dependencies). You might ask why I want to do the same with PowerShell, but the reason is quite simple: a customer required that all the KBs installed on their AWS Windows WorkSpaces be collected automatically for compliance reasons. Doing that for EC2 or on-prem instances is quite easy using Lambda with Python against SSM when you use SSM for patching, but listing the installed KBs of your deployed AWS WorkSpaces requires a different approach. After discussing this with AWS Support, it turned out that the easiest solution is to run the PowerShell Get-HotFix cmdlet remotely against the AWS WorkSpaces. Easy, I thought: if I can deploy Python code to Lambda, I can easily do the same with PowerShell. That is definitely not true, as the process is quite different. So, here we go …

The first thing you need to prepare is a PowerShell development environment for AWS. As I am running Linux (KDE Neon, if you want to know exactly), and PowerShell has been available on Linux for quite some time, I’ll show how to do that on Linux (the process is more or less the same on Windows, though).
Obviously PowerShell needs to be installed, but Microsoft documents this quite well, so there is no need to explain it further. Basically it is a matter of:

$ wget -q https://packages.microsoft.com/config/ubuntu/18.04/packages-microsoft-prod.deb
$ sudo dpkg -i packages-microsoft-prod.deb
$ sudo apt-get update
$ sudo add-apt-repository universe
$ sudo apt-get install -y powershell

… and that’s it (take care to follow the steps for your Linux distribution). Once that is done PowerShell can be started:

$ pwsh
PowerShell 7.0.2
Copyright (c) Microsoft Corporation. All rights reserved.

https://aka.ms/powershell
Type 'help' to get help.

PS /home/dwe>

The first additional module you’ll need is AWSLambdaPSCore:

PS /home/dwe> Install-Module AWSLambdaPSCore -Scope CurrentUser

Untrusted repository
You are installing the modules from an untrusted repository. If you trust this repository, change its InstallationPolicy value by running the 
Set-PSRepository cmdlet. Are you sure you want to install the modules from 'PSGallery'?
[Y] Yes  [A] Yes to All  [N] No  [L] No to All  [S] Suspend  [?] Help (default is "N"): Y

Usually you want to work with other AWS services in your Lambda code, so it is recommended to install the AWS.Tools.Installer module, which provides a convenient way to install the modules for the individual AWS services. In addition, the AWSPowerShell.NetCore module is required:

PS /home/dwe> Install-Module -Name AWS.Tools.Installer -Force    
PS /home/dwe> Install-Module -name AWSPowerShell.NetCore -Force

Now, depending on which AWS services you want to work with, just install what you need (in this example EC2, S3 and WorkSpaces):

PS /home/dwe> Install-AWSToolsModule AWS.Tools.EC2,AWS.Tools.S3 -CleanUp -Force                                                                             
Installing module AWS.Tools.EC2 version 4.0.6.0                                                                                                             
Installing module AWS.Tools.S3 version 4.0.6.0                                                                                                              
PS /home/dwe> Install-AWSToolsModule AWS.Tools.Workspaces -CleanUp -Force                                                                                   
Installing module AWS.Tools.WorkSpaces version 4.0.6.0

Once you have that ready you can use the AWS tools for PowerShell to generate a template you can start with:

PS /home/dwe> Get-AWSPowerShellLambdaTemplate

Template                     Description
--------                     -----------
Basic                        Bare bones script
CloudFormationCustomResource PowerShell handler base for use with CloudFormation custom resource events
CodeCommitTrigger            Script to process AWS CodeCommit Triggers
DetectLabels                 Use Amazon Rekognition service to tag image files in S3 with detected labels.
KinesisStreamProcessor       Script to be process a Kinesis Stream
S3Event                      Script to process S3 events
S3EventToSNS                 Script to process SNS Records triggered by S3 events
S3EventToSNSToSQS            Script to process SQS Messages, subscribed to an SNS Topic that is triggered by S3 events
S3EventToSQS                 Script to process SQS Messages triggered by S3 events
SNSSubscription              Script to be subscribed to an SNS Topic
SNSToSQS                     Script to be subscribed to an SQS Queue, that is subscribed to an SNS Topic
SQSQueueProcessor            Script to be subscribed to an SQS Queue


PS /home/dwe> cd ./Documents/aws
PS /home/dwe/Documents/aws> New-AWSPowerShellLambda -ScriptName MyFirstPowershellLambda -Template Basic
Configuring script to use installed version 4.0.6.0 of (@{ ModuleName = 'AWS.Tools.Common'; ModuleVersion = '4.0.5.0' }.Name)
Created new AWS Lambda PowerShell script MyFirstPowershellLambda.ps1 from template Basic at /home/dwe/Documents/aws/MyFirstPowershellLambda

PS /home/dwe/Documents/aws/MyFirstPowershellLambda> ls
MyFirstPowershellLambda.ps1  readme.txt

The generated template is quite simple, but it gives you an idea of how to start:

PS /home/dwe/Documents/aws/MyFirstPowershellLambda> cat ./MyFirstPowershellLambda.ps1
# PowerShell script file to be executed as a AWS Lambda function. 
# 
# When executing in Lambda the following variables will be predefined.
#   $LambdaInput - A PSObject that contains the Lambda function input data.
#   $LambdaContext - An Amazon.Lambda.Core.ILambdaContext object that contains information about the currently running Lambda environment.
#
# The last item in the PowerShell pipeline will be returned as the result of the Lambda function.
#
# To include PowerShell modules with your Lambda function, like the AWS.Tools.S3 module, add a "#Requires" statement
# indicating the module and version. If using an AWS.Tools.* module the AWS.Tools.Common module is also required.

#Requires -Modules @{ModuleName='AWS.Tools.Common';ModuleVersion='4.0.6.0'}

# Uncomment to send the input event to CloudWatch Logs
# Write-Host (ConvertTo-Json -InputObject $LambdaInput -Compress -Depth 5)

Just add the modules for the specific AWS services you want to work with to the “#Requires” section (you need to install them first, of course) and write your script:
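The template comments state that any AWS.Tools.* module also needs AWS.Tools.Common. To illustrate that rule, here is a small Python helper that produces the “#Requires” lines in the format used here. The helper, requires_lines, is hypothetical and not part of any AWS tooling. It pins every module to a single version, 4.0.6.0, as in this example.

```python
def requires_lines(modules, version="4.0.6.0"):
    """Build '#Requires -Modules' lines in the format used by the Lambda template.

    AWS.Tools.Common is placed first whenever any AWS.Tools.* module is listed,
    because the template comments say it is required alongside them.
    Duplicates are dropped while keeping the given order.
    """
    ordered = []
    if any(name.startswith("AWS.Tools.") for name in modules):
        ordered.append("AWS.Tools.Common")
    for name in modules:
        if name not in ordered:
            ordered.append(name)
    # Doubled braces produce the literal @{...} hashtable syntax of PowerShell.
    return [
        f"#Requires -Modules @{{ModuleName='{name}';ModuleVersion='{version}'}}"
        for name in ordered
    ]
```

Calling requires_lines(["AWS.Tools.S3", "AWS.Tools.EC2"]) yields the three “#Requires” lines of the script shown next.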

PS /home/dwe/Documents/aws/MyFirstPowershellLambda> cat ./MyFirstPowershellLambda.ps1
# PowerShell script file to be executed as a AWS Lambda function. 
# 
# When executing in Lambda the following variables will be predefined.
#   $LambdaInput - A PSObject that contains the Lambda function input data.
#   $LambdaContext - An Amazon.Lambda.Core.ILambdaContext object that contains information about the currently running Lambda environment.
#
# The last item in the PowerShell pipeline will be returned as the result of the Lambda function.
#
# To include PowerShell modules with your Lambda function, like the AWS.Tools.S3 module, add a "#Requires" statement
# indicating the module and version. If using an AWS.Tools.* module the AWS.Tools.Common module is also required.

#Requires -Modules @{ModuleName='AWS.Tools.Common';ModuleVersion='4.0.6.0'}
#Requires -Modules @{ModuleName='AWS.Tools.S3';ModuleVersion='4.0.6.0'}
#Requires -Modules @{ModuleName='AWS.Tools.EC2';ModuleVersion='4.0.6.0'}

# Uncomment to send the input event to CloudWatch Logs
# Write-Host (ConvertTo-Json -InputObject $LambdaInput -Compress -Depth 5)
Write-Output "Test"

The PowerShell cmdlets for the AWS services are documented in the AWS Tools for PowerShell Cmdlet Reference.

Assuming the script is complete (the script above just writes “Test” to the output), you need to deploy it to Lambda. For Python, all you need to do is zip your code and upload it to AWS Lambda. For PowerShell, you need to call the “Publish-AWSPowerShellLambda” cmdlet, passing in the script, a name for the Lambda function and the AWS region you want the function deployed to:

PS /home/dwe/Documents/aws/MyFirstPowershellLambda> Publish-AWSPowerShellLambda -ScriptPath ./MyFirstPowershellLambda.ps1 -Name MyFirstPowershellLambda  -Region eu-central-1

… and this will fail with:

Get-Command: /home/dwe/.local/share/powershell/Modules/AWSLambdaPSCore/2.0.0.0/Private/_DeploymentFunctions.ps1:544
Line |
 544 |      $application = Get-Command -Name dotnet
     |                     ~~~~~~~~~~~~~~~~~~~~~~~~
     | The term 'dotnet' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name,
     | or if a path was included, verify that the path is correct and try again.

Exception: /home/dwe/.local/share/powershell/Modules/AWSLambdaPSCore/2.0.0.0/Private/_DeploymentFunctions.ps1:547
Line |
 547 |          throw '.NET Core 3.1 SDK was not found which is required to b …
     |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | .NET Core 3.1 SDK was not found which is required to build the PowerShell Lambda package bundle. Download the .NET Core 3.1 SDK from
     | https://www.microsoft.com/net/download

The error message is quite clear: you need to install the “.NET Core 3.1 SDK”. As we added the Microsoft repository above, this is just a matter of (again, adjust for your package manager):

$ sudo apt-get install -y dotnet-sdk-3.1
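The failure above came from `dotnet` not being on the PATH. A pre-flight check before calling Publish-AWSPowerShellLambda could look like the following Python sketch. It assumes the SDK is reachable through the `dotnet` command and relies on `dotnet --list-sdks`, which prints one line per SDK, starting with the version. The function names are hypothetical.

```python
import shutil
import subprocess


def missing_tools(tools=("pwsh", "dotnet")):
    """Return the command names from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def sdk_versions(list_sdks_output):
    """Parse `dotnet --list-sdks` output, whose lines look like
    '3.1.301 [/usr/share/dotnet/sdk]', into a list of version strings."""
    return [line.split()[0] for line in list_sdks_output.splitlines() if line.strip()]


def has_sdk(major_minor="3.1"):
    """True if an installed .NET SDK version starts with `major_minor`."""
    if shutil.which("dotnet") is None:
        return False
    result = subprocess.run(["dotnet", "--list-sdks"], capture_output=True, text=True)
    return any(v.startswith(major_minor + ".") for v in sdk_versions(result.stdout))
```

This only checks presence and version. It does not verify that the SDK can actually build the Lambda package.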

Trying again, this time it succeeds:

PS /home/dwe/Documents/aws/MyFirstPowershellLambda> Publish-AWSPowerShellLambda -ScriptPath ./MyFirstPowershellLambda.ps1 -Name MyFirstPowershellLambda  -Region eu-central-1
Staging deployment at /tmp/MyFirstPowershellLambda
Configuring PowerShell to version 7.0.0
Generating C# project /tmp/MyFirstPowershellLambda/MyFirstPowershellLambda.csproj used to create Lambda function bundle.
Generating /tmp/MyFirstPowershellLambda/Bootstrap.cs to load PowerShell script and required modules in Lambda environment.
Generating aws-lambda-tools-defaults.json config file with default values used when publishing project.
Copying PowerShell script to staging directory
...
... zipping:   adding: Namotion.Reflection.dll (deflated 58%)
... zipping:   adding: System.Diagnostics.PerformanceCounter.dll (deflated 60%)
... zipping:   adding: MyFirstPowershellLambda.ps1 (deflated 53%)
... zipping:   adding: System.Management.dll (deflated 62%)
... zipping:   adding: Markdig.Signed.dll (deflated 62%)
... zipping:   adding: libpsl-native.so (deflated 69%)
...
Creating new Lambda function MyFirstPowershellLambda
Enter name of the new IAM Role:
dwe-tmp-role
...
Select IAM Policy to attach to the new role and grant permissions
    1) AWSLambdaFullAccess (Provides full access to Lambda, S3, DynamoDB, CloudWatch Metrics and  ...)
    2) AWSLambdaReplicator
...
1
Waiting for new IAM Role to propagate to AWS regions
...............  Done
New Lambda function created

Heading over to the AWS console, we can see that the function is there.

Hope this helps…

This article Publishing a PowerShell script to AWS Lambda first appeared on Blog dbi services.

Oracle non-linguistic varchar2 columns to order by without sorting

June 16, 2020, 1:51 pm

By Franck Pachot

Sorting data is an expensive operation, and many queries declare an ORDER BY. To avoid the sort operation you can build an index, as it maintains a sorted structure. This helps with Top-N queries, as you don’t have to read all rows but only those from a range of index entries. However, indexes are sorted by binary values. For NUMBER or DATE datatypes, the internal storage ensures that the order is preserved in the binary format. For character strings, the binary format follows the character set encoding, and for ASCII characters this matches the English alphabet. That’s fine when your session language, NLS_LANGUAGE, defines an NLS_SORT that follows this BINARY order. But as soon as you set a language that has a specific alphabetical order, an index on a VARCHAR2 or CHAR column does not help to avoid a SORT operation. Since Oracle 12.2, however, we can define the sort order at column level with the SQL Standard COLLATE clause. One use case is alphanumeric columns that have nothing to do with any language, like some natural keys combining letters and numbers. The user expects them to be listed in alphabetical order but, as they store only 7-bit ASCII characters, you don’t care about linguistic collation.
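To illustrate the Top-N argument, here is a toy Python sketch in which a sorted list stands in for an index. Order is maintained at insert time, so the first N entries or a key range can be read without any sort. It is only an analogy, not how an Oracle B-tree index is implemented.

```python
import bisect


class SortedIndex:
    """Toy stand-in for an index: keys are kept in sorted order on insert."""

    def __init__(self):
        self._keys = []

    def insert(self, key):
        # Pay the ordering cost at write time, like index maintenance.
        bisect.insort(self._keys, key)

    def first_n(self, n):
        """Top-N in key order: read the first n entries, no sort needed."""
        return self._keys[:n]

    def range(self, low, high):
        """All keys with low <= key < high, located by binary search."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_left(self._keys, high)
        return self._keys[start:end]
```

For example, after inserting "S0001", "O0002", "K0003" and "Z0004", first_n(2) returns ["K0003", "O0002"] without sorting at query time.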

I am running this on the Oracle 20c preview in the Oracle Cloud.

VARCHAR2

It can happen that a primary key is not a NUMBER but a CHAR or VARCHAR2, like this:


SQL> create table demo (ID constraint demp_pk primary key) as
  2  select cast(dbms_random.string('U',1)||to_char(rownum,'FM0999') as varchar2(5)) ID
  3  from xmltable('1 to 10');

Table created.

SQL> select * from demo order by ID;

      ID
________
K0003
K0009
L0007
L0010
M0008
O0002
S0001
W0005
Y0006
Z0004

10 rows selected.

I query with ORDER BY because sorting can make sense on a natural key.
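For experiments outside the database, the ID expression above can be mimicked in Python: one random uppercase letter followed by a zero-padded counter. This approximates dbms_random.string('U',1)||to_char(rownum,'FM0999') and is not Oracle's generator. Such IDs contain only 7-bit ASCII characters, so their byte order and their character order are the same.

```python
import random
import string


def demo_ids(n, seed=None):
    """Mimic dbms_random.string('U',1) || to_char(rownum,'FM0999'):
    one random uppercase letter followed by a 4-digit zero-padded row number."""
    rng = random.Random(seed)
    return [rng.choice(string.ascii_uppercase) + f"{rownum:04d}" for rownum in range(1, n + 1)]


def is_7bit_ascii(value):
    """True if every character of `value` is in the 7-bit ASCII range."""
    return all(ord(c) < 128 for c in value)
```

Sorting these IDs by their encoded bytes, sorted(ids, key=lambda s: s.encode("ascii")), gives the same result as a plain sorted(ids).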

Index

I have an index on this column (the primary key index), which is sorted, so the execution plan is optimized:


SQL> select * from dbms_xplan.display_cursor(format=>'basic');

                      PLAN_TABLE_OUTPUT
_______________________________________
EXPLAINED SQL STATEMENT:
------------------------
select * from demo order by ID

Plan hash value: 1955576728

------------------------------------
| Id  | Operation        | Name    |
------------------------------------
|   0 | SELECT STATEMENT |         |
|   1 |  INDEX FULL SCAN | DEMP_PK |
------------------------------------

13 rows selected.

There’s no SORT operation because the INDEX FULL SCAN follows the index entries in order.

NLS_LANGUAGE

However, there are many countries where we don’t speak English:


SQL> alter session set nls_language='French';

Session altered.

In French, like in many languages, we have accented characters and other specificities, so the alphabetical order of the language does not always follow the ASCII order.

I’m running exactly the same query:


SQL> select * from demo order by ID;

      ID
________
K0003
K0009
L0007
L0010
M0008
O0002
S0001
W0005
Y0006
Z0004

10 rows selected.

SQL> select * from dbms_xplan.display_cursor(format=>'basic');

                      PLAN_TABLE_OUTPUT
_______________________________________
EXPLAINED SQL STATEMENT:
------------------------
select * from demo order by ID

Plan hash value: 2698718808

------------------------------------
| Id  | Operation        | Name    |
------------------------------------
|   0 | SELECT STATEMENT |         |
|   1 |  SORT ORDER BY   |         |
|   2 |   INDEX FULL SCAN| DEMP_PK |
------------------------------------

14 rows selected.

This time, there’s a SORT operation, even though I’m still reading with INDEX FULL SCAN.

NLS_SORT

The reason is that, by setting the ‘French’ language, I’ve also set the French sort collating sequence.


SQL> select * from nls_session_parameters;
                 PARAMETER                           VALUE
__________________________ _______________________________
NLS_LANGUAGE               FRENCH
NLS_SORT                   FRENCH

And this is different from the BINARY one that I had when my language was ‘American’.

Actually, only a few languages follow the BINARY order of the ASCII table:


SQL>
  declare
   val varchar2(64);
  begin
    for i in (select VALUE from V$NLS_VALID_VALUES where PARAMETER='LANGUAGE') loop
    execute immediate 'alter session set nls_language='''||i.value||'''';
    select value into val from NLS_SESSION_PARAMETERS where PARAMETER='NLS_SORT';
    if val='BINARY' then dbms_output.put(i.value||' '); end if;
    end loop;
    dbms_output.put_line('');
  end;
/

AMERICAN JAPANESE KOREAN SIMPLIFIED CHINESE TRADITIONAL CHINESE ENGLISH HINDI TAMIL KANNADA TELUGU ORIYA MALAYALAM ASSAMESE GUJARATI MARATHI PUNJABI BANGLA MACEDONIAN LATIN SERBIAN IRISH

PL/SQL procedure successfully completed.

This is fine for real text, but not for my primary key, where ASCII order is what I want. I could set NLS_SORT=BINARY for my session, but that’s too broad, as my problem concerns only one column.

Or I could create an index for the French collation, on the sort key that is actually used internally:


SQL> explain plan for select * from demo order by ID;
Explained.

SQL> select * from dbms_xplan.display(format=>'basic +projection');
                                                      PLAN_TABLE_OUTPUT
_______________________________________________________________________
Plan hash value: 2698718808

------------------------------------
| Id  | Operation        | Name    |
------------------------------------
|   0 | SELECT STATEMENT |         |
|   1 |  SORT ORDER BY   |         |
|   2 |   INDEX FULL SCAN| DEMP_PK |
------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=1) NLSSORT("DEMO"."ID",'nls_sort=''GENERIC_M''')[50],
       "DEMO"."ID"[VARCHAR2,5]
   2 - "DEMO"."ID"[VARCHAR2,5]

GENERIC_M is the sort collation for many European languages.
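The projection shows that the SORT ORDER BY works on a derived key, NLSSORT(ID), rather than on the stored value. The Python sketch below mimics this with a hypothetical fold key that only removes accents and case. It is a crude stand-in, much simpler than Oracle's FRENCH or GENERIC_M rules. The sketch then checks whether entries already in binary order, as an index scan returns them, would still need sorting for that key.

```python
import unicodedata


def fold(value):
    """Hypothetical stand-in for a linguistic sort key such as NLSSORT(...):
    it only strips accents (combining marks after NFD) and case."""
    decomposed = unicodedata.normalize("NFD", value)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn").casefold()


def already_in_key_order(entries, key):
    """True when `entries` (e.g. an index scan in binary order)
    would need no further sort to be ordered by `key`."""
    keys = [key(e) for e in entries]
    return all(a <= b for a, b in zip(keys, keys[1:]))


# Escapes keep the accented letters precomposed: \u00e8 is 'è', \u00e9 is 'é'.
words = ["z\u00e8bre", "\u00e9cole", "abricot", "\u00e9toile", "zoo"]
binary_order = sorted(words)                # code point order, like BINARY
linguistic_order = sorted(words, key=fold)  # sort on a derived key, like NLSSORT
```

For these French words, the binary order would have to be re-sorted. For ASCII IDs like those in the demo table, the two orders coincide, but that is a property of the data, not of the column definition.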

But, again, that does not fit the scope of my problem, as I don’t want to create an index for every possible NLS_SORT setting.

COLLATE

The right solution is to define the collation for my table column: this ID is a character string, but it is an ASCII character string that has nothing to do with my language. Since 12cR2, I can do that:


SQL> alter table demo modify ID collate binary;

Table altered.

The COLLATE clause is SQL Standard syntax that exists in other databases, and it came to Oracle in 12cR2.

And that’s all:


SQL> explain plan for select * from demo order by ID;

Explained.

SQL> select * from dbms_xplan.display(format=>'basic +projection');

                                             PLAN_TABLE_OUTPUT
______________________________________________________________
Plan hash value: 1955576728

------------------------------------
| Id  | Operation        | Name    |
------------------------------------
|   0 | SELECT STATEMENT |         |
|   1 |  INDEX FULL SCAN | DEMP_PK |
------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "DEMO"."ID"[VARCHAR2,5]

No SORT operation needed, whatever the language I set for my session.

Here is the DDL for my table:


SQL> ddl demo

  CREATE TABLE "SYS"."DEMO"
   (    "ID" VARCHAR2(5) COLLATE "BINARY",
         CONSTRAINT "DEMP_PK" PRIMARY KEY ("ID")
  USING INDEX  ENABLE
   )  DEFAULT COLLATION "USING_NLS_COMP" ;

My column explicitly follows the BINARY collation.

Extended Data Types

Now, all seems easy, but there’s a prerequisite:


SQL> show parameter max_string_size

NAME            TYPE   VALUE
--------------- ------ --------
max_string_size string EXTENDED

I have set my PDB to EXTENDED string size.

If I try the same in a PDB with the ‘old’ limit of 4000 bytes:


SQL> alter session set container=PDB1;

Session altered.

SQL> show parameter max_string_size

NAME            TYPE   VALUE
--------------- ------ --------
max_string_size string STANDARD

SQL> drop table demo;

Table dropped.

SQL> create table demo (ID varchar2(5) collate binary constraint demp_pk primary key);

create table demo (ID varchar2(5) collate binary constraint demp_pk primary key)
 *
ERROR at line 1:
ORA-43929: Collation cannot be specified if parameter MAX_STRING_SIZE=STANDARD is set.

This new feature (12c Release 2) is allowed only with the Extended Data Types, which were introduced in 12c Release 1.

ORDER BY COLLATE

Ok, let’s create the table with the default collation:


SQL> create table demo (ID constraint demp_pk primary key) as
  2  select cast(dbms_random.string('U',1)||to_char(rownum,'FM0999') as varchar2(5)) ID
  3  from xmltable('1 to 10');

Table created.

SQL> select * from dbms_xplan.display_cursor(format=>'basic +projection');

                                                   PLAN_TABLE_OUTPUT
____________________________________________________________________
EXPLAINED SQL STATEMENT:
------------------------
select * from demo order by ID

Plan hash value: 2698718808

------------------------------------
| Id  | Operation        | Name    |
------------------------------------
|   0 | SELECT STATEMENT |         |
|   1 |  SORT ORDER BY   |         |
|   2 |   INDEX FULL SCAN| DEMP_PK |
------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=1) NLSSORT("DEMO"."ID",'nls_sort=''FRENCH''')[50],
       "DEMO"."ID"[VARCHAR2,5]
   2 - "DEMO"."ID"[VARCHAR2,5]

As my NLS_SORT is ‘French’ there is a SORT operation.

But I can explicitly request a BINARY sort for this:


SQL> select * from demo order by ID collate binary;

      ID
________
D0003
H0002
L0009
N0008
P0010
Q0005
R0004
W0007
Y0001
Z0006

10 rows selected.

SQL> select * from dbms_xplan.display_cursor(format=>'basic +projection');

                                             PLAN_TABLE_OUTPUT
______________________________________________________________
EXPLAINED SQL STATEMENT:
------------------------
select * from demo order by ID collate binary

Plan hash value: 2698718808

------------------------------------
| Id  | Operation        | Name    |
------------------------------------
|   0 | SELECT STATEMENT |         |
|   1 |  SORT ORDER BY   |         |
|   2 |   INDEX FULL SCAN| DEMP_PK |
------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=1) "DEMO"."ID" COLLATE "BINARY"[5],
       "DEMO"."ID"[VARCHAR2,5]
   2 - "DEMO"."ID"[VARCHAR2,5]

I have no idea why there is still a sort operation. I think that the INDEX FULL SCAN already returns the rows in binary order, and that should not require any additional sorting for the ORDER BY … COLLATE BINARY.
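A linguistic NLS_SORT forces a sort because an index on a VARCHAR2 column is kept in binary order, and that order can differ from a linguistic one. Here is a minimal Python sketch (not Oracle code) of the difference. Python compares strings by code point, which matches a binary collation for plain ASCII data. `str.casefold` is only an assumed, rough stand-in for a case-insensitive linguistic collation and does not implement Oracle's FRENCH rules.

```python
def binary_order(values):
    """Order by code point, like a BINARY collation on ASCII data (and like the index order)."""
    return sorted(values)


def linguistic_like_order(values):
    """Rough stand-in for a case-insensitive linguistic sort; NOT Oracle's FRENCH collation."""
    return sorted(values, key=str.casefold)


mixed = ["b0002", "A0001", "a0003", "B0004"]
print(binary_order(mixed))           # ['A0001', 'B0004', 'a0003', 'b0002']
print(linguistic_like_order(mixed))  # ['A0001', 'a0003', 'b0002', 'B0004']

# The demo IDs (one uppercase letter + digits) happen to sort identically both ways,
# but nothing in the column definition guarantees that.
demo_ids = ["Y0001", "H0002", "D0003", "R0004", "Q0005",
            "Z0006", "W0007", "N0008", "L0009", "P0010"]
print(binary_order(demo_ids) == linguistic_like_order(demo_ids))  # True
```

With mixed-case data the two orders diverge, so an index in binary order cannot serve a linguistic ORDER BY. With COLLATE BINARY the requested order is the index order, which is why the remaining SORT ORDER BY looks unnecessary.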

This article Oracle non-linguistic varchar2 columns to order by without sorting first appeared on Blog dbi services.

Power BI Report Server – URL reservation Warning – Delegation blocked

June 16, 2020, 11:47 pm

Introduction

Recently I experienced some strange behavior on Power BI Report Server. My customer reported failures when authenticating on shared data sources using Kerberos delegation. On the test environment everything was working as expected and delegation was working fine, whereas on the development server delegation was not working at all. We also saw comparable behavior on the production environment, but only sporadically: the same data source worked fine for one user and then suddenly no longer for another one, or, for the same user, one data source connection was accessible while another one was failing with the well-known delegation error.

Investigation

At first glance it was clearly a Kerberos delegation issue, so the first things to look at are the settings of your service account:
– Is “Trust for delegation” set?
– If you use constrained delegation, check that you target the right services.

After that, we double-checked that the SPNs were still correct. Everything was set as it should be.

The next step was to check the configuration of the Power BI Report Server. I opened the Power BI Report Server Configuration Manager and went through all the configuration settings; here again, everything seemed to be correct.

Having some experience with Reporting Services and now with Power BI Report Server, I knew that playing with and changing the URL settings of the Web Service and the Web Portal might corrupt the rsreportserver.config file, especially if you use an HTTPS configuration. If you don’t understand what I’m talking about, you will sooner or later if you have to install and configure this application often.
Bingo, that was the first issue: there were still entries in the XML nodes that were no longer valid.

In that case, my way to go is to return to the Power BI Report Server Configuration Manager and reset the URL configurations to the default ones, deleting the existing ones for the Web Service and the Web Portal, and apply the changes. After that, make sure that the <URLReservations> tag in rsreportserver.config is limited to the following example. If it is not, first stop your Power BI Report Server service, back up the file, then clean it and save it.

<URLReservations>
	<Application>
		<Name>ReportServerWebService</Name>
		<VirtualDirectory>ReportServer</VirtualDirectory>
		<URLs>
			<URL>
				<UrlString>http://+:80</UrlString>
				<AccountSid>S-1-5-80-1730998386-2757299892-37364343-1607169425-3512908663</AccountSid>
				<AccountName>NT SERVICE\PowerBIReportServer</AccountName>
			</URL>
		</URLs>
	</Application>
	<Application>
		<Name>ReportServerWebApp</Name>
		<VirtualDirectory>Reports</VirtualDirectory>
		<URLs>
			<URL>
				<UrlString>http://+:80</UrlString>
				<AccountSid>S-1-5-80-1730998386-2757299892-37364343-1607169425-3512908663</AccountSid>
				<AccountName>NT SERVICE\PowerBIReportServer</AccountName>
			</URL>
		</URLs>
	</Application>
</URLReservations>

Also make sure that there is no other leftover tag at the end of the file. If there is one, delete the whole element, from its opening tag to its closing tag, and save your file again.

…and of course, restart your Power BI Report Server service afterwards.
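To double-check the file after cleaning it, a small script can list every URL reservation it still contains. This is a minimal Python sketch using the standard library's ElementTree. It assumes the element names shown in the excerpt above (URLReservations, Application, URLs/URL, UrlString, AccountName) and treats any UrlString other than http://+:80 as a leftover worth reviewing; the function names are hypothetical helpers, not part of Power BI Report Server.

```python
import xml.etree.ElementTree as ET


def url_reservations(root):
    """Return (application, virtual directory, URL, account) tuples from <URLReservations>."""
    found = []
    # iter() also visits the element itself, so this works whether root is
    # <URLReservations> (as in the excerpt above) or an enclosing element.
    for reservations in root.iter("URLReservations"):
        for app in reservations.findall("Application"):
            name = app.findtext("Name", default="")
            vdir = app.findtext("VirtualDirectory", default="")
            for url in app.findall("URLs/URL"):
                found.append((name, vdir,
                              url.findtext("UrlString", default=""),
                              url.findtext("AccountName", default="")))
    return found


def unexpected_reservations(root, expected_urls=("http://+:80",)):
    """Entries whose UrlString is not in expected_urls: candidates for stale leftovers."""
    return [entry for entry in url_reservations(root) if entry[2] not in expected_urls]
```

On the server, pass `ET.parse(path_to_config).getroot()`, where `path_to_config` points to your rsreportserver.config, and adapt `expected_urls` to the ports you actually use.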

Now that my configuration file was clean, I set up my HTTPS configuration for the web service and the web portal again with the Configuration Manager tool. When you do this, look carefully at the scrolling messages at the bottom of the Configuration Manager to check that the settings are applied successfully. For me, it was all green lights, and the final message was green as well.

But do not forget to read the warnings carefully. Right at the beginning, one message warned that the URL could not be reserved, but the following one said that the URL had been removed, and the next one that it had been reserved successfully. So why worry? Simply because my Kerberos issue was still not solved.

URL Reservation list

Therefore, I decided to follow up on this strange warning and look for information about reserved URLs.
I found the following command, which lists the URLs reserved on the server; with it, I discovered all the URLs reserved by the user NT SERVICE\PowerBIReportServer.

netsh http show urlacl
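On a server with many reservations, the output of this command is long. The following Python sketch filters it down to the reservations owned by a given account. It assumes the English output layout of `netsh http show urlacl` (a “Reserved URL : …” line followed by one or more “User: …” lines); localized Windows versions print translated labels, so adapt the prefixes if needed. The function names are hypothetical.

```python
def parse_urlacl(output):
    """Return (url, user) pairs from the text printed by `netsh http show urlacl`.

    Assumes English labels ("Reserved URL", "User:"); a URL may have several users.
    """
    pairs = []
    current_url = None
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if line.startswith("Reserved URL"):
            # "Reserved URL            : http://+:80/ReportServer/" -> keep what follows the first ':'
            current_url = line.split(":", 1)[1].strip()
        elif line.startswith("User:") and current_url is not None:
            pairs.append((current_url, line.split(":", 1)[1].strip()))
    return pairs


def urls_for_user(output, user=r"NT SERVICE\PowerBIReportServer"):
    """Reserved URLs held by one account (compared case-insensitively, like Windows account names)."""
    return [url for url, owner in parse_urlacl(output) if owner.lower() == user.lower()]
```

On the server itself, the text can be captured with `subprocess.run(["netsh", "http", "show", "urlacl"], capture_output=True, text=True).stdout` and passed to `urls_for_user`.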

Then I reset my URL configurations again and cleaned my rsreportserver.config file once more.

Removing URL Reservation

This time, I left my Power BI Report Server service stopped and manually removed all the reserved URLs linked to the user NT SERVICE\PowerBIReportServer. You should always find 4 URLs: one for the Web Service with the folder you used (in my case …/ReportServer), one for the Web Portal with the folder you defined (in my case …/Reports), another one for the folder /PowerBI, and the last one for /wopi.

netsh http delete urlacl https://yourserver.yourdomain.com:443/ReportServer/
netsh http delete urlacl https://yourserver.yourdomain.com:443/Reports/
netsh http delete urlacl https://yourserver.yourdomain.com:443/PowerBI/
netsh http delete urlacl https://yourserver.yourdomain.com:443/wopi/

You can remove all of these URLs and leave only the ones with +:80 and +:8083 in the URL.
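Given the list of URLs reserved by NT SERVICE\PowerBIReportServer, the commands that remove everything except the +:80 and +:8083 reservations can be generated rather than typed. This minimal Python sketch only builds the command strings (using the `url=` parameter form of `netsh http delete urlacl`) and does not execute them, so you can review them before running them with the service stopped. The helper name and the keep-list are assumptions based on the text above.

```python
from urllib.parse import urlsplit

# Host:port parts of the wildcard reservations to keep, as described above.
KEEP_NETLOCS = frozenset({"+:80", "+:8083"})


def delete_commands(reserved_urls, keep=KEEP_NETLOCS):
    """Return one netsh delete command per reserved URL whose host:port is not in `keep`."""
    commands = []
    for url in reserved_urls:
        if urlsplit(url).netloc in keep:
            continue  # leave the wildcard reservation in place
        commands.append(f"netsh http delete urlacl url={url}")
    return commands


for command in delete_commands([
    "http://+:80/ReportServer/",
    "https://yourserver.yourdomain.com:443/Reports/",
    "http://+:8083/PowerBI/",
]):
    print(command)
```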

Then start the Configuration Manager again and redo the URL configuration while watching the scrolling messages at the bottom. Normally, no more warnings will be displayed.

… and finally

Well, when I was finished, I tested my data source connections again and, as if by magic, everything worked perfectly and delegation succeeded.
I hope this post will help you and save you time; it took me some hours to figure out that the URL reservation was causing the delegation issues.

This article Power BI Report Server – URL reservation Warning – Delegation blocked first appeared on Blog dbi services.

Documentum – Change/Rename of LDAP Config object

June 17, 2020, 12:04 pm

Using an LDAP Server with Documentum (or any ECM for that matter) is pretty common to avoid managing users locally. In this blog, I wanted to talk about something that isn’t very common and that is the change of an LDAP Server. I’m not talking about just changing the server where your LDAP is hosted but rather changing the full LDAP Server Config inside Documentum (including its name). This is probably something that you will not do very often but I had this use case before so I thought it would be interesting to share.

So the use case I had was the following one: during a migration (including an upgrade in the process) of dozens of environments from Virtual Machines to Kubernetes pods, I had to automate the setup & management of the LDAP Server as well as normalize the configuration (name & other parameters) according to certain characteristics. The source and target LDAP Servers were the same (a_application_type: netscape), so it was really just a matter of automation and conventions (if the target isn’t the same, it wouldn’t change anything for this blog). As you know if you already came across one of my previous blogs, DA is doing some magic, which prevents you from really managing the LDAP Server in the same way between DA and automation.

Therefore, the first part of the use case could simply have been to change the LDAP Config object “certdb_location” parameter from the default “ldapcertdb_loc” to another “dm_location” which uses a “file” “path_type” and not a “directory” one. If you understood that last sentence, well done! If you didn’t, don’t worry, I would just suggest reading the blog I linked above. It shows why you would need to use a dedicated dm_location with a path_type set to a file (that is, the SSL Certificate of the LDAP Server) in order to automate the setup of an LDAP Config object. Of course, this is only needed if you are using SSL communications with the LDAP Server (i.e. LDAPS). That would have been the simple, easily understandable part; it is another story if you also need to normalize the LDAP Config object setup in the process. Changing parameters of the LDAP Config object is easy and shouldn’t bring any issues or interruptions, but changing the name of the LDAP object is a different matter… If the field is greyed out in DA, it’s for a reason ;).

The dozens of environments were all using the exact same backend LDAP Server, but there were still some small differences in the setup: different names of course, different attributes mapped here and there, things like that. To be sure everything would be aligned on the target, I took the simplest road: removing all the source LDAP Config objects and recreating them in our CI/CD (JenkinsX pipelines, Ansible playbooks, etc.). Before removing anything, make sure that you understand what it means for your specific setup; it might not always be that straightforward! Alright, let’s check the status of the LDAP at the beginning:

[dmadmin@stg_cs ~]$ export docbase=REPO1
[dmadmin@stg_cs ~]$
[dmadmin@stg_cs ~]$ iapi ${docbase} -U${USER} -Pxxx << EOC
> ?,c,select r_object_id, object_name from dm_ldap_config;
> ?,c,select ldap_config_id from dm_server_config;
> ?,c,select user_source, count(*), user_login_domain from dm_user group by user_login_domain, user_source;
> EOC


        EMC Documentum iapi - Interactive API interface
        (c) Copyright EMC Corp., 1992 - 2016
        All rights reserved.
        Client Library Release 7.3.0040.0025


Connecting to Server using docbase REPO1
[DM_SESSION_I_SESSION_START]info:  "Session 010f123451ad5e0f started for user dmadmin."


Connected to Documentum Server running Release 7.3.0050.0039  Linux64.Oracle
Session id is s0
API> 
r_object_id       object_name
----------------  ----------------
080f1234500edaf3  Source_LDAP_Name
(1 row affected)

API> 
ldap_config_id
----------------
080f1234500edaf3
080f1234500edaf3
080f1234500edaf3
(3 rows affected)

API> 
user_source       count(*)                user_login_domain
----------------  ----------------------  -----------------
LDAP                               55075  Source_LDAP_Name
inline password                      307  
                                    1466  
(3 rows affected)

API> Bye
[dmadmin@stg_cs ~]$

As you can see above, there is currently one LDAP Config object, used by around 55k users. There are three rows for the second query because it’s an HA Repository with 3 Content Servers. The first thing I did was to remove the LDAP Config object and change the different references to prepare for the new name of the object (it would be better to put the two last update commands in a single query, but I split them here for this example):

[dmadmin@stg_cs ~]$ export old_ldap_name=Source_LDAP_Name
[dmadmin@stg_cs ~]$ export new_ldap_name=Target_LDAP_Name
[dmadmin@stg_cs ~]$
[dmadmin@stg_cs ~]$ iapi ${docbase} -U${USER} -Pxxx << EOC
> ?,c,delete dm_ldap_config objects where object_name='${old_ldap_name}';
> ?,c,update dm_server_config object set ldap_config_id='0000000000000000';
> ?,c,update dm_user object set user_login_domain='${new_ldap_name}' where user_login_domain='${old_ldap_name}' and user_source='LDAP';
> ?,c,update dm_user object set user_global_unique_id='' where user_login_domain='${new_ldap_name}' and user_source='LDAP';
> EOC


        EMC Documentum iapi - Interactive API interface
        (c) Copyright EMC Corp., 1992 - 2016
        All rights reserved.
        Client Library Release 7.3.0040.0025


Connecting to Server using docbase REPO1
[DM_SESSION_I_SESSION_START]info:  "Session 010f123451ad5e11 started for user dmadmin."


Connected to Documentum Server running Release 7.3.0050.0039  Linux64.Oracle
Session id is s0
API> 
objects_deleted
---------------
              1
(1 row affected)
[DM_QUERY_I_NUM_UPDATE]info:  "1 objects were affected by your DELETE statement."


API> 
objects_updated
---------------
              3
(1 row affected)
[DM_QUERY_I_NUM_UPDATE]info:  "3 objects were affected by your UPDATE statement."


API> 
objects_updated
---------------
          55075
(1 row affected)
[DM_QUERY_I_NUM_UPDATE]info:  "55075 objects were affected by your UPDATE statement."


API> 
objects_updated
---------------
          55075
(1 row affected)
[DM_QUERY_I_NUM_UPDATE]info:  "55075 objects were affected by your UPDATE statement."


API> Bye
[dmadmin@stg_cs ~]$

At this point, the new LDAP Config object wasn’t created yet, but to avoid any disturbance, I already changed the “user_login_domain” from the old name to the new one. Another point to note is the reset of the “user_global_unique_id” parameter to an empty value. As soon as users are synchronized using an LDAP, for example, they are assigned a unique identity. For an LDAP account, the id should be something like “###dm_ldap_config.object_name###:###random_numbers###” (E.g.: Source_LDAP_Name:078b1d9f-f4c35cd2-cad867c1-1f4a7872) while for an inline account it should be more like “###dm_docbase_config.object_name###:###dm_user.user_name###” (E.g.: REPO1:Patou Morgan). If you just change the LDAP Config object name and update the “user_login_domain” of the users without anything else, the users won’t be able to log in, and if you try to execute the LDAP Sync, you should see some warnings showing that your users haven’t been synchronized (skipped). This is because the users already exist with a different identity. The message in the LDAP Sync would be something like “WARNING: A User with same user_name (Patou Morgan) exists in the docbase with a different identity”. The identity is generated by Documentum and written in the dm_user object under the “user_global_unique_id” attribute. Setting this to an empty value allows Documentum to generate a new identity so that the user can work with the new LDAP.
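The identity format described above makes it easy to spot users whose identity still points to the old LDAP Config object name. This is a minimal Python sketch, assuming you have exported (user_name, user_global_unique_id) pairs for the LDAP users (for example from a DQL select on dm_user) and that the LDAP Config object name itself contains no colon. The function names and the second sample user are hypothetical.

```python
def identity_prefix(user_global_unique_id):
    """Part before the first ':' (LDAP Config name for LDAP users, docbase name for inline ones)."""
    return user_global_unique_id.split(":", 1)[0]


def stale_identities(users, ldap_config_name):
    """User names whose non-empty identity was generated for another LDAP Config name.

    Empty identities are skipped: per the text above, Documentum generates a new one.
    """
    return [name for name, uid in users
            if uid and identity_prefix(uid) != ldap_config_name]


users = [
    ("Patou Morgan", "Source_LDAP_Name:078b1d9f-f4c35cd2-cad867c1-1f4a7872"),
    ("Another User", ""),
]
print(stale_identities(users, "Target_LDAP_Name"))  # ['Patou Morgan']
```

Any name returned for the new LDAP Config name is a user that the LDAP Sync would skip with the “different identity” warning.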

Verification after the deletion:

[dmadmin@stg_cs ~]$ iapi ${docbase} -U${USER} -Pxxx << EOC
> ?,c,select r_object_id, object_name from dm_ldap_config;
> ?,c,select ldap_config_id from dm_server_config;
> ?,c,select user_source, count(*), user_login_domain from dm_user group by user_login_domain, user_source;
> EOC


        EMC Documentum iapi - Interactive API interface
        (c) Copyright EMC Corp., 1992 - 2016
        All rights reserved.
        Client Library Release 7.3.0040.0025


Connecting to Server using docbase REPO1
[DM_SESSION_I_SESSION_START]info:  "Session 010f123451ad5e12 started for user dmadmin."


Connected to Documentum Server running Release 7.3.0050.0039  Linux64.Oracle
Session id is s0
API> 
r_object_id       object_name
----------------  ----------------
(0 rows affected)

API> 
ldap_config_id
----------------
0000000000000000
0000000000000000
0000000000000000
(3 rows affected)

API> 
user_source       count(*)                user_login_domain
----------------  ----------------------  -----------------
LDAP                               55075  Target_LDAP_Name
inline password                      307  
                                    1466  
(3 rows affected)

API> Bye
[dmadmin@stg_cs ~]$

Once that was done, I simply re-created a new LDAP Config object as described in the blog I linked at the beginning of this post, using the new name “Target_LDAP_Name” (including the re-encryption of the password). Then I ran the LDAP Sync to make sure everything was working properly and checked the user_global_unique_id of some LDAP users to make sure it had been regenerated properly. It should be something like “Source_LDAP_Name:078b1d9f-f4c35cd2-cad867c1-1f4a7872” before, and then something like “Target_LDAP_Name:2d7c130d-84a47969-926907fa-f1649678” after the change of the LDAP Config object name (+ execution of the LDAP Sync). Obviously, the names and the IDs will be different in your case.

This article Documentum – Change/Rename of LDAP Config object first appeared on Blog dbi services.

Azure Migrate: how to assess on-premises servers for a future Azure migration

June 18, 2020, 12:14 am

Azure Migrate provides a set of tools to assess and migrate on-premises infrastructure such as Hyper-V or VMware virtual machines, physical servers, SQL and other databases, web applications…
The Azure Migrate overview page summarizes the different key scenarios available.

Let’s have a look at how easily you can assess your Hyper-V virtual machines for a future migration to Azure with this migration tool.
You need to add a tool to discover your VMs, but before that you need to create a specific Resource Group for this Azure migration, give a name to the project, and choose a preferred location for your future Azure Virtual Machines.

Once done, you need to select the assessment tool followed by the migration tool. Personally, I chose the Azure ones, but other ISV (independent software vendor) offerings are available.
At the end of this process, you can review your selection and, if you are satisfied with it, click on the “Add tools” button:

The next step is to discover your on-premises servers, here my virtual machines. To do so, you have two possibilities:

deploy an Azure Migrate appliance: this appliance stays connected to Azure Migrate and continuously discovers your on-premises machines.
use a CSV file to import an inventory of the VMs you would like to quickly assess, to get a cost view and a compatibility view. If you need an accurate assessment, you need to use the appliance; the same applies if you want to migrate the assessed servers.

For my first test, I used a CSV file containing 3 virtual machines. Once they are imported, you can see my 3 discovered servers in Azure Migrate Server Assessment.

If you click on discovered servers, you will have an overview of the server and you can see that they have been imported and not discovered by an appliance:

You can now create a group which is a collection of machines you would like to assess and migrate together. In this example I took two machines from my imported ones:

Now that your group is created, you can run an assessment for this group:

select Imported machines
give a name to your assessment
choose the sizing criteria for your assessment:
- Performance-based: based on the performance-data values specified
- As on-premises: based on on-premises sizing
select or create a new group
click the “Create assessment” button:
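The difference between the two sizing criteria can be illustrated with a simplified model. This Python sketch is an assumption, not Azure Migrate’s actual algorithm: “As on-premises” reuses the allocated cores and memory, while “Performance-based” scales the observed utilization by a comfort factor (a buffer setting exposed in the assessment properties; 1.3 here is only an example value). The VM size catalog and its names are hypothetical; real sizes and prices come from the assessment itself.

```python
import math
from dataclasses import dataclass


@dataclass
class OnPremVM:
    cores: int
    memory_gb: float
    cpu_util: float     # observed CPU utilization (e.g. a percentile), 0.0 to 1.0
    memory_util: float  # observed memory utilization, 0.0 to 1.0


# Hypothetical catalog: (size name, vCPUs, memory in GB).
CATALOG = [("size-2c-8g", 2, 8), ("size-4c-16g", 4, 16), ("size-8c-32g", 8, 32)]


def required_capacity(vm, criterion, comfort_factor=1.3):
    """Return the (cores, memory_gb) to look for, depending on the sizing criterion."""
    if criterion == "as-on-premises":
        return vm.cores, vm.memory_gb
    if criterion == "performance-based":
        cores = max(1, math.ceil(vm.cores * vm.cpu_util * comfort_factor))
        return cores, vm.memory_gb * vm.memory_util * comfort_factor
    raise ValueError(f"unknown sizing criterion: {criterion}")


def pick_size(cores, memory_gb, catalog=CATALOG):
    """Smallest catalog entry (by vCPUs, then memory) that fits, or None."""
    fitting = [size for size in catalog if size[1] >= cores and size[2] >= memory_gb]
    return min(fitting, key=lambda size: (size[1], size[2]))[0] if fitting else None


vm = OnPremVM(cores=8, memory_gb=32, cpu_util=0.25, memory_util=0.25)
print(pick_size(*required_capacity(vm, "as-on-premises")))     # size-8c-32g
print(pick_size(*required_capacity(vm, "performance-based")))  # size-4c-16g
```

A lightly used VM therefore lands on a smaller (and cheaper) size with performance-based sizing, which is why that criterion needs real utilization data.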

After a few minutes the assessment is available, and you can view it in 3 parts:

Azure Readiness: tells you whether your servers are ready for migration to Azure or, if not, shows the problems and possible remediations
Monthly cost estimate: sum based on size recommendations for Azure VMs (compute) and associated VMs storage
Storage – Monthly cost estimate: total storage cost split by storage type
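The three parts are related by plain addition: the monthly estimate is compute plus storage, and the storage part is broken down by disk type. A minimal Python sketch with made-up per-VM figures (the data structure and the numbers are hypothetical, not Azure prices):

```python
from collections import defaultdict


def monthly_estimate(vms):
    """Sum per-VM compute costs and per-disk storage costs, grouping storage by disk type.

    Each VM is a dict: {"compute": monthly compute cost, "disks": [(disk type, monthly cost), ...]}.
    """
    compute = sum(vm["compute"] for vm in vms)
    storage_by_type = defaultdict(float)
    for vm in vms:
        for disk_type, cost in vm["disks"]:
            storage_by_type[disk_type] += cost
    storage = sum(storage_by_type.values())
    return {"compute": compute, "storage": storage,
            "total": compute + storage, "storage_by_type": dict(storage_by_type)}


estimate = monthly_estimate([
    {"compute": 70.0, "disks": [("Premium SSD", 20.0), ("Standard HDD", 5.0)]},
    {"compute": 140.0, "disks": [("Premium SSD", 20.0)]},
])
print(estimate["total"])            # 255.0
print(estimate["storage_by_type"])  # {'Premium SSD': 40.0, 'Standard HDD': 5.0}
```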

You can get more information by clicking on each part of the assessment.
In Azure Readiness, you can see the type of Azure Virtual Machine that has been selected during the assessment, the number of disks per on-premises VM, and their size:

If you click on a specific on-premises machine, you will get a more detailed view:

a green light for the migration + the Azure VM size + the estimated monthly cost split between compute and storage
the Operating System used on your VM
a compute summary with the number of cores and the size of the RAM
the selected storage with the number of disks, the target disk type & size

As written at the beginning of this blog, I cannot migrate my on-premises Virtual Machines after this assessment, as I didn’t use the Azure Migrate appliance to discover my on-premises servers. Nevertheless, it’s a good starting point to get a better feeling for the cost if you want to migrate to Azure.
I will definitely deploy an appliance in my next blog post to get an accurate assessment and follow it with a migration.

This article Azure Migrate: how to assess on-premises servers for a future Azure migration first appeared on Blog dbi services.

Some myths about PostgreSQL vs. Oracle

June 24, 2020, 10:42 am

By Franck Pachot

I originally wrote this as a comment on the following post that you may find on the internet:
https://www.2ndquadrant.com/en/blog/oracle-to-postgresql-reasons-to-migrate/
but my comment was not published (many links in it… I suppose it has been flagged as spam?) so I am posting it here.

You should never make any decision based on what you read on the internet without verifying it. It is totally valid to consider a move to Open Source databases, but doing it without a good understanding puts the success of your migration project at risk.

Quotes from the article are in quotation marks.

Kirk,
As you do a comparison and link to a list of PostgreSQL features, let me refine the name and description of the Oracle features you compare to, so that people can find them and do a fair comparison. I’m afraid they may not recognize the names and descriptions you provide, at least in current versions. As an example, nobody will get search hits for “Federation”, or “plSQL”, or “HTML DB”… in the Oracle documentation but they will find “Oracle Gateway”, “PL/SQL”, “APEX”…

Federation vs. Foreign Data Wrappers

There is no feature called “Federation”.
The closest to your description are Database Links and Heterogeneous Services through Database Gateway. They go further than FDW in many points. But anyway, I would never use them for ETL. ETL needs optimized bulk loads, and there are other features for that (like External Tables to read files, and direct-path inserts to load fast). If your goal is to federate and distribute some small reference tables, then Materialized Views are the feature you may be looking for.
https://docs.oracle.com/en/database/oracle/oracle-database/20/heter/introduction.html#GUID-EC402025-0CC0-401F-AF93-888B8A3089FE

plSQL vs. everything else

“Oracle has a built-in programming language called plSQL.”
PL/SQL is more than that. It is compiled (to pcode or native), manages dependencies (tracks dependencies on schema objects), is optimized for data access (UDFs can even be compiled to run within the SQL engine), and can be multithreaded (Parallel Execution). That’s different from PL/pgSQL, which is interpreted at execution time. You mention languages “as plug-ins”, and for this there are other ways to run different languages (external procedures, OJVM, External Table preprocessor,…), but when it comes to performance, transaction control, dependency tracking,… that’s PL/SQL.
https://docs.oracle.com/en/database/oracle/oracle-database/20/lnpls/overview.html#GUID-17166AA4-14DC-48A6-BE92-3FC758DAA940

Application programming

Providing an “API to communicate with the database” is not about open source, as its main goal is encapsulation: hiding implementation details. To access internal structures, which is what you mention, Oracle provides relational views (known as V$ views) accessible with the most appropriate API for a relational database: SQL
https://docs.oracle.com/en/database/oracle/oracle-database/20/refrn/dynamic-performance-views.html#GUID-8C5690B0-DE10-4460-86DF-80111869CF4C

Internationalization and Localization

The “globalization toolkit” is only one part of the globalization features. You can also use “any character encoding, collation and code page”, but not relying on the OS implementation makes it cross-platform compatible and OS-upgrade compatible (see https://wiki.postgresql.org/wiki/Locale_data_changes)
https://docs.oracle.com/en/database/oracle/oracle-database/20/nlspg/overview-of-globalization-support.html#GUID-6DD587EE-6686-4802-9C08-124B495978D5

Web Development

“Oracle acknowledges the existence of HTML through HTML DB. PostgreSQL natively supports JSON, XML and plugs in Javascript”. HTML DB can be found in paper books, but the name has been “APEX” since 2006. And it is not (only) about HTML, JSON, or XML: it is a low-code Rapid Application Development platform with no equivalent in other databases.
Support for the structures and languages you mention is all there, the latest trend being JSON: https://docs.oracle.com/en/database/oracle/oracle-database/20/adjsn/index.html

Authentication

“Oracle has a built-in authentication system.”
Yes, to be platform-independent, and it also supports many other external authentication methods: https://docs.oracle.com/en/database/oracle/oracle-database/20/dbseg/configuring-authentication.html#GUID-BF8E5E84-FE7E-449C-8081-755BAA4CF8DB

Extensibility

“Oracle has a plug-in system”. I don’t know what you are referring to. Oracle is multi-platform proprietary software. Commercial, which means vendor-supported. There are a lot of APIs for extensions, but the vendor has to control what runs in the engine in order to provide support.

Read Scalability

“PostgreSQL can create a virtually unlimited read cluster”. Oracle has an active/active cluster (called RAC) and read replicas (called Active Data Guard). For horizontal scalability, you use the same feature as for vertical scalability (Parallel Execution), across multiple nodes (in sync, with instance affinity on partitions,…)
https://docs.oracle.com/en/database/oracle/oracle-database/20/vldbg/parallel-exec-intro.html#GUID-F9A83EDB-42AD-4638-9A2E-F66FE09F2B43

Cost

“they don’t mind charging you again for every single instance.”
No, that’s wrong, license metrics are on processors (CPU) or users (NUP). You run as many instances as you want on your licensed servers for your licensed users: https://www.oracle.com/a/ocom/docs/corporate/oracle-software-licensing-basics.pdf
“jamming everything into a single instance just to reduce costs”.
No, database consolidation is recommended to scale the management of multiple databases, but not for licensing costs. If you go there, there are a lot of features to allow isolation and data movement in consolidated databases: Multitenant, Resource Manager, Online Relocate, Lockdown Profiles,…
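The licensing point can be made concrete with the two metrics mentioned above. In this minimal Python sketch, the number of database instances simply does not appear: processor licenses depend on physical cores multiplied by a core factor, and Named User Plus licenses depend on users, with a per-processor minimum. The values used (a 0.5 core factor, common for x86 processors, and the 25 NUP per processor minimum of Enterprise Edition) are assumptions to check against Oracle’s current licensing documents for your hardware and edition.

```python
import math


def processor_licenses(physical_cores, core_factor=0.5):
    """Processor metric: physical cores x core factor, rounded up to a whole license."""
    return math.ceil(physical_cores * core_factor)


def nup_licenses(named_users, processors, minimum_per_processor=25):
    """Named User Plus metric: the real user count, but never below the per-processor minimum."""
    return max(named_users, processors * minimum_per_processor)


processors = processor_licenses(16)   # 16 x86 cores with a 0.5 core factor
print(processors)                     # 8
print(nup_licenses(100, processors))  # 200: the minimum applies
print(nup_licenses(300, processors))  # 300: more users than the minimum
# Whether this server runs 1 instance or 10, none of these numbers change.
```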

Performance

“differentiate the tuning parameters for your warehouse to OLTP to reporting to the data lake”: I already mentioned the point about read replicas and about multiple instances in a server. But with Oracle, none of the parameters I want to set differently for OLTP or reporting requires another instance. They can be set at session or PDB level. As Oracle does not need the filesystem buffer cache, there’s no need to separate workloads on different servers to avoid noisy neighbours.

I hope this helps to look further at the features. There are many reasons to migrate, and the main one is the will to move from a commercial model (with license and support) to an open-source one (start with low cost, help from the community). But decisions must be made on facts, not rumours.

Franck.

This article Some myths about PostgreSQL vs. Oracle first appeared on Blog dbi services.

The Oracle ACE program ♠ what it is not ♠

June 24, 2020, 12:55 pm

By Franck Pachot

I had a few questions about the Oracle ACE program recently and thought about putting some answers here. Of course, that’s only my point of view; there’s an official web page: https://www.oracle.com/technetwork/community/oracle-ace/index.html

The program is flexible and open, with a large diversity of people, technologies, contributions, levels,… So rather than explaining what it is, which would be limiting, I’ll tell you… what it is not.

It is not a graded evaluation

You may have heard about “ACE points”. When I entered the ACE program, it had been running for a long time with a subjective evaluation of contributions to the Oracle community. Then it became more structured, with a clear list of recognized activities, an application (APEX of course) to fill in the contributions, and points to rate them. But the goal is not to get the highest score. The reason for this point system is to make sure that all contributions are accounted for when determining your level of contribution.

Typically, you enter as an ACE Associate by listing a few blog posts or presentations you did. Then you contribute more, maybe writing an article, giving more presentations, or being active on Oracle forums. You list all that and, after a while, you may reach a number of points where they will evaluate an upgrade to the ACE level. Do not see this “more contributions” as a constraint. The goal of the program is to open new doors for contributing further. Being in the ACE program will help you to be selected for conferences, to meet Product Managers from Oracle, to know more people in the user community,… And you will realize that there are many more contributions that can count. You may realize that public-facing activities are not your preference. But at the same time, you will discuss with some product managers and realize that code contributions, SRs or Enhancement Requests are also recognized contributions. Some people do not want to talk at conferences but volunteer in their User Groups or organize meetups. That counts as well, and the ideas come when meeting people (physically or virtually). You may write a few chapters for a book on a technology you like with people you appreciate. You may meet people contributing to the Oracle Dev Gym. You may also realize that you like public-facing sharing and try to produce, in addition to presentations, some videos or podcasts. All that is accounted for thanks to the points system.

Depending on your motivation and the time you have, you may go further, to the ACE Director level. Or not, because you don’t have to, but I will come back to this later. I had not been in the program for long when the “points” system was introduced, so I may be wrong in my opinion. But my feeling is that it was easier to enter the program when going physically to the main conferences and drinking a beer with influential people. Some contributions were highly visible (like speaking on mainstream technologies) and some were not. If you did not travel and did not drink beer, reaching high levels in the program was probably harder. I think that the “points” system is fairer, bringing equality and diversity, and that the additional time needed to enter the contributions is worth it.

It is not a technical certification

The ACE program is not a technical validation of your knowledge like the Oracle Education exams are. You don’t even get “points” for being an Oracle Certified Master. Of course, valuable contributions are often based on technical skills. But all conferences lack good sessions on soft skills and sessions on basics for beginners. Of course, it is cool if you go deep into the internals of an Oracle product, but you may have a lot to share even if you are a junior in this technology. Just share what you learned.

It is not a marketing campaign

You don’t need to be an expert on the product, but you cannot give a valuable contribution if you are not using and appreciating the product. This is still a tech community that has its roots in the Oracle Technology Network. You share in the spirit of this community and user groups: not marketing but tech exchanges. You are there to help the users and the product, and the communication around those. You are not there to sell any product, and you will realize the number of people contributing about free products. Oracle XE, SQL Developer, MySQL, Cloud Free Tier, Oracle Linux,… those are valuable contributions.

Of course, the ACE program has some budget that comes from marketing. And yes, you get more “points” when contributing on “cloud” products, because that’s where all the priorities are at Oracle Corp, and this includes the ACE program. But do not take it as “I must talk about the cloud”. Just take it as “cool, I got more points because I contributed on their flagship product”. If you contribute only for the points, you are doing it wrong, and you will quickly get tired of it. Just contribute on what you like, and the points will come to recognize what you did and encourage you to go further.

There is no compulsory content

I mentioned that you can contribute on any Oracle product, paid or free, and there are a lot of them. You don’t need to talk about the database. You don’t need to talk about the cloud. You don’t need to talk about expensive options. The ACE program is flexible, and this is what allows diversity. Depending on your country, your job, or simply your motivation, you may have nothing to share about some products that are common elsewhere. Some consultants have all their customers on Exadata and have a lot to share about it. Others have all their databases in Standard Edition, and their contributions are welcome as well.

Let me be clear, in case you have any doubts: I have never been asked to talk or write about anything. All are my choices. And I have never been asked to change or remove published content. My subjects also cover problems and bugs, because I consider that sharing them helps as well. Actually, I’ve deleted a tweet only two times because of feedback from others. And the feedback did not ask me to take it down, but just mentioned that one word might sound a little harsh. I checked my tweet, agreed that my wording was not ok, and preferred not to leave something that could be interpreted this way. Two times, both my choice, out of 20K tweets.

It is not a perpetual prize

The ACE levels I’ve mentioned (ACE Associate, ACE, ACE Director) are not Nobel prizes and are not granted for life. They reflect the level of current and recent contributions. If you do not contribute anymore, you will leave the program as an ACE Alumni. And that’s totally normal. The ACE program is there to recognize your contributions and help you with them. You may change your job and work on a different technology, lose your motivation, or simply not have time for this, and that’s ok. Or you may simply not want to be an ACE. Then it is simple: you don’t enter enough contributions in the “points” application, and at the next evaluation (usually in June) you become an ACE Alumni.

I have an anecdote about “don’t want to be an ACE”. The first nomination I entered was for someone who contributed in his own way (not public-facing, but organizing awesome events). And I did it without telling him. I was excited to see his surprise, and he was accepted. But he told me that he didn’t want to be an ACE Associate. Sure, I was surprised, but that’s perfectly ok. There’s no obligation about anything. Now, if I want to nominate someone, I ask first.

I am an ACE Director and I have a lot of points, but I do not write or do anything with points in mind. The “points” are just a consequence of my writing blog posts and presenting at conferences. I contributed only through those two channels in 2019-2020. The year before, I had more points, with some articles and SRs (bugs discussed with the product managers). In the coming year, I may try something else, but again not with points in mind. I also work on non-Oracle technologies, even competing ones, because that’s my job. But for sure I’ll continue to contribute a lot on Oracle products, because I like to work with them, with Oracle customers, with Oracle employees,… So there is a good chance that I’ll stay in the program, at the same level. But please understand that you don’t need to do the same. Thanks to where I live (with many meetups and conferences organized), the company I work for (where knowledge sharing is greatly encouraged), and the time I can find (kids becoming more and more autonomous), I’m able to do that. That’s what I like in this program: people do what they can and like to do, without any obligation, and when this meets some ACE levels of recognized contributions, the ACE program encourages and helps them to continue.

The post The Oracle ACE program ♠ what it is not ♠ first appeared on Blog dbi services.


Attaching your own CentOS 7 yum repository to AWS SSM

June 25, 2020, 4:02 am

≫ Next: Oracle Cloud basics for beginners

≪ Previous: The Oracle ACE program ♠ what it is not ♠

From some blog posts I’ve written in the past, you might already know that we use AWS SSM to patch and maintain all the EC2 and on-premises instances at one of our customers. The previous posts about that topic are here:

While this generally works fine and is fully automated, we recently ran into an issue that forced us to create our own CentOS 7 repositories and use them with SSM to apply the patches to the CentOS machines.

To describe the issue: we have two patch baselines per operating system. The first one, for all development and test systems, applies all patches released up to the date the patch baseline runs. The second one, for the production systems, has an approval delay of 14 days. As we run production patching 2 weeks after patching the development and test systems, this should guarantee that production gets the same patches. And exactly here is the issue: “if a Linux repository doesn’t provide release date information for packages, Systems Manager uses the build time of the package as the auto-approval delay for Amazon Linux, Amazon Linux 2, RHEL, and CentOS. If the system isn’t able to find the build time of the package, Systems Manager treats the auto-approval delay as having a value of zero.”

That basically means: as you never know when CentOS will release their patches, which are based on the Red Hat sources, you can never be sure that production gets the same patches that were applied to development and test 14 days earlier. Let’s take an example: our patching for development and test happened on the 10th of April. The kernel package v3.10.0-1127 was released for CentOS on April 27th and was therefore not applied to the development and test systems. When production patching happened two weeks later, that kernel package was available and, because its build time (which SSM uses in place of the missing release date) was already more than 14 days old, it also satisfied our 14-day auto-approval rule. So we ended up with a patch installed on production that never made it to the development and test systems. This is why we decided to go for our own repositories, so we can decide when the repositories are synced.
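To make the timing problem concrete, here is a simplified model of the auto-approval check as a small bash function. It is a sketch, not SSM code: it assumes, as the quoted documentation describes, that the build time stands in for a missing release date, and that a package is approved once that date is at least the delay (in days) before the patching date. The function name and the dates are hypothetical, and it relies on GNU `date -d`.

```shell
#!/bin/bash
# Simplified model of an auto-approval delay check (hypothetical helper, not SSM code).
# $1: package date used by the baseline (the build time when no release date exists)
# $2: date on which the patch baseline is evaluated
# $3: auto-approval delay in days
is_auto_approved() {
    local pkg_s patch_s age_days
    # GNU date: convert both dates to seconds since the epoch (UTC midnight)
    pkg_s=$(date -u -d "$1" +%s)
    patch_s=$(date -u -d "$2" +%s)
    age_days=$(( (patch_s - pkg_s) / 86400 ))
    if [ "$age_days" -ge "$3" ]; then
        echo "approved (${age_days} days old)"
    else
        echo "not approved (${age_days} days old)"
    fi
}

# Hypothetical package built on March 20 but published by CentOS only after the
# development/test run; at a production run on April 24 it is already "old enough".
is_auto_approved 2020-03-20 2020-04-24 14
```

The age is counted from the build date, not from the day the package showed up in the CentOS repositories, which is why a package that appears between the two patch runs can slip into production.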

Setting up a local yum repository is quite easy and you can find plenty of how-tos on the internet, so here is just a summary without much explanation. We deployed a new CentOS 7 EC2 instance, then installed the EPEL repository and a web server (nginx, which on CentOS 7 is provided by EPEL):

[centos@ip-10-47-99-158 ~]$ sudo yum install epel-release -y
[centos@ip-10-47-99-158 ~]$ sudo yum install nginx -y
[centos@ip-10-47-99-158 ~]$ sudo systemctl start nginx
[centos@ip-10-47-99-158 ~]$ sudo systemctl enable nginx
[centos@ip-10-47-99-158 ~]$ sudo systemctl status nginx

As yum retrieves the packages over HTTP or HTTPS, adjust the firewall rules:

[centos@ip-10-47-99-158 ~]$ sudo systemctl start firewalld
[centos@ip-10-47-99-158 ~]$ sudo systemctl enable firewalld
[centos@ip-10-47-99-158 ~]$ sudo firewall-cmd --zone=public --permanent --add-service=http
[centos@ip-10-47-99-158 ~]$ sudo firewall-cmd --zone=public --permanent --add-service=https
[centos@ip-10-47-99-158 ~]$ sudo firewall-cmd --reload

Update the complete system and install the yum utilities and the createrepo packages:

[centos@ip-10-47-99-158 ~]$ sudo yum update -y
[centos@ip-10-47-99-158 ~]$ sudo yum install createrepo  yum-utils -y

Prepare the directory structure and synchronize the repositories:

[centos@ip-10-47-99-158 ~]$ sudo mkdir -p /var/www/html/repos
[centos@ip-10-47-99-158 ~]$ sudo chmod -R 755 /var/www/html/repos
[centos@ip-10-47-99-158 ~]$ sudo reposync -g -l -d -m --repoid=base --newest-only --download-metadata --download_path=/var/www/html/repos/centos-7/7/
[centos@ip-10-47-99-158 ~]$ sudo reposync -l -d -m --repoid=extras --newest-only --download-metadata --download_path=/var/www/html/repos/centos-7/7/
[centos@ip-10-47-99-158 ~]$ sudo reposync -l -d -m --repoid=updates --newest-only --download-metadata --download_path=/var/www/html/repos/centos-7/7/
[centos@ip-10-47-99-158 ~]$ sudo reposync -l -d -m --repoid=epel --newest-only --download-metadata --download_path=/var/www/html/repos/centos-7/7/

Create the repositories from what was synced above:

[centos@ip-10-47-99-158 ~]$ sudo createrepo /var/www/html/repos/centos-7/7/base
[centos@ip-10-47-99-158 ~]$ sudo createrepo /var/www/html/repos/centos-7/7/extras
[centos@ip-10-47-99-158 ~]$ sudo createrepo /var/www/html/repos/centos-7/7/updates
[centos@ip-10-47-99-158 ~]$ sudo createrepo /var/www/html/repos/centos-7/7/epel

… and set the SELinux context:

[centos@ip-10-47-99-158 ~]$ sudo semanage fcontext -a -t httpd_sys_content_t "/var/www/html/repos(/.*)?"
[centos@ip-10-47-99-158 ~]$ sudo restorecon -Rv /var/www/html/repos
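At this point, createrepo has written the repository metadata into a `repodata/` subdirectory of each repository, with `repomd.xml` as its index. A quick sanity check along these lines can confirm that all four repositories are ready before nginx serves them. It is a sketch: the function name is hypothetical, and the root path in the usage comment is the one used above.

```shell
#!/bin/bash
# Hypothetical helper: report repositories that are missing their createrepo metadata.
# $1: root directory holding one subdirectory per repository
# remaining arguments: repository names to check
check_repodata() {
    local root="$1"; shift
    local missing=0
    for repo in "$@"; do
        if [ -f "${root}/${repo}/repodata/repomd.xml" ]; then
            echo "OK: ${repo}"
        else
            echo "MISSING: ${repo}"
            missing=1
        fi
    done
    # non-zero exit status if at least one repository lacks its metadata
    return "$missing"
}

# Usage on the repository server:
#   check_repodata /var/www/html/repos/centos-7/7 base extras updates epel
```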

Configure nginx to point to the repositories:

[centos@ip-10-47-99-158 ~]$ sudo vi /etc/nginx/conf.d/repos.conf 
## add the following section
server {
        listen   80;
        server_name  10.47.99.158;	
        root   /var/www/html/repos/;
        location / {
                index  index.php index.html index.htm;
                autoindex on;	#enable listing of directory index
        }
}

… and restart the webserver:

[centos@ip-10-47-99-158 ~]$ sudo systemctl restart nginx

You should now see the directory structure when you point your browser to the IP address of the EC2 instance.

To synchronize the repositories regularly, create a small script that does the job and schedule it with cron according to your requirements, e.g.:

#!/bin/bash
LOCAL_REPOS="base extras updates epel"
REPO_PATH=/var/www/html/repos/centos-7/7/
## a loop to update the repositories one at a time
for REPO in ${LOCAL_REPOS}; do
    if [ "$REPO" = "base" ]; then
        reposync -g -l -d -m --repoid="$REPO" --newest-only --download-metadata --download_path="$REPO_PATH"
    else
        # extras, updates and epel: sync the repository of the current iteration
        reposync -l -d -m --repoid="$REPO" --newest-only --download-metadata --download_path="$REPO_PATH"
    fi
    # regenerate the repository metadata
    createrepo "${REPO_PATH}${REPO}"
done
# the SELinux file context rule was already registered with semanage during the setup,
# so relabeling the newly downloaded files is enough here
restorecon -Rv /var/www/html/repos
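For the scheduling part, a drop-in file in `/etc/cron.d/` is one option. The sketch below assumes the script above was saved as `/usr/local/bin/sync-local-repos.sh` and made executable (`chmod +x`). That path, the log file, the helper name and the schedule (02:00 on the 1st of every month) are all hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print a /etc/cron.d line for the repository sync script.
# $1: cron schedule (5 fields), $2: path of the sync script, $3: log file
cron_entry() {
    # /etc/cron.d entries carry a user field (root) between schedule and command
    printf '%s root %s >> %s 2>&1\n' "$1" "$2" "$3"
}

# Example with hypothetical values. To install it (as root):
#   cron_entry "0 2 1 * *" /usr/local/bin/sync-local-repos.sh /var/log/sync-local-repos.log \
#       > /etc/cron.d/sync-local-repos
cron_entry "0 2 1 * *" /usr/local/bin/sync-local-repos.sh /var/log/sync-local-repos.log
```

Whatever schedule you choose, keep in mind why the local mirror exists. If a sync runs between the development/test run and the production run, packages with old build times can again reach production untested. Aligning the sync with the development and test patch window avoids that.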

Test the repository from another CentOS 7 instance:

Using username "centos".
Authenticating with public key "imported-openssh-key"
[centos@ip-10-47-98-80 ~]$ sudo bash
[root@ip-10-47-98-80 centos]$ cd /etc/yum.repos.d/
[root@ip-10-47-98-80 yum.repos.d]$ ls
CentOS-Base.repo  CentOS-CR.repo  CentOS-Debuginfo.repo  CentOS-fasttrack.repo  CentOS-Media.repo  CentOS-Sources.repo  CentOS-Vault.repo
[root@ip-10-47-98-80 yum.repos.d]$ rm -f *
[root@ip-10-47-98-80 yum.repos.d]$ ls -la
total 12
drwxr-xr-x.  2 root root    6 Jun 25 06:39 .
drwxr-xr-x. 77 root root 8192 Jun 25 06:36 ..


[root@ip-10-47-98-80 yum.repos.d]$ cat local-centos.repo
[local]
name=CentOS Base
baseurl=http://10.47.99.158/centos-7/7/base/
gpgcheck=0
enabled=1

[extras]
name=CentOS Extras
baseurl=http://10.47.99.158/centos-7/7/extras/
gpgcheck=0
enabled=1

[updates]
name=CentOS Updates
baseurl=http://10.47.99.158/centos-7/7/updates/
gpgcheck=0

[epel]
name=Extra Packages for Enterprise Linux 7
baseurl=http://10.47.99.158/centos-7/7/epel/
gpgcheck=0
[root@ip-10-47-98-80 yum.repos.d]#


[root@ip-10-47-98-80 yum.repos.d]$ yum search wget
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
======================================================================= Name Exactly Matched: wget =======================================================================
wget.x86_64 : A utility for retrieving files using the HTTP or FTP protocols

  Name and summary matches only, use "search all" for everything.
[root@ip-10-47-98-80 yum.repos.d]#

… and you’re done from a repository perspective.
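If you need to set up more than one client, typing the repository file by hand gets tedious. Here is a minimal sketch of a generator. The function name is hypothetical and the base URL is the one of the example server above. It differs slightly from the file shown: it uses the repository name as the section header (the file above uses `[local]` for base) and sets `enabled=1` explicitly on every section.

```shell
#!/bin/bash
# Hypothetical helper: print a yum .repo file pointing at the local mirror.
# $1: base URL of the mirror, remaining arguments: repository names
make_repo_file() {
    local base_url="$1"; shift
    for repo in "$@"; do
        printf '[%s]\n' "$repo"
        printf 'name=Local mirror of %s\n' "$repo"
        printf 'baseurl=%s/%s/\n' "$base_url" "$repo"
        # same setting as in the hand-written file above
        printf 'gpgcheck=0\n'
        printf 'enabled=1\n\n'
    done
}

# Usage on a client (as root):
#   make_repo_file http://10.47.99.158/centos-7/7 base extras updates epel \
#       > /etc/yum.repos.d/local-centos.repo
```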

Now it is time to tell SSM to use the local repositories with your patch baseline. If you don’t know how SSM works or how to apply patches with it, check the previous posts.

All you need to do is adjust the patch baseline to include your repositories as “Patch sources”.
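If you prefer the AWS CLI to the console, `aws ssm update-patch-baseline` accepts a `--sources` list whose entries have `Name`, `Products` and `Configuration`. These are the same fields that appear in the SSM log further down. The sketch below only builds that JSON. The helper name and the baseline ID in the usage comment are hypothetical, and the sections are named after the repository (`[base]` rather than `[local]`). Check the result against your baseline before running the command.

```shell
#!/bin/bash
# Hypothetical helper: print a JSON array of patch sources for the local repositories.
# $1: base URL of the mirror, remaining arguments: repository names
patch_sources_json() {
    local base_url="$1"; shift
    local first=1
    printf '['
    for repo in "$@"; do
        [ "$first" -eq 1 ] || printf ','
        first=0
        # Configuration holds the yum repo definition; newlines are written as JSON "\n"
        printf '{"Name":"%s","Products":["*"],"Configuration":"[%s]\\nname=%s\\nbaseurl=%s/%s/\\ngpgcheck=0\\nenabled=1"}' \
            "$repo" "$repo" "$repo" "$base_url" "$repo"
    done
    printf ']\n'
}

# Usage (baseline ID is a placeholder):
#   aws ssm update-patch-baseline --baseline-id pb-0123456789abcdef0 \
#       --sources "$(patch_sources_json http://10.47.99.158/centos-7/7 base extras updates epel)"
```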

Schedule your patching and then check the logs. You should see that SSM is now using the local repositories:

...
u'sources': [{u'configuration': u'[local]\nname=CentOS Base\nbaseurl=http://10.47.99.158/centos-7/7/base/\ngpgcheck=0\nenabled=1', u'products': [u'*'], u'name': u'base'}, {u'configuration': u'[extras]\nname=CentOS Extras\nbaseurl=http://10.47.99.158/centos-7/7/extras/\ngpgcheck=0\nenabled=1', u'products': [u'*'], u'name': u'extras'}, {u'configuration': u'[updates]\nname=CentOS Updates\nbaseurl=http://10.47.99.158/centos-7/7/updates/\ngpgcheck=0', u'products': [u'*'], u'name': u'updates'}
...
06/25/2020 10:00:28 root [INFO]: Moving file: CentOS-Base.repo
06/25/2020 10:00:28 root [INFO]: Moving file: CentOS-CR.repo
06/25/2020 10:00:28 root [INFO]: Moving file: CentOS-Debuginfo.repo
06/25/2020 10:00:28 root [INFO]: Moving file: CentOS-Media.repo
06/25/2020 10:00:28 root [INFO]: Moving file: CentOS-Sources.repo
06/25/2020 10:00:28 root [INFO]: Moving file: CentOS-Vault.repo
06/25/2020 10:00:28 root [INFO]: Moving file: CentOS-fasttrack.repo
06/25/2020 10:00:28 root [INFO]: Moving file: CentOS-x86_64-kernel.repo
06/25/2020 10:00:28 root [INFO]: Executing lambda _create_custom_repos
06/25/2020 10:00:28 root [INFO]: Creating custom repo base
06/25/2020 10:00:28 root [INFO]: Creating custom repo extras
06/25/2020 10:00:28 root [INFO]: Creating custom repo updates
06/25/2020 10:00:28 root [INFO]: Creating custom repo epel
Loaded plugins: fastestmirror

That’s it. This way, you have full control over which packages will be installed. The downside, of course, is that you need to maintain your own copy of the repositories.

The post Attaching your own CentOS 7 yum repository to AWS SSM first appeared on Blog dbi services.


Oracle Cloud basics for beginners

June 25, 2020, 7:26 am

≫ Next: Postgres Vision 2020 is still live

≪ Previous: Attaching your own CentOS 7 yum repository to AWS SSM

Introduction

Cloud, Cloud, Cloud. Everyone is talking about the Cloud, but a lot of people are still in the fog when it comes to Cloud technologies. Let’s talk about the basic features of the Oracle Cloud, called OCI for Oracle Cloud Infrastructure.

What is OCI, really?

OCI is physically a lot of servers in datacenters all around the world. These servers are not very different from the servers you probably have in your own datacenter. Some of these servers are already used by customers, and others are immediately available to existing or new customers. Most customers will not use complete servers but parts of them, thanks to the virtualization layer on OCI. A physical server can hold several virtual servers, quite a lot actually. Oracle says that there is no overprovisioning on OCI: if you create your own server with 2 CPUs and 64GB of RAM, you can be confident that these resources are available to you on the physical server, even if you don’t plan to use them at full throttle. If you need a complete physical server for yourself, that’s also possible, and it’s as easy to provision as a virtual machine.

What do I need to create a server in OCI?

OCI is available through a website, the OCI console, but you’ll have to buy Cloud credits before you can create resources in this Cloud.

Two other options are available:
– ask your Oracle software provider for free trial Cloud credits for testing OCI
– create a free account and use only always-free resources (quite limiting)

When you’re connected to your brand new OCI account in the console, just create a compute instance. A compute instance is a general-purpose server. Several options are available at server creation, like the number of CPUs, the amount of RAM, the size of the boot disk, and the OS that will come pre-installed. Provisioning a simple Linux server takes about 2 minutes. Deleting a server is also a matter of minutes.

Can I go straight to server creation?

Not really. You cannot simply create a server, because you’ll need to put it in a compartment, a kind of virtual container for your servers. So the first step is to create a compartment. Compartments are isolated from each other: access to the resources they contain is granted per compartment through policies.

Then, you’ll need a private network (called Virtual Cloud Network or VCN) in which to put your server. This private network should be created with care, because it must not overlap your on-premises network, especially if you plan to connect them (and you surely will). Along with the network, other basic network components also need to be configured.
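Checking for an overlap is simple arithmetic: two IPv4 blocks overlap when each one starts at or before the last address of the other. Below is a bash sketch of that test, handling IPv4 CIDR notation only. The function names and the sample ranges are hypothetical.

```shell
#!/bin/bash
# Convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Print "overlap" or "no overlap" for two CIDR blocks such as 10.0.0.0/16.
cidr_overlap() {
    local ip1=${1%/*} len1=${1#*/} ip2=${2%/*} len2=${2#*/}
    local size1 size2 start1 start2 end1 end2
    size1=$(( 1 << (32 - len1) ))
    size2=$(( 1 << (32 - len2) ))
    # Align each start address to its network boundary, then compute the last address.
    start1=$(( $(ip_to_int "$ip1") / size1 * size1 ))
    start2=$(( $(ip_to_int "$ip2") / size2 * size2 ))
    end1=$(( start1 + size1 - 1 ))
    end2=$(( start2 + size2 - 1 ))
    if [ "$start1" -le "$end2" ] && [ "$start2" -le "$end1" ]; then
        echo "overlap"
    else
        echo "no overlap"
    fi
}

# Example: a planned VCN range against a hypothetical on-premises range
cidr_overlap 10.0.0.0/16 10.0.0.0/8
```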

What are the basic network resources to configure?

First of all, all these resources are virtual resources in OCI. When configuring your network, you’ll also need at least one subnet in your VCN, a firewall (called security list), a router (route table) and a gateway to connect this server (a NAT gateway for outbound-only internet connections, or an internet gateway for both inbound and outbound connections).

Your OCI network will be linked to your on-premises network through an IPSec VPN or FastConnect, the latter being a dedicated connection to your existing infrastructure that does not go through the internet.

So before creating your first server, you’ll need to define and configure all these network settings properly.

How to connect to this server?

If you don’t want to configure a VPN or a FastConnect link for now, you can associate your compute instance with an internet gateway to make it reachable from everywhere. Security is achieved with SSH keys: you provide your public key(s) for this server in the OCI console, and only someone holding a matching private key will be able to connect to it. Later, a VPN or FastConnect configuration will let you reach all your OCI servers as if they were on your network.

What are the other services available?

If you’re thinking about OCI, it’s probably because you do not only need servers: you need Oracle databases. Actually, you don’t have to provision compute instances and install databases on them. You can directly provision databases, in various versions, Standard or Enterprise Edition, with your own license or without any license (the license fee will then be billed like any other OCI resource, on a monthly basis). An underlying server will of course be provisioned, but you don’t have to create it as a separate task. If you need to connect to this server later, you can, just as with a normal compute instance.

A key feature of OCI is what they call autonomous database: a self-managed database that gives you access neither to the server nor to the SYSDBA role on the database. You control this kind of DB through a dedicated interface (for loading data, for example) and let Oracle automatically manage the classic DBA tasks, even the advanced ones. Autonomous database comes in two flavours, OLTP or Data Warehouse, and the embedded autonomous engine behaves differently in each.

Database services also come with automatic backup you can simply configure when creating the database (or after). Just define what kind of backup you need (mainly choose from various retentions and frequencies) and RMAN will automatically take care of your backups. Restore can be done directly through the OCI console.

Other services are also available, like load balancers or MySQL databases. Some services are free, some come at a cost.

How about the storage?

Multiple storage options are available for your servers depending on your needs:
– block storage: similar to LUNs on a SAN. Choose the size when creating the block volume and attach it to your server for dedicated use
– file storage: similar to NFS. A shared storage for multiple servers
– object storage: useful to make some files available wherever you need them, just by sharing a link

Storage on OCI relies only on SSDs, so expect high I/O performance.

How much does it cost?

That’s the most difficult question, because you’ll have to define your needs, build your infrastructure on paper, then compute the cost with a cost calculator provided by Oracle. There are two billing options available at the moment: prepaid, with Universal Cloud Credits, or pay-as-you-go, based on service utilization. The costs may vary depending on multiple parameters. The base budget for an OCI infrastructure starts at $1000 a month. Don’t expect an OCI infrastructure to be much less expensive than on-premises servers: it’s mainly interesting because you don’t have to bother with budgeting, buying, deploying and managing servers on your own. And think about how quickly you can deploy a new environment, or destroy an old one. It’s another way of spending your IT budget.

The cost calculator is here.

Conclusion

OCI is a mature Cloud, ready for production, with multiple services available and constantly evolving. Test it to discover how powerful it is, and make sure you understand all the benefits you can get compared to on-premises solutions.

The post Oracle Cloud basics for beginners first appeared on Blog dbi services.


Postgres Vision 2020 is still live

June 26, 2020, 2:07 am

≫ Next: Oracle GoldenGate 19c: Cannot register Integrated EXTRACT due to ORA-44004

≪ Previous: Oracle Cloud basics for beginners

This year, the Postgres Vision 2020 conference was a free virtual event that took place on June 23-24, 2020, and its theme was “Postgres is for builders”.

Experts, IT professionals and Postgres community leaders from all around the world talked about the future of Postgres, real-world stories, Postgres in the cloud and containerized Postgres, and this year there were also sessions focusing on how developers are building scalable and secure applications.

Introduction

At the following URL, EDB CEO Ed Boyajian has posted a short video introducing Postgres Vision 2020

Now, once logged in, let’s enter the virtual Expo Hall.

To access sessions you have to enter the theater where all sessions are displayed with their abstract and speaker(s).

My first session was “Power to Postgres”, and it started on Tuesday, 23/06/2020 at 9:15am (local time).
I was immediately blown away by the huge number of people connected, from all over the world.
During this 45-minute talk, EDB CEO Ed Boyajian shared his experience and vision for the Postgres movement, along with stories of how organizations are leveraging Postgres to accelerate innovation.

His last words in this monologue were rather reassuring regarding the future of Postgres.
“This is the revolution and Postgres will win it, power to Postgres”

Below are only some of the “live” sessions I attended. I won’t detail them since, once registered, you can watch them again on demand (click on the registration button).

– “A Look at the Elephants Trunk – PostgreSQL 13” – Speaker : Magnus Hagander
– “What’s New in 3.0 and Coming in PostGIS 3.1?” – Speaker: Regina Obe
– “EDB Postgres Advanced Server – What’s New?” – Speaker: Rushabh Lathia
– “Table Partitioning in Postgres: How Far We’ve Come” – Speaker: Amit Langote

Conclusion

In these difficult times, especially because of Covid-19, this event was a great success thanks to its organization and the quality of the speakers and their presentations. I think that in the future, this type of virtual event will become more widespread and allow a greater number of participants.
My only regret is that it does not allow you to meet the participants and speakers between sessions and discuss with them.

The post Postgres Vision 2020 is still live first appeared on Blog dbi services.
