
Make Oracle database simple again!


Introduction

Oracle database is a great piece of software. Yes, it’s quite expensive, but it’s still the reference, and most companies can find a configuration that fits their needs and their budget. Another complaint about Oracle is its complexity: nothing is really simple, and you’ll need skillful DBAs to deploy, manage, upgrade and troubleshoot your databases. But complexity is sometimes caused by wrong decisions made without the necessary knowledge, mainly because some choices add significant complexity compared to others. Let’s have a look at how to make Oracle database simple again.

The goal

Why do things need to be simple?

Obviously, simplification brings:

  • easier troubleshooting
  • a setup that is more easily understood by others
  • the possibility to reinstall from scratch in case of big trouble
  • fewer bugs related to mixing multiple components
  • less work, because you probably have enough on your plate with migrations, patching, performance, …
  • more reliability, because fewer components means fewer problems

On the hardware side

Rules for simplifying on the hardware side are:

  • Choose the same hardware for all your environments (DEV/TEST/QUAL/PROD/…): same server models, same CPU family, same revision. Make only slight variations in the memory amount, the number of disks and the processor core configuration if needed. Order all the servers at the same time. If a problem is related to hardware, you will be able to test the fix on a less important environment before going to production
  • Don’t use a SAN: a SAN is very nice, but it is not the performance guarantee you expect. Adopt local SSDs: NVMe SSDs have amazing speed and are a true game changer in today’s database performance. Getting rid of the SAN also means getting rid of multipathing, resource sharing, complex troubleshooting, external dependencies and so on
  • Provision very large volumes for data: dealing with space pressure is not the most interesting part of your job, and it’s time consuming. You need 4TB of disks? Order 12TB and you’ll be ready for every situation, even those not planned. For sure it’s more expensive, but adding disks afterwards is not always that simple. It reminds me of a customer case where trying to add a single disk led to a nightmare (production down for several days)
  • Consider ODA appliances (Small or Medium): even if it doesn’t simplify everything, at least the hardware is all you need and is dedicated to Oracle database software
  • Think consolidation: Oracle database has strong database isolation, which makes consolidation easy. Consolidating to limit the number of servers you need to manage also simplifies your environment
  • Avoid virtualization: without even talking about the license, virtualization is for sure an additional layer of complexity

On the system side

Some rules are also good to know regarding the system:

  • Go for Red Hat or Oracle Linux: mainly because it’s the most common OS for Oracle databases. Releases and patches are always available for Linux first. UNIX and Windows have been declining in popularity for Oracle databases over the past 10 years
  • Same OS: please keep your operating systems strictly identical from development to production. If you decide to upgrade the OS, do it first on TEST/DEV and finish with the production servers without waiting months. And never update straight from the internet, as packages can be different each time you update a system
  • Limit the number of filesystems for your databases: 1 big oradata and 1 big FRA are enough on SSD. You don’t need to slice everything as we did before, and slicing always wastes space (a layout sketch follows this list)
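
As an illustration of that last point, a minimal layout with just two large ext4 filesystems on local NVMe disks could be prepared like this (a sketch only; device names and mount points are assumptions):

# two big ext4 filesystems on local NVMe devices (names are assumptions): one for data, one for the FRA;
sudo mkfs.ext4 /dev/nvme0n1
sudo mkfs.ext4 /dev/nvme1n1
sudo mkdir -p /u01/oradata /u01/fra
echo "/dev/nvme0n1 /u01/oradata ext4 defaults,noatime 0 2" | sudo tee -a /etc/fstab
echo "/dev/nvme1n1 /u01/fra     ext4 defaults,noatime 0 2" | sudo tee -a /etc/fstab
sudo mount -a
sudo chown -R oracle:oinstall /u01/oradata /u01/fra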

On the software side

A lot of things should be done, or not done regarding software:

  • Install the same Oracle version (release + patch) and use the same directory tree everywhere. Use OFA (/u01/…) or not, but be consistent (a quick cross-server check is sketched after this list)
  • Limit the Oracle versions in use: inform your software vendors that your platform cannot support versions that are too old, especially non-terminal releases like 11.2.0.3. 19c, 12.1.0.2 and possibly 11.2.0.4 are the only recommended versions to deploy
  • Don’t use ASM: ext4 is fine, and SSDs now bring you maximum performance even on a classic filesystem. ASM will always be linked to Grid Infrastructure, creating dependencies between the DB homes and the Grid stack and making patching much more complex
  • Don’t use RAC: because most of your applications cannot correctly manage high availability. RAC is much more complex than single-instance databases. Not choosing RAC means getting rid of the interconnect, the cluster repository, Cache Fusion for SGA sharing, SAN or NAS technologies, split brains, SCAN listeners and complex troubleshooting. Replacing RAC with Data Guard or Dbvisit Standby is the new way of providing a form of high availability without high complexity
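
A quick way to verify that every server really runs the same release and patch level is to compare the OPatch inventories; a minimal sketch (hostnames are placeholders, and ORACLE_HOME is assumed to be set in the remote shells):

# compare the installed patches across all environments (hostnames are placeholders);
for host in srv-dev01 srv-test01 srv-prod01; do
  echo "===== $host ====="
  ssh oracle@"$host" '$ORACLE_HOME/OPatch/opatch lspatches'
done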

Simplify backup configuration

For sure you will use RMAN, but how to simplify backups with RMAN?

  • Use the same backup script for all the databases (a sketch of such a script follows this list)
  • Use the same backup frequency for each database, because when you need to restore, you’d better have a fresh backup
  • Only the retention should differ from one database to another
  • Back up to disk (the most convenient being a big NFS share) and without any specific library (back up your /backup filesystem later with your enterprise backup tool if needed)
  • Provision a filesystem large enough that you never need to delete backups that are still within the retention period
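
As an illustration, such a generic script could be as small as this sketch (the /backup destination and the default retention are assumptions, and the environment for the given SID is supposed to be set beforehand, e.g. with oraenv):

#!/bin/bash
# backup_db.sh -- the same script for every database: backup_db.sh <ORACLE_SID> [retention_days]
export ORACLE_SID=$1
RETENTION=${2:-7}    # only the retention differs between databases

rman target / <<EOF
configure retention policy to recovery window of ${RETENTION} days;
configure controlfile autobackup on;
configure channel device type disk format '/backup/${ORACLE_SID}/%U';
backup database plus archivelog;
delete noprompt obsolete;
EOF

Calling it is then identical everywhere, e.g. ./backup_db.sh PROD1 14, and so is the restore procedure.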

Using the same backup strategy means being able to use the same restore procedure on all databases, because what you always need is a quick restore of a broken database.

Always back up the controlfile and spfile on the standby databases: the resulting backupset has a very small footprint and makes it easier to restore the standby using database backupsets from the primary, without the need for a duplication.
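
On the standby itself, that small backup could look like this (again just a sketch; the destination path is an assumption):

rman target / <<EOF
backup current controlfile format '/backup/${ORACLE_SID}/stby_ctl_%U';
backup spfile format '/backup/${ORACLE_SID}/stby_spf_%U';
EOF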

Consider an RMAN catalog only if you have enough databases to manage.

Simplify database management and configuration

  • Create scripts for database configuration and tablespace creation (for example configure_SID.sql and tablespaces_SID.sql) to be able to recreate the same database elsewhere
  • Don’t create separate grid and oracle users if you plan to use Grid Infrastructure/ASM: as a DBA you probably manage both ASM and the database instances. Instead of losing time switching between these 2 users, configure a single oracle user for both
  • Never use graphical tools to create a database, deploy an appliance or configure something: screenshots are far less convenient than pure text commands, which are easily repeatable and scriptable
  • Use OMF: configure only db_create_file_dest and db_recovery_file_dest and Oracle will multiplex the controlfile and the redo logs in these areas. OMF also names the datafiles for you: there is no need for manual naming, and who really cares about the name of the datafiles? (see the sketch after this list)
  • Don’t use multitenant: multitenant is fine, but we have been living with non-CDB databases for years and it works like a charm. You can still use the non-CDB architecture in 19c, so multitenant is not mandatory even on this latest version. A later migration from non-CDB to pluggable databases is quite easy, so you can still adopt multitenant later
  • Keep your spfile clean: don’t specify unused parameters or parameters that are already set to the same value by default. Remove such parameters from the spfile using ALTER SYSTEM RESET parameter SCOPE=spfile SID='*';
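
For instance, the OMF setup and the spfile clean-up could look like this sketch (the destinations and the reset parameter are only examples):

sqlplus -s / as sysdba <<EOF
-- OMF: Oracle creates, names and multiplexes the files in these two destinations;
alter system set db_create_file_dest='/u01/oradata' scope=both sid='*';
alter system set db_recovery_file_dest_size=2000G scope=both sid='*';
alter system set db_recovery_file_dest='/u01/fra' scope=both sid='*';
-- remove a parameter that only repeats its default value (the parameter name is just an example);
alter system reset optimizer_index_cost_adj scope=spfile sid='*';
EOF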

Simplify patching

Patching can also be simplified:

  • Patch once a year: you need to patch, but you don’t need to spend all your time applying each PSU every 3 months (an example follows this list)
  • Start with the test/dev databases and take the time to test from the application side
  • Don’t wait too long to patch the other environments: production should be patched a few weeks after the first patched environment
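
For a single-instance database home, the yearly patching itself essentially boils down to this sketch (the patch id and staging directory are placeholders):

# apply a Release Update to the database home (patch id and path are placeholders);
cd /u01/stage/12345678
# stop the databases and the listener running from this home first, then:
$ORACLE_HOME/OPatch/opatch apply
# restart the databases, then load the SQL part of the patch:
$ORACLE_HOME/OPatch/datapatch -verbose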

Simplify Oracle*Net configuration

Simplifying also concerns Oracle*Net configuration:

  • Avoid configuring multiple listeners on a system: one is enough for all your databases
  • Put your Oracle*Net configuration files in /etc, because you don’t want multiple files in multiple homes (see the TNS_ADMIN sketch below)
  • Keep your Oracle*Net configuration files clean and organized for better readability
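
A common way to achieve this is the TNS_ADMIN environment variable; a minimal sketch (assuming a single oracle owner using bash):

# move the Oracle*Net files to /etc and point every tool to them via TNS_ADMIN;
mv $ORACLE_HOME/network/admin/{tnsnames.ora,sqlnet.ora,listener.ora} /etc/ 2>/dev/null
echo 'export TNS_ADMIN=/etc' >> ~oracle/.bash_profile
# the listener must be restarted in an environment where TNS_ADMIN is set so it picks up /etc/listener.ora;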

Make your database movable

Last but not least, one of the biggest mistakes is to create a strong dependency between a database and a system. How do you make your database easily movable? By configuring a standby database and using Data Guard or Dbvisit Standby. Moving your database to another server is then done within a few minutes with a single SWITCHOVER operation.
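
With a Data Guard broker configuration in place, that operation is a one-liner; a sketch (database names and credentials are placeholders):

# move the primary role to the other server/site (names and credentials are placeholders);
dgmgrl sys/mysecret@DMTEST_SITE1 "switchover to 'DMTEST_SITE2'"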

Using standby databases makes your life easier for all of these purposes:

  • you need to move your server to another datacenter
  • a power outage happened on one site
  • you need to update hardware or software
  • you suspect a hardware problem impacting the database

Don’t create standbys only for production databases: even development databases are a kind of production for developers. If a database cannot be down for 1 day, it needs a standby database.

The finest configuration is not dedicating one server to the primaries and another to the standbys, but dispatching the primaries between 2 identical servers on 2 sites, each database having a preferred server for its primary and its standby on the opposite server.

Conclusion

It’s so easy to increase complexity without any good reason. Simplifying is the power of saying NO: no to interesting features and configurations that are not absolutely necessary. All you need from your databases is reliability, safety, availability and performance. Simplicity helps you get there.



Adding PostgreSQL extensions without being super user?


Usually, when you need to install a PostgreSQL extension you do it as superuser (or at least that is how I do it). The downside is, of course, that a superuser must be available whenever a new extension is required, or that all the extensions are installed by default (e.g. in template1). Recently the question popped up internally whether you can install extensions without being a superuser. The answer is: yes and no. It depends on the objects that get created by the extension and it depends on the permissions of the user that wants to install it. As always, let’s do some demos, as this is the easiest way to understand how things work.

Currently there are no extensions installed in my database except for the standard PL/pgSQL extension:

postgres=# \dx
                 List of installed extensions
  Name   | Version |   Schema   |         Description          
---------+---------+------------+------------------------------
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
(1 row)

For the purpose of this post let’s create a new user without any special permissions other than login:

postgres=# create user a with login password 'a';
CREATE ROLE
postgres=# \du a
           List of roles
 Role name | Attributes | Member of 
-----------+------------+-----------
 a         |            | {}

This user, even though not granted anything, by default has permission to create objects in the public schema (you should definitely avoid that, check here):

postgres=# \c postgres a
You are now connected to database "postgres" as user "a".
postgres=> select current_database();
 current_database 
------------------
 postgres
(1 row)
postgres=> create table tab1 ( a int );
CREATE TABLE
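
As a side note, revoking that implicit privilege on the public schema is a single statement (a sketch, to be run by a superuser):

# remove the default CREATE privilege that every role has on the public schema;
psql -U postgres -d postgres -c "revoke create on schema public from public;"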

What this user is not able to do is to use “create extension” to install a new extension into the database:

postgres=> create extension lo;
ERROR:  permission denied to create extension "lo"
HINT:  Must be superuser to create this extension.
postgres=> 

Why is that? If we take a look at an extension’s SQL file (here ltree), the first statement is this:

CREATE FUNCTION ltree_in(cstring)
RETURNS ltree
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;

“CREATE FUNCTION” does work as we are able to create objects in the public schema. The issue is this:

postgres=> CREATE FUNCTION ltree_in(cstring)
postgres-> RETURNS ltree
postgres-> AS 'MODULE_PATHNAME'
postgres-> LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;
ERROR:  permission denied for language c
postgres=> 

We do not have access to the language. Let’s try to grant the required privilege for using the language:

postgres=# grant USAGE on LANGUAGE c to a;
ERROR:  language "c" is not trusted
DETAIL:  GRANT and REVOKE are not allowed on untrusted languages, because only superusers can use untrusted languages.

That does not work either, as C is untrusted. The same is true for “language internal”, as in intagg--1.1.sql:

postgres=> CREATE FUNCTION int_agg_state (internal, int4)
postgres-> RETURNS internal
postgres-> AS 'array_agg_transfn'
postgres-> PARALLEL SAFE
postgres-> LANGUAGE INTERNAL;
ERROR:  permission denied for language internal

As all the extensions in the standard PostgreSQL distribution reference either “c” or “internal” somehow, we do not have any chance to get an extension installed as user “a”. Let’s do another test and create a new database with user “a” as its owner:

postgres=> \c postgres postgres
You are now connected to database "postgres" as user "postgres".
postgres=# create database dba with owner a;
CREATE DATABASE
postgres=# \c dba a
You are now connected to database "dba" as user "a".
dba=> 

Can we install extensions now?

dba=> create extension lo;
ERROR:  permission denied to create extension "lo"
HINT:  Must be superuser to create this extension.
dba=> 

Again an error message and it still does not work. We can, however, install PL/Perl, as this is a trusted language:

dba=> create extension plperl;
CREATE EXTENSION
dba=> 

Actually, PL/Perl is the only extension that can be installed in this scenario, which can be confirmed by a simple PL/pgSQL code block:

dba=> drop extension plperl;
DROP EXTENSION
dba=> do $$
dba$> declare
dba$>   ext record;
dba$>   exception_text text;
dba$> begin
dba$>   for ext in
dba$>       select name
dba$>         from pg_available_extensions
dba$>        order by name
dba$>   loop
dba$>     begin
dba$>        execute 'create extension '||ext.name;
dba$>        raise notice 'SUCCESS for %', ext.name;
dba$>     exception
dba$>        when others
dba$>          then get stacked diagnostics exception_text = MESSAGE_TEXT;
dba$>          raise notice '% failed with %', ext.name, exception_text;
dba$>     end;
dba$>   end loop;
dba$> end $$;
NOTICE:  adminpack failed with permission denied to create extension "adminpack"
NOTICE:  amcheck failed with permission denied to create extension "amcheck"
NOTICE:  autoinc failed with permission denied to create extension "autoinc"
NOTICE:  bloom failed with permission denied to create extension "bloom"
NOTICE:  btree_gin failed with permission denied to create extension "btree_gin"
NOTICE:  btree_gist failed with permission denied to create extension "btree_gist"
NOTICE:  citext failed with permission denied to create extension "citext"
NOTICE:  cube failed with permission denied to create extension "cube"
NOTICE:  dblink failed with permission denied to create extension "dblink"
NOTICE:  dict_int failed with permission denied to create extension "dict_int"
NOTICE:  dict_xsyn failed with permission denied to create extension "dict_xsyn"
NOTICE:  earthdistance failed with required extension "cube" is not installed
NOTICE:  file_fdw failed with permission denied to create extension "file_fdw"
NOTICE:  fuzzystrmatch failed with permission denied to create extension "fuzzystrmatch"
NOTICE:  hstore failed with permission denied to create extension "hstore"
NOTICE:  hstore_plperl failed with required extension "hstore" is not installed
NOTICE:  hstore_plperlu failed with required extension "hstore" is not installed
NOTICE:  hstore_plpython2u failed with required extension "hstore" is not installed
NOTICE:  hstore_plpython3u failed with required extension "hstore" is not installed
NOTICE:  hstore_plpythonu failed with required extension "hstore" is not installed
NOTICE:  insert_username failed with permission denied to create extension "insert_username"
NOTICE:  intagg failed with permission denied to create extension "intagg"
NOTICE:  intarray failed with permission denied to create extension "intarray"
NOTICE:  isn failed with permission denied to create extension "isn"
NOTICE:  jsonb_plperl failed with required extension "plperl" is not installed
NOTICE:  jsonb_plperlu failed with required extension "plperlu" is not installed
NOTICE:  jsonb_plpython2u failed with required extension "plpython2u" is not installed
NOTICE:  jsonb_plpython3u failed with required extension "plpython3u" is not installed
NOTICE:  jsonb_plpythonu failed with required extension "plpythonu" is not installed
NOTICE:  lo failed with permission denied to create extension "lo"
NOTICE:  ltree failed with permission denied to create extension "ltree"
NOTICE:  ltree_plpython2u failed with required extension "ltree" is not installed
NOTICE:  ltree_plpython3u failed with required extension "ltree" is not installed
NOTICE:  ltree_plpythonu failed with required extension "ltree" is not installed
NOTICE:  moddatetime failed with permission denied to create extension "moddatetime"
NOTICE:  pageinspect failed with permission denied to create extension "pageinspect"
NOTICE:  pg_buffercache failed with permission denied to create extension "pg_buffercache"
NOTICE:  pg_freespacemap failed with permission denied to create extension "pg_freespacemap"
NOTICE:  pg_prewarm failed with permission denied to create extension "pg_prewarm"
NOTICE:  pg_stat_statements failed with permission denied to create extension "pg_stat_statements"
NOTICE:  pg_trgm failed with permission denied to create extension "pg_trgm"
NOTICE:  pg_visibility failed with permission denied to create extension "pg_visibility"
NOTICE:  pgcrypto failed with permission denied to create extension "pgcrypto"
NOTICE:  pgrowlocks failed with permission denied to create extension "pgrowlocks"
NOTICE:  pgstattuple failed with permission denied to create extension "pgstattuple"
NOTICE:  SUCCESS for plperl
NOTICE:  plperlu failed with permission denied to create extension "plperlu"
NOTICE:  plpgsql failed with extension "plpgsql" already exists
NOTICE:  plpython3u failed with permission denied to create extension "plpython3u"
NOTICE:  postgres_fdw failed with permission denied to create extension "postgres_fdw"
NOTICE:  refint failed with permission denied to create extension "refint"
NOTICE:  seg failed with permission denied to create extension "seg"
NOTICE:  sslinfo failed with permission denied to create extension "sslinfo"
NOTICE:  tablefunc failed with permission denied to create extension "tablefunc"
NOTICE:  tcn failed with permission denied to create extension "tcn"
NOTICE:  tsm_system_rows failed with permission denied to create extension "tsm_system_rows"
NOTICE:  tsm_system_time failed with permission denied to create extension "tsm_system_time"
NOTICE:  unaccent failed with permission denied to create extension "unaccent"
NOTICE:  xml2 failed with permission denied to create extension "xml2"
DO

If you want to install an extension that only creates objects you are allowed to create anyway, it would succeed. The best way to handle extensions is probably to install all the required ones by default and then grant permissions to the users who need them.
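
A minimal sketch of that approach, where a superuser installs the extensions once in template1 so that every database created afterwards inherits them (the extension names are just examples):

# as a superuser: extensions created in template1 are copied into every new database;
psql -U postgres -d template1 -c "create extension if not exists hstore;"
psql -U postgres -d template1 -c "create extension if not exists pg_trgm;"
# any database created from now on already contains them;
createdb -U postgres appdb
psql -U postgres -d appdb -c "\dx"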


PostgreSQL Meetup at CERN, Geneva, January 17, 2020


Registration

Last Friday (17/01/20) I had the opportunity to go to CERN in Geneva to attend a PostgreSQL meetup.
I’m really happy to write a blog post about it, for a lot of reasons. First of all, it was about PostgreSQL (essential these days), then because of the contributors, such as Oleg Bartunov, and finally because of the organizers: Franck Pachot (old & soon new colleague at dbi services) & Laetitia Avrot, whom I met at the SOUG Day in Lausanne.


After the registration process, where a lot of people were waiting to get their badge, I followed the instructions given to reach Building 503. I rapidly got lost in this labyrinth and had to ask several times to find the conference room.

Unfortunately I was late and missed the “Welcome & opening speech” at 2 pm. Worse, I also missed the first minutes of Gülçin Yıldırım’s session.
I discreetly took a seat close to the door in the conference room. I was immediately impressed, as more than 100 people were already there.
There was clearly a huge interest in this event.

So the first talk was about the evolution of Fault Tolerance in PostgreSQL.
Gülçin mainly focused on the evolution of fault tolerance capabilities in PostgreSQL throughout its versions.
Robustness, replication methods, failover & switchover were the main topics.

The second talk was from Pavlo Golub, a PostgreSQL expert and developer at Cybertec (a software & database partner in Austria).
After some words about the existing scheduling tools in the Postgres community, he introduced a new Open Source tool from Cybertec, called pg_timetable.
He went very deep into the details & discussed some advanced topics like transaction support and cross-platform tasks.
At the end we had a successful demo.

Break

After this session, we had a break, as it was 4 pm. A great king cake party (galette des rois), accompanied by excellent ciders, was offered.
We could even choose between the one from Normandy and the one from Brittany.
I tasted both and appreciated them.
By the way, I found the bean twice. I have to admit that this was a really good idea, as the atmosphere was relaxed and friendly.

At around 4:45 pm, Anastasia Lubennikova started her session, a comparison between various backup/recovery solutions for PostgreSQL.
The goal was to help us choose the most appropriate tool. She covered several advanced topics in that area, such as incremental backups, archive management, backup validation and retention policies.

The next session, from Vik Fearing, was a short one: Advent of Code Using PostgreSQL.
During his talk, Vik showed some techniques to solve general-purpose programming challenges using just a single query with lateral joins.

Romuald Thion, a PostgreSQL teacher at the University of Lyon (France), followed: his talk was about some lessons learned since he progressively introduced PostgreSQL in his courses in 2018.
Since then, 400 students enrolled from the second to the fifth year have used PostgreSQL in database, web, security or system administration courses.

Finally, as an apotheosis for the last session, Oleg Bartunov, a PostgreSQL major contributor & developer, came on stage.
“All You Need Is Postgres” was the title of his session. This could soon become a hymn.
Oleg talked a lot about his long Postgres professional experience since Postgres95, about all the projects he participated in, such as GiST, GIN and SP-GiST, and also full text search and the NoSQL features (hstore and jsonb). He also talked about astronomy and supernovae.
This was really fascinating and we could have listened to him for hours.

Conclusion

A drinks & food party concluded this meetup. It was a nice opportunity to meet all the speakers and to discuss with them, as well as with some customers or old colleagues (Pierre Boizot).
Learn and share: this is part of the dbi services spirit and matches our values!
One last thing: I would like to address special thanks to Franck Pachot & Laetitia Avrot for the good organization and above all the cool atmosphere during this meetup.
Hoping for the next one…


SQL Server Tips: Path of the default trace file is null


In addition to my previous blog post on this subject, “SQL Server Tips: Default trace enabled but no file is active…”, here is a new case where the default path of the trace file was empty.

The first step was to verify that the default trace is enabled with the command:

SELECT * FROM sys.configurations WHERE name='default trace enabled'

It is enabled, so I check the currently running trace with the view sys.traces:

SELECT * FROM sys.traces


As you can see, this time I have a trace, but with a NULL path for the trace file…

To correct this issue, the only way is to stop and reactivate the trace in the configuration:

EXEC sp_configure 'show advanced options',1;
GO
RECONFIGURE WITH OVERRIDE;
GO
EXEC sp_configure 'default trace enabled',0;
GO
RECONFIGURE WITH OVERRIDE;
GO
EXEC sp_configure 'default trace enabled',1;
GO
RECONFIGURE WITH OVERRIDE;
GO
EXEC sp_configure 'show advanced options',0;
GO
RECONFIGURE WITH OVERRIDE;
GO

Et voilà, I have a trace file now…
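
To double-check, the path of the default trace can also be queried directly; a small sketch with sqlcmd (the server/instance name is a placeholder):

sqlcmd -S "MYSERVER\MYINSTANCE" -E -Q "SELECT id, is_default, path FROM sys.traces WHERE is_default = 1;"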


Tracking Logs Inside a Documentum Container (part I)


Containers running under docker can have their stdout observed from the outside through the “docker logs” command; here is an excerpt of its usage:

docker logs --help
Usage:	docker logs [OPTIONS] CONTAINER

Fetch the logs of a container

Options:
      --details        Show extra details provided to logs
  -f, --follow         Follow log output
      --since string   Show logs since timestamp (e.g. 2013-01-02T13:23:37) or relative (e.g. 42m
                       for 42 minutes)
      --tail string    Number of lines to show from the end of the logs (default "all")
  -t, --timestamps     Show timestamps
      --until string   Show logs before a timestamp (e.g. 2013-01-02T13:23:37) or relative (e.g.
                       42m for 42 minutes)

Example of output:

docker logs --follow --timestamps container05bis
...
2019-07-10T03:50:38.624862914Z ==> /app/dctm/dba/log/docbroker.container05bis.1489.log <==
2019-07-10T03:50:38.624888183Z OpenText Documentum Connection Broker (version 16.4.0000.0248 Linux64)
2019-07-10T03:50:38.624893936Z Copyright (c) 2018. OpenText Corporation
2019-07-10T03:50:38.624898034Z 2019-07-10T05:50:38.519721 [DM_DOCBROKER_I_START]info: "Docbroker has started. Process id: 35"
2019-07-10T03:50:38.624902047Z 2019-07-10T05:50:38.521502 [DM_DOCBROKER_I_REGISTERED_PORT]info: "The Docbroker registered using port (1489)."
2019-07-10T03:50:38.624906087Z 2019-07-10T05:50:38.521544 [DM_DOCBROKER_I_LISTENING]info: "The Docbroker is listening on network address: (INET_ADDR: family: 2, port: 1489, host: container05bis (192.168.33.7, 0721a8c0))"
2019-07-10T03:50:38.624911984Z
2019-07-10T03:50:38.624915369Z ==> /app/dctm/dba/log/dmtest05bis.log <==
2019-07-10T03:50:38.625040316Z
2019-07-10T03:50:38.625050474Z ==> /app/dctm/dba/log/dmtest05bis/agentexec/agentexec.log <==
2019-07-10T03:50:38.625055299Z Wed Jul 10 03:33:48 2019 [INFORMATION] [LAUNCHER 4251] Detected during program initialization: Version: 16.4.0000.0248 Linux64
2019-07-10T03:50:38.625059065Z Wed Jul 10 03:34:18 2019 [INFORMATION] [LAUNCHER 4442] Detected during program initialization: Version: 16.4.0000.0248 Linux64
2019-07-10T03:50:38.625071866Z Wed Jul 10 03:34:48 2019 [INFORMATION] [LAUNCHER 4504] Detected during program initialization: Version: 16.4.0000.0248 Linux64
2019-07-10T03:50:38.625075268Z Wed Jul 10 03:36:18 2019 [INFORMATION] [LAUNCHER 4891] Detected during program initialization: Version: 16.4.0000.0248 Linux64
2019-07-10T03:50:38.625078688Z Wed Jul 10 03:36:49 2019 [INFORMATION] [LAUNCHER 4971] Detected during program initialization: Version: 16.4.0000.0248 Linux64
2019-07-10T03:50:38.625082182Z Wed Jul 10 03:48:18 2019 [INFORMATION] [LAUNCHER 6916] Detected during program initialization: Version: 16.4.0000.0248 Linux64
2019-07-10T03:50:38.625096886Z
2019-07-10T03:50:38.625101275Z ==> /app/dctm/wildfly9.0.1/server/DctmServer_MethodServer/logs/ServerApps.log <==
2019-07-10T03:50:38.625105098Z at io.undertow.servlet.core.DeploymentManagerImpl.start(DeploymentManagerImpl.java:511)
2019-07-10T03:50:38.625108575Z at org.wildfly.extension.undertow.deployment.UndertowDeploymentService.startContext(UndertowDeploymentService.java:101)
2019-07-10T03:50:38.625112342Z at org.wildfly.extension.undertow.deployment.UndertowDeploymentService$1.run(UndertowDeploymentService.java:82)
2019-07-10T03:50:38.625116110Z at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
2019-07-10T03:50:38.625120084Z at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2019-07-10T03:50:38.625123672Z at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2019-07-10T03:50:38.625127341Z at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2019-07-10T03:50:38.625131122Z at java.lang.Thread.run(Thread.java:748)
2019-07-10T03:50:38.625134828Z at org.jboss.threads.JBossThread.run(JBossThread.java:320)
2019-07-10T03:50:38.625139133Z 05:34:58,050 INFO [ServerService Thread Pool -- 96] com.documentum.cs.otds.DfSessionHandler - DFC Client Successfully initialized
2019-07-10T03:50:38.625143291Z
2019-07-10T03:50:38.625146939Z ==> /app/dctm/wildfly9.0.1/server/DctmServer_MethodServer/logs/ServerApps_trace.log <==
2019-07-10T03:50:38.625150991Z
2019-07-10T03:50:38.625154528Z ==> /app/dctm/wildfly9.0.1/server/DctmServer_MethodServer/logs/dmotdsrest.log <==
2019-07-10T03:50:38.625159563Z 03:33:16,505 [ServerService Thread Pool -- 66] DFC Client Successfully initialized
2019-07-10T03:50:38.625163445Z 05:34:58,050 [ServerService Thread Pool -- 96] DFC Client Successfully initialized
2019-07-10T03:50:38.625955045Z Setting up watches. Beware: since -r was given, this may take a while!
2019-07-10T03:50:38.627044196Z Watches established.
2019-07-10T03:50:38.934648542Z
2019-07-10T03:50:38.934668467Z ==> /app/dctm/dba/log/dmtest05bis.log <==
2019-07-10T03:50:38.934673076Z Wed Jul 10 05:50:38 2019[DM_STARTUP_W_DOCBASE_OWNER_NOT_FOUND] *** warning *** : The database user, dmtest05bisc as specified by your server.ini is not a valid user as determined using the system password check api. This will likely severly impair the operation of your docbase.
2019-07-10T03:50:39.001793811Z
2019-07-10T03:50:39.001816266Z
2019-07-10T03:50:39.001821414Z OpenText Documentum Content Server (version 16.4.0000.0248 Linux64.Oracle)
2019-07-10T03:50:39.001825146Z Copyright (c) 2018. OpenText Corporation
2019-07-10T03:50:39.001828983Z All rights reserved.
2019-07-10T03:50:39.001832816Z
2019-07-10T03:50:39.005318448Z 2019-07-10T05:50:39.005068 193[193] 0000000000000000 [DM_SERVER_I_START_SERVER]info: "Docbase dmtest05bis attempting to open"
2019-07-10T03:50:39.005337158Z
2019-07-10T03:50:39.005342472Z 2019-07-10T05:50:39.005172 193[193] 0000000000000000 [DM_SERVER_I_START_KEY_STORAGE_MODE]info: "Docbase dmtest05bis is using database for cryptographic key storage"
...

If the information is plethoric, the above command can be narrowed to a time window by adding --since and --until restrictions, e.g.:

docker logs --timestamps --follow --since 5m <container>

Thus, everything sent to the container’s stdout can be aggregated and viewed very simply from one place, yielding a cheap console log. In particular, Documentum containers could expose their well-known log files to the outside world, e.g. the docbroker log, the content server(s) log(s) and the method server logs. To this effect, it would be enough to just start a “tail -F” on those files from within the entrypoint as illustrated below:

tail -F ${DOCUMENTUM}/dba/log/docbroker.log ${DOCUMENTUM}/dba/log/dmtest.log ${DOCUMENTUM}/wildfly9.0.1/server/DctmServer_MethodServer/logs/ServerApps.log ...

The -F option guarantees that the logs continue being followed even after a possible rotation.
Admittedly, this output can be a bit hard to read because the logs are interlaced, i.e. lines or blocks of lines from different logs are displayed sorted by the time they were produced and not by their origin. Actually, this is a benefit because it makes it easier to find correlations between distinct, apparently unrelated events.
Viewing a particular log is still possible from outside the container, e.g.:

docker exec <container_name> /bin/bash -c "tail -f \${DOCUMENTUM}/dba/log/dmtest05bis.log"

provided the tail command exists in the container, which is not a given as there is a deliberate effort to make images as stripped-down as possible.
As those files are statically known (i.e. at build time), such a command could be defined as early as in the buildfile and invoked in its entrypoint script.
Unfortunately, the content server logs are not very verbose and the most useful messages are directed to session or ad hoc logs. The session logs are dynamically created for each new session with the session id as their name, which makes them unpredictable. Since those names are only known at run-time, the above buildfile’s “tail -F” command cannot include them and consequently they are not displayed by the “docker logs” command. The same applies to on-demand trace files with variable names, e.g. with a timestamp suffix.
So, is there a way to follow those dynamic session logs (or any dynamically named files, for that matter) anyway? An obvious way is to use a file or directory watcher to be notified of any created or modified file. If a watcher process running inside the container could wait for such conditions and signal any occurrence thereof, a listener process, also running inside the container, could receive the notification and dynamically fork a “tail -F” command to follow that file. Externally, “docker logs” would continue displaying whatever is sent to the container’s stdout, including the newly discovered files.
Under Linux, we can use inotifywait as the file watcher. Let’s see how to set it up.

Installing inotifywait

Under a Debian Linux derivative such as Ubuntu, inotifywait can be easily installed from the inotify-tools package through the usual command:

sudo apt-get install inotify-tools

Under a Red Hat Linux derivative such as CentOS, a two-step method is to first grab the rpm package from its on-line repository and then install it; the latest release as of this writing is 3.14-9:

  curl https://download-ib01.fedoraproject.org/pub/epel/7/x86_64/Packages/i/inotify-tools-3.14-9.el7.x86_64.rpm -o inotify-tools-3.14-9.el7.x86_64.rpm
  sudo rpm -Uvh inotify-tools-3.14-9.el7.x86_64.rpm 

In order to verify that the installation was successful, just try to launch “inotifywait”:

$ inotifywait
No files specified to watch!

Good, the command exists; let’s now test it.

Testing inotifywait

The command inotifywait has the following invocation syntax:

inotifywait [-hcmrq] [-e <event>] [-t <seconds>] [--format <fmt>] [--timefmt <fmt>] <file> [ ... ]

The man page explains very well each of the parameters so, please, refer there for details.
As the whole point is to detect new or changed files whose name we don’t know at image build time, but whose location is known (e.g. a path such as ${DOCUMENTUM}/dba/log), we will be watching directories. In such a case, one parameter will be a list of directories to recursively watch for any new or modified files.
That command was designed to output to stdout any event whose occurrence it was configured to wait for.
The default output format is the following:

watched_filename EVENT_NAMES event_filename

one line per file.
EVENT_NAMES is the event that occurred, in our case mainly one of CREATE or MODIFY, as requested through the -e command-line parameter.
As we watch directories, watched_filename is the name of the watched directory where the event occurred and event_filename the created/modified file.
Here is an example of use as a test with custom formatting for a better presentation:

$ inotifywait --quiet --monitor --event create,modify,attrib --recursive --timefmt "%Y/%m/%d-%H:%M:%S" --format "%T %e %w%f" /tmp &

The command will recursively (option --recursive) watch the /tmp directory for any created or modified file (option --event …) or any change of file attributes such as the timestamp or permissions. It keeps running instead of exiting after the first event (option --monitor), is sent to the background here with &, and issues events to stdout prefixed with a properly formatted timestamp (options --timefmt and --format).
With inotifywait running in the background, now create a dummy file in /tmp:

$ touch /tmp/xx
# the following lines get displayed:
2019/12/31-17:43:45 CREATE /tmp/xx
2019/12/31-17:43:45 ATTRIB /tmp/xx

Actually, the /tmp directory is a busy directory and a lot of file activity occurs in there very quickly:

2019/12/31-17:43:52 CREATE /tmp/hsperfdata_dmadmin/471
2019/12/31-17:43:52 MODIFY /tmp/hsperfdata_dmadmin/471
2019/12/31-17:43:52 MODIFY /tmp/hsperfdata_dmadmin/471
2019/12/31-17:44:06 MODIFY /tmp/nohup.log
2019/12/31-17:44:06 MODIFY /tmp/nohup.log
2019/12/31-17:44:06 MODIFY /tmp/nohup.log
2019/12/31-17:45:06 MODIFY /tmp/nohup.log
2019/12/31-17:45:06 MODIFY /tmp/nohup.log
2019/12/31-17:45:06 MODIFY /tmp/nohup.log
2019/12/31-17:45:22 CREATE /tmp/hsperfdata_dmadmin/510
2019/12/31-17:45:22 MODIFY /tmp/hsperfdata_dmadmin/510
2019/12/31-17:45:22 MODIFY /tmp/hsperfdata_dmadmin/510
2019/12/31-17:45:50 CREATE /tmp/runc-process785366568
2019/12/31-17:45:50 MODIFY /tmp/runc-process785366568
...

Let’s see if the directory is really watched recursively:

$ mkdir -p /tmp/dir1/dir2
> 2019/12/31-17:49:51 CREATE,ISDIR /tmp/dir1
> 2019/12/31-17:49:51 CREATE,ISDIR /tmp/dir1/dir2

We notice that the directory creation is trapped too.

$ touch /tmp/dir1/dir2/xx
> 2019/12/31-17:49:59 CREATE /tmp/dir1/dir2/xx
> 2019/12/31-17:49:59 ATTRIB /tmp/dir1/dir2/xx

It works as advertised. Moreover, two events are raised here: CREATE for the file creation, as it didn’t previously exist, and ATTRIB for the change of timestamp. Let’s verify this:

$ touch /tmp/dir1/dir2/xx
> 2019/12/31-17:50:01 ATTRIB /tmp/dir1/dir2/xx
$ touch /tmp/dir1/dir2/xx
> 2019/12/31-17:50:03 ATTRIB /tmp/dir1/dir2/xx

Indeed.
Let’s see if a change of attribute is also noticed:

$ chmod -r /tmp/dir1/dir2/xx
2019/12/31-18:03:50 ATTRIB /tmp/dir1/dir2/xx
$ chmod +r /tmp/dir1/dir2/xx
2019/12/31-18:03:55 ATTRIB /tmp/dir1/dir2/xx

It is. Fine.

Using inotify in a [containerized] Documentum installation

inotifywait will be used with the syntax shown above against the Documentum log directory ${DOCUMENTUM}/dba/log. Its output will be piped into a gawk script that will spawn “tail -F” commands when needed and keep a list of such processes so they can be killed after a timeout of inactivity, i.e. when they are following a file that did not get updated within a given time interval (let’s jokingly name this value “tail time to live”, or TTTL). This is to prevent a potentially unlimited number of such processes from hogging the system’s resources; not that they consume many CPU cycles, but it is pointless to leave hundreds of such idle processes sitting in memory. So the script will start a tail command on a file upon its CREATE event, or upon a MODIFY event if the file is not already followed, and clean up existing idle tail commands. As said before, the code below could be included in a container’s entrypoint so it runs constantly in the background.

#!/bin/bash

export watcher_workdir=/tmp/watcher
export tail_on_off=${watcher_workdir}/tail_on_off
export heartbeat_file=${watcher_workdir}/heartbeat_file
export pause_duration=3

# comma-separated list of files watched but excluded from tailing;
export excluded_files="$tail_on_off,$heartbeat_file"

# make private copies of inotifywait and tail so they can be easily identified and killed from the list of running processes;
export private_inotify=~/dctm-inotifywait
export private_tail=~/dctm-tail

follow_logs() {
   # tail time to live, maximum duration in minutes an unmodified tailed file will stay tailed before the tail is killed, i.e. TTTL ;-);
   tail_timeout=1
 
   $private_inotify --quiet --monitor --event create,modify,attrib --recursive --timefmt "%Y/%m/%d-%H:%M:%S" --format "%T|%e|%w%f" ${DOCUMENTUM}/dba/log $watcher_workdir | gawk -v tail_timeout=$((tail_timeout * 60)) -v excluded_files="$excluded_files" -v tail_on_off=$tail_on_off -v heartbeat_file=$heartbeat_file -v env_private_tail=private_tail -v FS="|" -v Apo="'" 'BEGIN {
      # get the dedicated tail command from the environment;
      private_tail=ENVIRON[env_private_tail]

      # get current time;
      cmd_now = "date +\"%Y/%m/%d-%H:%M:%S\""
 
      # get the time of the next check, i.e. now + the timeout delay;
      cmd_future = "date +\"%Y/%m/%d-%H:%M:%S\" --date=\"" tail_timeout  " seconds\""
      cmd_future | getline next_check_date; close(cmd_future)
 
      # find the running private tail commands;
      # use the same FS defined in the outer gawk scope to guarantee thet $0 can be split smoothly;
      cmd_tail = "pgrep -fa " private_tail " | gawk " Apo "{print $1 \"" FS "\" $NF}" Apo
 
      # get the most recent time a running idle tail command is allowed to live;
      cmd_oldest = "date +\"%Y/%m/%d-%H:%M:%S\" --date=\"" tail_timeout  " seconds ago\""
 
      # command to get the modifiy date of a tailed file;
      cmd_file_stamp = "date +\"%Y/%m/%d-%H:%M:%S\" --reference="
 
      # files can be excluded from tailing if specified in parameter exclude_files;
      # convert the comma-separated list of excluded files to an associative array, which is easier to search as it is indexed by file names;
      nb_files = split(excluded_files, tmp_excluded_files, ",")
      for (;nb_files > 0; nb_files--)
         tab_excluded_files[tmp_excluded_files[nb_files]] = 0

      bMust_tail = 1
   }
   {
      # skip directories (they have a trailing /);
      if (match($3, /\/$/)) next
 
      # check if logs must be tailed;
      if (tail_on_off == $3 && ("CREATE" == $2 || "MODIFY" == $2)) {
         if ((getline bMust_tail < tail_on_off) >  0) {
            close(tail_on_off)
            system("rm " tail_on_off " 2>/dev/null")
         }
      } else 
         # tailing on ?
         if (bMust_tail && !($3 in current_tails))
            # CREATE event ?
            if ("CREATE" == $2 && !($3 in tab_excluded_files)) {
               # newly created file not in exclusion list: tail it !
               system(private_tail " --follow=name --retry " $3 " 2>/dev/null &")
               # ... and keep track of it;
               current_tails[$3] = 0
            }
            # MODIFY event ?
            else if ("MODIFY" == $2 && !($3 in tab_excluded_files)) {
               # modified file not in exclusion list nor already tailed: tail it !
               system(private_tail " --follow=name --retry " $3 " 2>/dev/null &")
               # ... and keep track of it;
               current_tails[$3] = 0
            }
 
      # clean up logic starts here;
      # only upon a heartbeat event;
      if ("ATTRIB" == $2 && $3 == heartbeat_file) {
         # time to check ?
         cmd_now | getline now; close(cmd_now)
         if (now >= next_check_date) {
            # maximum time to live for idle tails;
            cmd_oldest | getline oldest_date; close(cmd_oldest)
            # loop though all the running tail commands;
            while ((cmd_tail | getline) > 0) {
               # cannot explain this spurious entry, ignoring it explicitly eventhough it will anyway; 
               if ("$NF}" Apo == $2) continue
               tail_pid = $1
               tailed_file = $2
               # is it one of the watched files ?
               if (tailed_file in current_tails) {
                  # get the current tailed file last modification date;
                  (cmd_file_stamp tailed_file " 2>/dev/null") | getline last_modif_date; close(cmd_file_stamp tailed_file " 2>/dev/null")
                  # tailed file not updated during time to live period ?
                  if (last_modif_date <= oldest_date) {
                     # yes, kill the tailing process;
                     system("kill -9 " tail_pid " 2> /dev/null")
                     # ... and update the list of tailed files;
                     delete current_tails[tailed_file]
                  }
                  else current_tails[tailed_file] = 1
               }
               # else it should not happen because private tail commands are only issues from here and get tracked;
            }
            close(cmd_tail)

            # resynchronize internal list with actual tailed files;
            for (f in current_tails)
               if (0 == current_tails[f])
                  # tracked file not tailed any more (this should not happen because private tail commands are killed from here only), untrack it;
                  delete current_tails[f]
               else
                  # tracked file was checked and is still alive;
                  current_tails[f] = 0
 
            # compute the next check time;
            cmd_future | getline next_check_date; close(cmd_future)
         }
      }
   }'
}
 
 
# -----------
# main;
# -----------

# create private symlinks for inotifywait and tail so they are easily identified and can be stopped later;
ln -s /usr/bin/inotifywait $private_inotify 2> /dev/null
ln -s /usr/bin/tail $private_tail           2> /dev/null

# create the watcher's workdir;
mkdir -p $watcher_workdir 2> /dev/null

# enable following new or modified logs;
echo 1 > $tail_on_off
 
# start the watcher;
follow_logs &

while [ true ]; do
   # send heartbeat to inotify;
   touch $heartbeat_file
   # do something else here;
   # ...
   sleep $pause_duration
done

Admittedly, the fact that gawk relies a lot on external commands and pipes (because of its lack of libraries, but this can be arranged by extending it, see other articles in this blog such as this one) somewhat obfuscates the statements’ purpose. Also, all the logic is contained in the not-so-obvious automatic loop which is executed each time an event is notified. Anyway, as usual, the code is only provided as an illustration.
On line 3, a “technical” sub-directory is defined and created later on line 134. It will contain the heartbeat file (file heartbeat_file, see next paragraph for more details) and a file to switch tailing on and off (file tail_on_off). This directory is watched (see the call to inotifywait on line 19).
On lines 12 and 13, private versions of the commands inotifywait and tail are defined; they are created later on lines 130 and 131 as symlinks. This is to facilitate their identification in the list of running processes so they can be killed if needed.
This script is encapsulated in function follow_logs() started on line 15. It is launched in the background on line 140.
On line 46 the boolean bMust_tail is initialized; it gets updated on line 53 upon a CREATE event on the file $tail_on_off, and after it has been read into bMust_tail, the file is removed. By writing 0 or a positive number into it, the aggregation of new logs is respectively disabled or enabled:

# disable following new logs;
# from within a container or in a traditional installation:
$ echo 0 > $tail_on_off
 
# or, for a container, supposing $tail_on_off is defined in the container's current user's environment:
$ docker exec container_name /bin/bash -c "echo 0 > \$tail_on_off"
 
# enable following new logs;
# from within a container or in a traditional installation:
$ echo 1 > $tail_on_off
 
# or, for a container, supposing $tail_on_off is defined in the container's current user's environment:
$ docker exec container_name /bin/bash -c "echo 1 > \$tail_on_off"

Currently running tail commands are not impacted.
Note how the $ in $tail_on_off is escaped so it is not consumed by the docker host’s shell and is passed as-is to the container.
On lines 63 and 70, private tail commands are started when new files have appeared in the watched sub-directories, or when modified files are not already tailed.
On line 79, a clean-up of idle (private) tail commands is performed and the internal associative array that keeps track of them is refreshed to make sure it reflects the currently running tails.
On line 142, the well-known container entrypoint’s never-ending loop is entered.
The above script can also be run stand-alone in a classic, non containerized installation.

Keeping the gawk script busy

A problem to solve is that the gawk script runs as a coroutine of inotifywait, i.e. it is synchronized with its output. If there is none, because no events were detected, the script blocks waiting for input, so no cleaning of idle tail commands gets done. This is not such a big deal because no new tail commands are started either, so the status quo is maintained. However, an easy work-around is possible: in order to force the script to enter the next iteration and give it a chance to perform the cleaning, we introduce a heartbeat with one watched file, e.g.:

...
while [ true ]; do
   # send heartbeat to inotify;
   touch $heartbeat_file
   # do something else here;
   # ...
   sleep $pause_duration
done


This explains the presence of the file $heartbeat_file in the list of inotifywait’s targets (see line 19 in the code above).
Now, because its timestamp is updated, at least one event is always raised periodically: an ATTRIB on the heartbeat file $heartbeat_file, as shown before. Although ATTRIB events are listened to, no action is generally taken upon them, except when they occur on $heartbeat_file, in which case their sole purpose is to trigger the script’s execution, more precisely the cleanup of inactive tail commands.

Let’s test it now.


Tracking Logs Inside a Documentum Container (part II)


Testing the log watcher

This is part II of the article. Part I is here.
All the above code has to be included in the entrypoint script so it gets executed at container start-up time, but it can also be tested more simply in a traditional repository installation.
First, we’ll move the code into an executable script, e.g. entrypoint.sh, and run it in the background in a first terminal. Soon, we will notice that lots of log messages get displayed, e.g. from jobs executing in the repository:

---------------------------------------
Job Arguments:
(StandardJobArgs: docbase_name: dmtest.dmtest user_name: dmadmin job_id: 0800c35080007042 method_trace_level: 0 )
window_interval=120
queueperson=null
max_job_threads=3
 
---------------------------------------
2020-01-02T07:03:15.807617 1600[1600] 0100c3508000ab80 [DM_SESSION_I_ASSUME_USER]info: "Session 0100c3508000ab80 is owned by user dmadmin now."
Executing query to collect XCP automatic tasks to be processed:
select r_object_id from dmi_workitem where a_wq_name = 'to be processed by job' and r_auto_method_id != '0000000000000000' and (r_runtime_state = 0 or r_runtime_state = 1) order by r_creation_date asc
Report End 2020/01/02 07:03:15
2020-01-02T07:03:16.314586 5888[5888] 0100c3508000ab90 [DM_SESSION_I_SESSION_QUIT]info: "Session 0100c3508000ab90 quit."
Thu Jan 02 07:04:44 2020 [INFORMATION] [LAUNCHER 6311] Detected during program initialization: Version: 16.4.0080.0129 Linux64
2020-01-02T07:04:44.736720 6341[6341] 0100c3508000ab91 [DM_SESSION_I_SESSION_START]info: "Session 0100c3508000ab91 started for user dmadmin."
Thu Jan 02 07:04:45 2020 [INFORMATION] [LAUNCHER 6311] Detected while preparing job dm_Initialize_WQ for execution: Agent Exec connected to server dmtest: [DM_SESSION_I_SESSION_START]info: "Session 0100c3508000ab91 started for user dmadmin."
 
 
2020-01-02T07:04:45.686337 1600[1600] 0100c3508000ab80 [DM_SESSION_I_ASSUME_USER]info: "Session 0100c3508000ab80 is owned by user dmadmin now."
2020-01-02T07:04:45.698970 1597[1597] 0100c3508000ab7f [DM_SESSION_I_ASSUME_USER]info: "Session 0100c3508000ab7f is owned by user dmadmin now."
Initialize_WQ Report For DocBase dmtest.dmtest As Of 2020/01/02 07:04:45
 
---------------------------------------
Job Arguments:
(StandardJobArgs: docbase_name: dmtest.dmtest user_name: dmadmin job_id: 0800c3508000218b method_trace_level: 0 )
window_interval=120
queueperson=null
max_job_threads=3
 
---------------------------------------
Starting WQInitialisation job:
2020-01-02T07:04:45.756339 1600[1600] 0100c3508000ab80 [DM_SESSION_I_ASSUME_USER]info: "Session 0100c3508000ab80 is owned by user dmadmin now."
Executing query to collect Unassigned worktiems to be processed:
select r_object_id, r_act_def_id from dmi_workitem where a_held_by = 'dm_system' and r_runtime_state = 0 order by r_creation_date
Total no. of workqueue tasks initialized 0
Report End 2020/01/02 07:04:45
2020-01-02T07:04:46.222728 6341[6341] 0100c3508000ab91 [DM_SESSION_I_SESSION_QUIT]info: "Session 0100c3508000ab91 quit."
Thu Jan 02 07:05:14 2020 [INFORMATION] [LAUNCHER 6522] Detected during program initialization: Version: 16.4.0080.0129 Linux64
2020-01-02T07:05:14.828073 6552[6552] 0100c3508000ab92 [DM_SESSION_I_SESSION_START]info: "Session 0100c3508000ab92 started for user dmadmin."
Thu Jan 02 07:05:15 2020 [INFORMATION] [LAUNCHER 6522] Detected while preparing job dm_bpm_XCPAutoTaskMgmt for execution: Agent Exec connected to server dmtest: [DM_SESSION_I_SESSION_START]info: "Session 0100c3508000ab92 started for user dmadmin."
 
 
2020-01-02T07:05:15.714803 1600[1600] 0100c3508000ab80 [DM_SESSION_I_ASSUME_USER]info: "Session 0100c3508000ab80 is owned by user dmadmin now."
2020-01-02T07:05:15.726601 1597[1597] 0100c3508000ab7f [DM_SESSION_I_ASSUME_USER]info: "Session 0100c3508000ab7f is owned by user dmadmin now."
bpm_XCPAutoTaskMgmt Report For DocBase dmtest.dmtest As Of 2020/01/02 07:05:15
 
---------------------------------------
 

Then, from a second terminal, we’ll start and stop several idql sessions and observe the resulting output. We will notice the familiar *_START and *_QUIT lines from the session logs:

---------------------------------------
2020-01-02T07:09:16.076945 1600[1600] 0100c3508000ab80 [DM_SESSION_I_ASSUME_USER]info: "Session 0100c3508000ab80 is owned by user dmadmin now."
Executing query to collect XCP automatic tasks to be processed:
select r_object_id from dmi_workitem where a_wq_name = 'to be processed by job' and r_auto_method_id != '0000000000000000' and (r_runtime_state = 0 or r_runtime_state = 1) order by r_creation_date asc
Report End 2020/01/02 07:09:16
2020-01-02T07:09:16.584776 7907[7907] 0100c3508000ab97 [DM_SESSION_I_SESSION_QUIT]info: "Session 0100c3508000ab97 quit."
2020-01-02T07:09:44.969770 8080[8080] 0100c3508000ab98 [DM_SESSION_I_SESSION_START]info: "Session 0100c3508000ab98 started for user dmadmin."
2020-01-02T07:09:47.329406 8080[8080] 0100c3508000ab98 [DM_SESSION_I_SESSION_QUIT]info: "Session 0100c3508000ab98 quit."
...

So, inotifywait is pretty effective as a file watcher.

Let’s see how many tail processes are currently running:

$ psg tail | grep -v gawk
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 4818 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/agentexec.log
1 4846 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab80
1 4850 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab7f
1 8375 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/job_0800c3508000218b
1 8389 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab99
1 8407 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/sysadmin/Initialize_WQDoc.txt
1 8599 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/job_0800c35080000386
1 8614 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab9a
1 8657 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab9b
1 8673 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/sysadmin/DataDictionaryPublisherDoc.txt

And after a while:

$ psg tail | grep -v gawk
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 4818 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/agentexec.log
1 4846 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab80
1 4850 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab7f
1 8599 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/job_0800c35080000386
1 8614 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab9a
1 8657 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab9b
1 8673 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/sysadmin/DataDictionaryPublisherDoc.txt
1 8824 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/job_0800c35080007042
1 8834 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab9c
1 8852 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/sysadmin/bpm_XCPAutoTaskMgmtDoc.txt

Again:

$ psg tail | grep -v gawk
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 10058 24411 9328 pts/1 10252 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/agentexec.log
1 10078 24411 9328 pts/1 10252 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/job_0800c3508000218b
1 10111 24411 9328 pts/1 10252 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab9f
1 10131 24411 9328 pts/1 10252 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab80
1 10135 24411 9328 pts/1 10252 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab7f
1 10139 24411 9328 pts/1 10252 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/sysadmin/Initialize_WQDoc.txt

And again:

$ psg tail | grep -v gawk
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 10892 24411 9328 pts/1 11022 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/agentexec.log
1 10896 24411 9328 pts/1 11022 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/job_0800c3508000218b
1 10907 24411 9328 pts/1 11022 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000aba1
1 10921 24411 9328 pts/1 11022 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab80
1 10925 24411 9328 pts/1 11022 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab7f
1 10930 24411 9328 pts/1 11022 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/sysadmin/Initialize_WQDoc.txt

And, after disabling the aggregation of new logs:

$ echo 0 > $tail_on_off
$ psg tail | grep -v gawk
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 24676 24277 9328 pts/1 26096 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/agentexec.log
1 24710 24277 9328 pts/1 26096 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab80
1 24714 24277 9328 pts/1 26096 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab7f

And eventually:

$ psg tail | grep -v gawk
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
<nothing>

From now on, until the aggregation is turned back on, this list will be empty as no new tail commands will be started.
The number of tail commands grows and shrinks as expected, so the process clean up is working.
The agentexec.log file is often present because jobs are started frequently, so this file regularly triggers the MODIFY event. dmadmin- and sysadmin-owned session logs come and go with each started job. DataDictionaryPublisherDoc.txt, bpm_XCPAutoTaskMgmtDoc.txt and Initialize_WQDoc.txt are a few of the followed jobs' logs.
On the test system used here, an out-of-the-box docbase without any running application, the output is obviously very quiet, with long pauses in between, but it can become extremely dense on a busy system. When investigating problems on such systems, it can be useful to redirect the whole output to a text file and search it at leisure in one's favorite editor. It is also possible to restrict it to a time window as narrow as desired (if docker logs is used) in order to reduce the noise, to exclude files from being watched (see the next paragraph), and even to stop aggregating new logs via the $tail_on_off file so that only the current ones keep being followed as long as they are active (i.e. written to).
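
For instance, assuming a containerized repository whose container is named dmtest (the container name, time window and output file below are placeholders, not values from the setup above), such a bounded capture could look like this:

# capture a limited time window of the aggregated output into a file for later analysis;
# container name, timestamps and target file are examples only;
docker logs --since 2020-01-02T07:00:00 --until 2020-01-02T08:00:00 dmtest > /tmp/dmtest_watcher.log 2>&1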

Dynamically reconfiguring inotifywait via a parameter file

In the preceding examples, the inotifywait command has been passed a fixed, hard-coded subtree name to watch. In certain cases, it could be useful to be able to watch a different, random sub-directory or list of arbitrary files in unrelated sub-directories with a possible list of excluded files; for even more flexibility, the other inotifywait’s parameters could also be changed dynamically. In such cases, the running inotifywait command will have to be stopped and restarted to change its command-line parameters. One could imagine to check a communication file (like the $tail_on_off file above) inside the entrypoint’s main loop, as shown below. Here the diff is relative to the precedent static parameters version:

10a11,13
> # file that contains the watcher's parameters;
> export new_inotify_parameters_file=${watcher_workdir}/inotify_parameters_file
> 
15a19,20
>    inotify_params="$1"
>  
19c24
>     eval $private_inotify ${inotify_params} $watcher_workdir | gawk -v tail_timeout=$((tail_timeout * 60)) -v excluded_files="$excluded_files" -v bMust_tail=1 -v tail_on_off=$tail_on_off -v heartbeat_file=$heartbeat_file -v env_private_tail=private_tail -v FS="|" -v Apo="'" 'BEGIN {
122a128,142
> process_new_inotify_parameters() {
>    # read in the new parameters from file $new_inotify_parameters_file;
>    # first line of the $new_inotify_parameters_file must contains the non-target parameters, e.g. -m -e create,modify,attrib -r --timefmt "%Y/%m/%d-%H:%M:%S" --format "%T %e %w%f";
>    new_params=$(cat $new_inotify_parameters_file)
>    rm $new_inotify_parameters_file
>  
>    # kill current watcher;
>    pkill -f $private_inotify
> 
>    # kill the current private tail commands too;
>    pkill -f $private_tail
> 
>    # restart inotify with new command-line parameters taken from $new_inotify_parameters_file;
>    follow_logs "$new_params" &
> }
138,139c158,162
< # start the watcher;
> # default inotifywait parameters;
> # the creation of this file will trigger a restart of the watcher using the new parameters;
> cat - <<-eot > $new_inotify_parameters_file
>    --quiet --monitor --event create,modify,attrib --recursive --timefmt "%Y/%m/%d-%H:%M:%S" --format '%T|%e|%w%f' \${DOCUMENTUM}/dba/log --exclude \$new_inotify_parameters_file
> eot
143a167
>    [[ -f $new_inotify_parameters_file ]] && process_new_inotify_parameters

To apply the patch, save the first listing into a file, e.g. inotify_static_parameters.sh, the above diff output into a file, e.g. static2dynamic.diff, and use the command below to create the script inotify_dynamic_parameters.sh:

patch inotify_static_parameters.sh static2dynamic.diff -o inotify_dynamic_parameters.sh

In order not to trigger events when it is created, the $new_inotify_parameters_file must be excluded from inotifywait’s watched directories.
If, for instance, we want later to exclude jobs’ logs in order to focus on more relevant logs, we could use the following inotifywait’s parameters:

# cat - <<-eot > $new_inotify_parameters_file
   --quiet --monitor --event create,modify,attrib --recursive --timefmt "%Y/%m/%d-%H:%M:%S" --format "%T|%e|%w%f" \${DOCUMENTUM}/dba/log --exclude '${DOCUMENTUM}/dba/log/dmtest/agentexec/job_.*' --exclude $new_inotify_parameters_file
eot

From the outside of a container:

docker exec <container> /bin/bash -c 'cat - <<-eot > $new_inotify_parameters_file
   --quiet --monitor --event create,modify,attrib --recursive --timefmt "%Y/%m/%d-%H:%M:%S" --format "%T|%e|%w%f" \${DOCUMENTUM}/dba/log --exclude "\${DOCUMENTUM}/dba/log/dmtest/agentexec/job_.*" --exclude \$new_inotify_parameters_file
eot
'

where the new option --exclude specifies a POSIX-compliant regular expression; the expression is quoted so that the shell does not perform its variable expansions.
After at most $pause_duration seconds, this file is detected (line 34), read (line 14), deleted (line 15) and the current watcher gets restarted in the background with the new parameters (line 24).
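
Schematically, and stripping out the housekeeping details, the polling inside the entrypoint's main loop amounts to something like the sketch below (the variable and function names are the ones already used above):

# minimal sketch of the entrypoint's polling loop, housekeeping steps omitted;
while true; do
   # heartbeat and tail clean-up would be performed here;
   # pick up new watcher parameters if a parameter file has appeared;
   [[ -f $new_inotify_parameters_file ]] && process_new_inotify_parameters
   sleep $pause_duration
done
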
This approach is akin to polling the parameter file and it looks a bit unsophisticated. We use events to control the tailing through the $tail_on_off file and for the heartbeat, so why not for the parameter file too? The next paragraph will show just that.

Dynamically reconfiguring inotifywait via an event

A better way would be to use the watcher itself to check whether the parameter file has changed! After all, we shall put our money where our mouth is, right?
In order to do this, a dedicated watcher is set up to watch the parameter file. For technical reasons, it watches the file's parent directory rather than $new_inotify_parameters_file itself (to be watched, a file must already exist, and once a watched file is erased, inotifywait no longer reacts to any event on it; it just sits there idly). To avoid scattering files in too many locations, the parameter file is stored along with the technical files in ${watcher_workdir} and, in order not to impact performance, it is excluded from the main watcher (parameter --exclude \$new_inotify_parameters_file; do not forget to append it, otherwise the main watcher will raise events for nothing, although that would not prevent the parameter file from being processed).
When a MODIFY event occurs on this file, which includes a CREATE event if the file did not pre-exist, it is processed as shown previously.
Here the diff compared to the preceding polled parameter file version:

143a144,151
> # the parameter file's watcher;
> param_watcher() {
>    IFS="|"
>    inotifywait --quiet --monitor --event create,modify,attrib --timefmt "%Y/%m/%d-%H:%M:%S" --format '%T|%e|%w%f' $watcher_workdir --exclude \$tail_on_off \$heartbeat_file | while read timestamp event target_file; do
>       [[ $target_file == $new_inotify_parameters_file ]] && process_new_inotify_parameters
>    done
> }
> 
162a171,173
> 
> # starts the parameter file's watcher;
> param_watcher &

The delta is quite short with just the function param_watcher() to start the watcher on the parameter file and process its signal by invoking the same function process_new_inotify_parameters() introduced in the polling version.
To apply the patch, save the above diff output into a file, e.g. dynamic2event.diff, and use the command below to create the script inotify_dynamic_parameters_event.sh from the polling version’s script inotify_dynamic_parameters.sh created above:

patch inotify_dynamic_parameters.sh dynamic2event.diff -o inotify_dynamic_parameters_event.sh

The dedicated watcher runs in the background and restarts the main watcher whenever the latter receives new parameters. Quite simple.

Dynamically reconfiguring inotifywait via signals

Yet another way to interact with inotifywait’s script (e.g. the container’s entrypoint) could be through signals. To this effect, we’ll need first to choose suitable signals, say SIGUSR[12], define a trap handler and implement it. And while we are at it, let’s also add switching the watcher on and off, as illustrated below:

2a3,5
> trap 'bounce_watcher' SIGUSR1
> trap 'toggle_watcher' SIGUSR2
> 
7a11,14
> # new global variable to hold the current tail on/off status;
> # used to toggle the watcher;
> export bMust_tail=1
> 
144,149c151,162
< # the parameter file's watcher;
< param_watcher() {
<    IFS="|"
<    inotifywait --quiet --monitor --event create,modify,attrib --timefmt "%Y/%m/%d-%H:%M:%S" --format '%T|%e|%w%f' $watcher_workdir --exclude \$tail_on_off \$heartbeat_file | while read timestamp event target_file; do
<       [[ $target_file == $new_inotify_parameters_file ]] && process_new_inotify_parameters
> # restart the watcher with new parameters;
> bounce_watcher() {
>    [[ -f $new_inotify_parameters_file ]] && process_new_inotify_parameters
>    echo "parameters reloaded"
> }
> 
> # flip the watcher on/off;
> # as file $tail_on_off is watched, modifying it will trigger the processing of the boolean bMust_tail in the gawk script;
> toggle_watcher() {
>    (( bMust_tail = (bMust_tail + 1) % 2 ))
>    echo $bMust_tail > $tail_on_off
>    echo "toggled the watcher, it is $bMust_tail now"
172,173c185,187
< # starts the parameter file's watcher;
> process_new_inotify_parameters
> 
> echo "process $$ started"
178d191
<    [[ -f $new_inotify_parameters_file ]] && process_new_inotify_parameters

To apply the patch, save the above diff output into a file, e.g. event2signals.diff, and use the command below to create the script inotify_dynamic_parameters_signals.sh from the event version’s script inotify_dynamic_parameters_event.sh created above:

patch inotify_dynamic_parameters_event.sh event2signals.diff -o inotify_dynamic_parameters_signals.sh

The loop is simpler now as the functions are directly invoked by outside signals.
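
Reduced to its essence, the signal wiring looks like the sketch below (a sketch only; the function bodies are the ones shown in the diff above):

# minimal sketch of the signal handling added by the patch;
trap 'bounce_watcher' SIGUSR1
trap 'toggle_watcher' SIGUSR2
# the main loop just sleeps; the trap handlers run whenever a signal arrives;
while true; do
   sleep $pause_duration
done
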
To use it, just send the SIGUSR[12] signals to the container, as shown below:

# write a new configuration file:
...
# and restart the watcher:
$ docker kill --signal SIGUSR1 <container>
 
# toggle the watcher;
$ docker kill --signal SIGUSR2 <container>

If testing outside a container, the commands would be:

/bin/kill --signal SIGUSR1 pid
# or:
/bin/kill --signal SIGUSR2 pid

where pid is the pid displayed when the script starts.
We won’t indulge more on signals, see How to stop Documentum processes in a docker container, and more (part I) for many more examples of using signals to talk to containers.

Conclusion

inotifywait is simple to configure and extremely fast, practically instantaneous, at sending notifications. Although the aggregated output looks a bit confusing when the system is loaded, there are several ways to reduce its volume and make it easier to use. It is an interesting addition to an administrator's toolbox to help troubleshoot repositories and their client applications. With suitable parameters, it can be used to support any containerized or traditional installation. It is one of those no-frills tools that just work (mostly) as expected. If you ever happen to need a file watcher, consider it; you won't be disappointed, and it is fun to use.

The post Tracking Logs Inside a Documentum Container (part II) appeared first on Blog dbi services.

EDB PEM – Monitor your Postgres cluster


In my last post, I explained how to set up an HA Postgres cluster using EDB Postgres Advanced Server and Failover Manager. As a next step, we want to install EDB Postgres Enterprise Manager to monitor what we set up before.

Introduction

There are, of course, many good tools to monitor your Postgres cluster, but in case you run it with the EDB tools, you should really think about using EDB Postgres Enterprise Manager (PEM). It allows you to monitor EDB Postgres Advanced Server clusters as well as open source Postgres clusters. But PEM is not only a nice GUI: it also alerts you, and you can tune your clusters from one single point.

Installation

Before you start with the Installation of PEM, you need a PostgreSQL cluster on a host.
Once you have it, you can go on with the installation of edb-pem-server.

$ sudo yum install wxBase mod_wsgi mod_ssl edb-pem-server -y

Afterwards you need to set a password for the enterprisedb user in your cluster

$ psql
postgres=# alter user enterprisedb with password '******';
ALTER ROLE
postgres=# \q

That’s it. So let’s go on with the configuration.

Configuration

EDB delivers the PEM Server with a nice script to configure the server. This makes it really easy!

$ sudo /usr/edb/pem/bin/configure-pem-server.sh
------------------------------------------------------
 EDB Postgres Enterprise Manager
 -----------------------------------------------------
Install type: 1:Web Services and Database, 2:Web Services 3: Database [ ] :1
Enter local database server installation path (i.e. /usr/edb/as10 , or /usr/pgsql-10, etc.) [ ] :/usr/edb/as11
Enter database super user name [ ] :enterprisedb
Enter database server port number [ ] :5444
Enter database super user password [ ] :
Please enter CIDR formatted network address range that agents will connect to the server from, to be added to the server's pg_hba.conf file. For example, 192.168.1.0/24 [ 0.0.0.0/0 ] :192.168.22.53/32
Enter database systemd unit file or init script name (i.e. edb-as-10 or postgresql-10, etc.) [ ] :edb-as-11
Please specify agent certificate path (Script will attempt to create this directory, if it does not exists) [ ~/.pem/ ] :
CREATE EXTENSION
[Info] Configuring database server.
CREATE DATABASE
..
..
[Info] Configured the webservice for EDB Postgres Enterprise Manager (PEM) Server on port '8443'.
[Info] PEM server can be accessed at https://192.168.22.53:8443/pem at your browser

Now you can try to access PEM with your web browser.

Agent installation and configuration

Once the PEM server is set up, you need to install edb-pem-agent on all hosts that you want to monitor.

$ sudo yum install edb-pem-agent -y

Be sure to have the pg_hba.conf entries correct on the Agent and PEM Node.
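
As a rough illustration only (the databases, users and addresses below are assumptions and must be adapted to your environment), such entries follow the usual pg_hba.conf format:

# illustrative entries only; adapt the database, user and address columns to your setup
# TYPE  DATABASE   USER           ADDRESS                METHOD
host    pem        enterprisedb   192.168.22.51/32       md5
host    postgres   enterprisedb   192.168.22.53/32       md5
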
As soon as the installation is finished, you can register the agent to the PEM Server. For this you need the IP, the port and the user for PEM.

$ sudo bash
$ export PEM_MONITORED_SERVER_PASSWORD=edb
$ export PEM_SERVER_PASSWORD=edb
$ /usr/edb/pem/agent/bin/pemworker --register-agent --pem-server 192.168.22.53 --pem-port 5400 --pem-user enterprisedb --allow_server_restart true --allow-batch-probes true --batch-script-user enterprisedb 

As a last step, you need to add the configuration to agent.cfg and enable/start the pemagent service.

$ echo "allow_streaming_replication=true" >> /usr/edb/pem/agent/etc/agent.cfg
$ echo "ca_file=/usr/libexec/libcurl-pem/share/certs/ca-bundle.crt" >> /usr/edb/pem/agent/etc/agent.cfg
$ systemctl enable pemagent
$ systemctl start pemagent

Now you can have a look at the PEM Dashboard and you will see the registered agents.

As a next step you can add the PostgreSQL clusters to PEM; as I already explained that here, I won't go into it again.

Conclusion

Now you can enjoy the full monitoring experience using PEM, as well as all the nice graphs.

The post EDB PEM – Monitor your Postgres cluster appeared first on Blog dbi services.

Setup a two node Postgres HA Cluster using EDB Advanced Server and EFM


Some time ago I had a customer looking for a two-node PostgreSQL master/replica solution. As we need Oracle compatibility in a later step, we decided to go with the EnterpriseDB tools. This article should give you an introduction on how to set up the environment.

Prerequisites

There are just a few things that you need to prepare.
You need (at least) three servers with:

  • EPEL repository available
  • Subscription for EDB
  • EDB repository available

To make everything work with our DMK, some folders and links are needed:

mkdir -p /u01/app/postgres/product/as11
mkdir -p /u01as11
mkdir -p /usr/edb
mkdir -p /u02/pgdata/11/PG1
ln -s /u02/pgdata/11/PG1/ /u01as11/data
ln -s /u01/app/postgres/product/as11/ /usr/edb/as11
yum install -y unzip xorg-x11-xauth screen

EDB Advanced Server

Installation

Let's start with the installation of the EDB Advanced Server. This is really straightforward:

$ yum install edb-as11-server
$ chown enterprisedb:enterprisedb /u02/pgdata/11/epg1/
$ chown -R enterprisedb:enterprisedb /u01/app/
$ rm -rf /u01as11/backups/
$ passwd enterprisedb

Now you can install and configure our DMK. Make sure to adjust var::PGUSER::=::nooption::"enterprisedb" in dmk.conf.

As soon as the installation is done, you can initialize a new primary cluster.

enterprisedb@edb1:/var/lib/edb/ [PG1] cat /etc/pgtab
PG1:/u01/app/postgres/product/as11/:/u02/pgdata/11/PG1/:5444:Y
enterprisedb@edb1:/var/lib/edb/ [PG1] dmk
enterprisedb@edb1:/var/lib/edb/ [pg950] PG1

********* dbi services Ltd. ****************

STATUS           : CLOSED

********************************************
enterprisedb@ad1:/var/lib/edb/ [PG1] sudo mkdir -p /u02/pgdata/PG1
enterprisedb@ad1:/var/lib/edb/ [PG1] sudo chown enterprisedb:enterprisedb /u02/pgdata/PG1
enterprisedb@ad1:/var/lib/edb/ [PG1] /u01/app/postgres/product/as11/bin/initdb --pgdata=/u02/pgdata/PG1/ --pwprompt --data-checksums --auth=md5
The files belonging to this database system will be owned by user "enterprisedb".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are enabled.

Enter new superuser password:
Enter it again:

fixing permissions on existing directory /u02/pgdata/PG1 ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... Europe/Berlin
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
creating edb sys ... ok
loading edb contrib modules ...
edb_redwood_bytea.sql ok
edb_redwood_date.sql ok
dbms_alert_public.sql ok
dbms_alert.plb ok
dbms_job_public.sql ok
dbms_job.plb ok
dbms_lob_public.sql ok
dbms_lob.plb ok
dbms_output_public.sql ok
dbms_output.plb ok
dbms_pipe_public.sql ok
dbms_pipe.plb ok
dbms_rls_public.sql ok
dbms_rls.plb ok
dbms_sql_public.sql ok
dbms_sql.plb ok
dbms_utility_public.sql ok
dbms_utility.plb ok
dbms_aqadm_public.sql ok
dbms_aqadm.plb ok
dbms_aq_public.sql ok
dbms_aq.plb ok
dbms_profiler_public.sql ok
dbms_profiler.plb ok
dbms_random_public.sql ok
dbms_random.plb ok
dbms_redact_public.sql ok
dbms_redact.plb ok
dbms_lock_public.sql ok
dbms_lock.plb ok
dbms_scheduler_public.sql ok
dbms_scheduler.plb ok
dbms_crypto_public.sql ok
dbms_crypto.plb ok
dbms_mview_public.sql ok
dbms_mview.plb ok
dbms_session_public.sql ok
dbms_session.plb ok
edb_bulkload.sql ok
edb_gen.sql ok
edb_objects.sql ok
edb_redwood_casts.sql ok
edb_redwood_strings.sql ok
edb_redwood_views.sql ok
utl_encode_public.sql ok
utl_encode.plb ok
utl_http_public.sql ok
utl_http.plb ok
utl_file.plb ok
utl_tcp_public.sql ok
utl_tcp.plb ok
utl_smtp_public.sql ok
utl_smtp.plb ok
utl_mail_public.sql ok
utl_mail.plb ok
utl_url_public.sql ok
utl_url.plb ok
utl_raw_public.sql ok
utl_raw.plb ok
commoncriteria.sql ok
waitstates.sql ok
installing extension edb_dblink_libpq ... ok
installing extension edb_dblink_oci ... ok
installing extension pldbgapi ... ok
snap_tables.sql ok
snap_functions.sql ok
dblink_ora.sql ok
sys_stats.sql ok
finalizing initial databases ... ok
syncing data to disk ... ok

Success. You can now start the database server using:

    /u01/app/postgres/product/as11/bin/pg_ctl -D /u02/pgdata/PG1/ -l logfile start

enterprisedb@ad1:/var/lib/edb/ [PG1]

Then adjust wal_keep_segments; afterwards the edb-as service can be enabled and started.

$ echo "wal_keep_segments=100" >> $PGDATA/postgresql.auto.conf
$ sudo systemctl enable edb-as-11.service
$ sudo systemctl start edb-as-11

To be sure everything works as expected, reboot the server (if possible).

All the above steps should also be done on your additional nodes, but without the systemctl start.

Configuration

First, on Node 1 (Master) you need to create the replication role.

postgres=# create role replication with REPLICATioN PASSWORD 'replication' login;
CREATE ROLE

Second, you need to add replication to pg_hba.conf.

local   replication    all             127.0.0.1/32            trust
host    replication    all             192.168.22.53/32        trust
host    replication    all             192.168.22.51/32        trust
host    replication    all             192.168.22.52/32        trust
host    replication    all             ::1/128                 trust

And last but not least, you should exchange the ssh keys of all nodes:

enterprisedb@edb1:/u01 [PG1] ssh-keygen
enterprisedb@edb1:/u01 [PG1] ssh-copy-id enterprisedb@edb1
enterprisedb@edb1:/u01 [PG1] ssh-copy-id enterprisedb@edb2
enterprisedb@edb1:/u01 [PG1] ssh-copy-id enterprisedb@edb3

Create the replica

As already mentioned, you need almost all steps done on Node 2 as well, but without starting the service.
Make sure to have all hosts in the master's pg_hba.conf.

To create the replica, run a pg_basebackup onto Node 2:

enterprisedb@edb2:/u01 [PG1] pg_basebackup -h 192.168.22.51 -U replication -p 5432 -D $PGDATA -Fp -Xs -P -R
49414/49414 kB (100%), 1/1 tablespace

Once finished, check that the recovery.conf is available and add the following lines:

enterprisedb@edb2:/u01 [PG1] echo "recovery_target_timeline = 'latest'" >> /u02/pgdata/11/PG1/recovery.conf
enterprisedb@edb2:/u01 [PG1] echo "trigger_file='/u02/pgdata/11/PG1/trigger_failover'" >> /u02/pgdata/11/PG1/recovery.conf

To test if the recovery is working, start the cluster and check the recovery status.

enterprisedb@edb2:/u01 [PG1] pgstart
enterprisedb@edb2:/u01 [PG1] psql -U enterprisedb -c "select pg_is_in_recovery()" postgres
 pg_is_in_recovery
-------------------
 t
(1 row)

enterprisedb@edb2:/u01 [PG1] sudo systemctl enable edb-as-11.service
enterprisedb@edb2:/u01 [PG1] pgstop
enterprisedb@edb2:/u01 [PG1] systemctl start edb-as-11

EDB Postgres Failover Manager (EFM)

To make our two-node setup highly available, we need to install the EDB Postgres Failover Manager on three nodes: on the two nodes running the master / replica, and on a third one acting as a witness server.

Installation

The installation of EFM is straightforward as well; you have to do the following steps on all three nodes. To use EFM together with our DMK, you need to create some links.

$ sudo yum install edb-efm37
$ sudo yum install java-1.8.0-openjdk
$ sudo chown -R enterprisedb:enterprisedb /etc/edb/efm-3.7/
$ cat /etc/edb/efm-3.7/efm.nodes
$ sudo ln -s /usr/edb/efm-3.7 /usr/edb/efm
$ sudo ln -s /etc/edb/efm-3.7 /etc/edb/efm

Configuration

On the master you have to set a password for the enterprisedb user and encrypt the password using EFM.

$ psql -U enterprisedb -c "alter user enterprisedb with password '*******'" postgres
$ /usr/edb/efm/bin/efm encrypt efm

The encrypted password generated by efm encrypt will be needed in the efm.properties files.

As a next step we need an efm.properties file on Nodes 1 and 2 (the parameters to adjust are shown below).

$ cp /u01/app/postgres/local/dmk/templates/postgres/efm.properties.in /etc/edb/efm-3.7/efm.properties
$ vi /etc/edb/efm-3.7/efm.properties
  db.user=enterprisedb
  db.password.encrypted=f17db6033ef1be48ec1955d38b4c9c46
  db.port=5400
  db.database=postgres
  db.bin=/u01/app/postgres/product/as11/bin
  db.recovery.dir=/u02/pgdata/11/EPAS
  bind.address=192.168.22.51:7800
  admin.port=7809
  is.witness=false 
  virtualIp=192.168.22.55
  virtualIp.interface=enp0s8
  virtualIp.prefix=24
  virtualIp.single=true
$ sudo chown enterprisedb:enterprisedb /etc/edb/efm/efm.properties

We also need an efm.nodes file listing all nodes of the cluster.

$ cat /etc/edb/efm/efm.nodes
# List of node address:port combinations separated by whitespace.
# The list should include at least the membership coordinator's address.
192.168.22.51:7800 192.168.22.52:7800 192.168.22.53:7800
$ chown efm:efm efm.nodes
$ chmod 660 /etc/edb/efm/efm.nodes

To conclude, enable and start the efm-3.7.service.

sudo systemctl enable efm-3.7.service
sudo systemctl start efm-3.7.service

On node 3 we need to create an efm.properties file as well, but this time based on DMK's efm.properties_witness file as a template.

$ cp /u01/app/postgres/local/dmk/templates/postgres/efm.properties_witness /etc/edb/efm-3.7/efm.properties

Adjust the parameters as shown in the step for node 1 and 2, but be careful to have:

is.witness=true

Afterwards start the efm-3.7 service on node 3 as well.

$ sudo systemctl start efm-3.7.service

In the end, you can check if everything is running as expected using EFM.

$ efm cluster-status efm
Cluster Status: efm

        Agent Type  Address              Agent  DB       VIP
        -----------------------------------------------------------------------
        Standby     192.168.22.51        UP     UP       192.168.22.55
        Master      192.168.22.52        UP     UP       192.168.22.55*
        Witness     192.168.22.53        UP     N/A      192.168.22.55

Allowed node host list:
        192.168.22.52 192.168.22.51 192.168.22.53

Membership coordinator: 192.168.22.52

Standby priority host list:
        192.168.22.51

Promote Status:

        DB Type     Address              WAL Received LSN   WAL Replayed LSN   Info
        ---------------------------------------------------------------------------
        Master      192.168.22.52                           0/110007B8
        Standby     192.168.22.51        0/110007B8         0/110007B8

        Standby database(s) in sync with master. It is safe to promote.
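
As the output states that it is safe to promote, a controlled role swap could be tested at this point; a minimal sketch, assuming the cluster name efm used above:

# promote the standby and reconfigure the old master as the new standby (sketch only);
efm promote efm -switchover
# verify the new roles afterwards;
efm cluster-status efm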

That's it, now you have a master/replica system using the EDB tools.
In the next step we will have a look at setting up the cluster monitoring using EDB Postgres Enterprise Manager.

The post Setup a two node Postgres HA Cluster using EDB Advanced Server and EFM appeared first on Blog dbi services.


ROLLBACK TO SAVEPOINT;


By Franck Pachot

.
I love databases and, rather than trying to compare and rank them, I like to understand their differences. Sometimes, you make a mistake and encounter an error. Let's take the following example:
create table DEMO (n int);
begin transaction;
insert into DEMO values (0);
select n "after insert" from DEMO;
update DEMO set n=1/n;
select n "after error" from DEMO;
commit;
select n "after commit" from DEMO;

The "begin transaction" is not valid syntax in all databases because transactions may be started implicitly, but the other statements are valid syntax in all common SQL databases. They all raise an error during the update execution because there is one row with N=0, and 1/N then fails with a math (division by zero) error. But what about the result of the last select?

If I run this with Oracle, DB2, MS SQL Server, MySQL (links go to examples in db<>fiddle), the row added by the insert is always visible to my session: after the insert, of course, after the update error, and after the commit (then visible by everybody).

The same statements run with PostgreSQL give a different result. You cannot do anything after the error except roll back the transaction. Even if you issue a "commit", it will roll back.

Yes, no rows are remaining there! Same code but different result.

You can have the same behavior as the other databases by defining a savepoint before the statement, and rollback to savepoint after the error. Here is the db<>fiddle. With PostgreSQL you have to define an explicit savepoint if you want to continue in your transaction after the error. Other databases take an implicit savepoint. By the way, I said “statement” but here is Tanel Poder showing that in Oracle the transaction is actually not related to the statement but the user call: Oracle State Objects and Reading System State Dumps Hacking Session Video – Tanel Poder’s blog
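
As a minimal sketch in psql (re-using the DEMO table and values from the example above, and assuming a connection to the right database), the explicit savepoint looks like this:

psql <<'SQL'
begin;
insert into DEMO values (0);
savepoint before_update;
update DEMO set n=1/n;                  -- fails with a division by zero error
rollback to savepoint before_update;    -- discards only the failed statement
select n "after error" from DEMO;       -- the inserted row is still visible
commit;
SQL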

In Oracle, you can run multiple statements in a user call with a PL/SQL block. With PostgreSQL, you can group multiple statements in one command but you can also run a PL/pgSQL block. And with both, you can catch errors in the exception block. And then, it is PostgreSQL that takes now an implicit savepoint as I explained in a previous post: PostgreSQL subtransactions, savepoints, and exception blocks

This previous post was on Medium ( you can read https://www.linkedin.com/pulse/technology-advocacy-why-i-am-still-nomad-blogger-franck-pachot/ where I explain my blog “nomadism”), but as you can see I’m back on the dbi-services blog for my 500th post there. 

My last post here was called "COMMIT" (https://blog.dbi-services.com/commit/), where I explained that I was quitting consulting for CERN to start something new. But even if I decided to change, I was really happy at dbi services (as I mentioned in a LinkedIn post about great places to work). And when people like to work together, it creates an implicit SAVEPOINT you can come back to if you encounter some unexpected results. Yes… this far-fetched analogy is just to mention that I'm happy to come back to dbi services, and this is where I'll blog again.

As with many analogies, it reaches the limits of the comparison very quickly. You do not ROLLBACK a COMMIT, and it is not a real rollback because this year at CERN was a good experience. I've met great people there, learned interesting things about matter and anti-matter, and went out of my comfort zone, for example by co-organizing a PostgreSQL meetup and inviting external people (https://www.linkedin.com/pulse/working-consultants-only-externalization-franck-pachot/) for visits and conferences.

This "rollback" is actually a step further, but back in a context I like: solving customer problems in a company that cares about its employees and customers. And I'm not exactly coming back to the same "savepoint". I was mostly focused on Oracle and I'm now covering more technologies in the database ecosystem. Of course, consulting on Oracle Database will still be a major activity. But today, many other databases are rising: NoSQL, NewSQL… Open source is more and more relevant. And in this jungle, replication and federation technologies are rising too. I'll continue to share on these areas and you can follow this blog, the RSS feed, and/or my twitter account.

The post ROLLBACK TO SAVEPOINT; appeared first on Blog dbi services.

Should I go for ODA 19.5 or should I wait until 19.6?


Introduction

As you may know, Oracle Database 19c has been available for new (X8-2) and older Oracle Database Appliances for several weeks now. The current version is 19.5. But when you go to the official ODA documentation, it still proposes version 18.7 first, which is not compatible with 19c databases. Here is why.

19c database is the final 12.2

First of all, 19c is an important release because it's the terminal release of 12.2, as 11.2.0.4 was for 11.2. Please refer to my other blog post to understand the new Oracle versioning. ODA always supports new releases a few months after they become available on Linux, and that's why it's only available now.

Drawbacks of 19.5

19.5 is available for your ODA, but you will not be able to patch to this version. The reason is quite simple: it's not a complete patch; you can only download the ISO for reimaging plus the 19c grid and database software, and that's it. The reason for not yet having a patch resides in the difficulty of updating the OS part. 19.5 runs on Linux 7.7, and all previous releases are stuck with Linux 6.10, meaning that the patch would have to include the OS upgrade, and this jump is not so easy. That's the first drawback.

The second drawback is that you cannot run another database version. If you still need 18c, 12.2, 12.1 or 11.2, this 19.5 is not for you.

The third drawback is that you will not be able to patch from 19.5 to 19.6 or a newer version, simply because 19.5 is an out-of-the-way release.

Another drawback concerns the documentation, which is not yet complete: many parts are copied and pasted from 18.7. For example, the described initctl command to restart the DCS agent is not a command that actually exists on Linux 7.

Moreover, my first tests on this version show annoying bugs related to database creation, which are under investigation by Oracle.

When 19.6 will be ready?

19.6 is planned for 2020, yes, but for which month? There is no official date; it could come in March or during the summer, nobody knows. As a result, you will have to wait for this patch to be released to start your migration to 19c on ODA.

So, what to do?

Three solutions are possible:

  • You can deal with your old databases until the patch is released: buy extended support for 11gR2/12cR1. Premier support is still OK for 12.2.0.1 and 18c
  • Migrate your old 11gR2 and 12cR1 to 18c to be prepared for 19c and avoid buying extended support; the differences between 18c and 19c should be minimal
  • Deploy 19.5 for testing purpose on a test ODA and start your migration project to get prepared for 19.6. Once available, patch or redeploy your ODAs and migrate all your databases

Conclusion

Not having 19.6 now is really annoying. After all, we chose ODA because it's easier to get updates. But you can still prepare everything for the 19c migration, by first migrating to 18c or by giving 19c a try with this 19.5 release.

The post Should I go for ODA 19.5 or should I wait until 19.6? appeared first on Blog dbi services.

NVMe the afterburner for your database


Over 1 million IOPS (@8 KByte) and more than 26 GByte/s (@1 MByte): read on to see all the impressive benchmark figures from a server with 14 NVMe drives, and why this is still not the best you could get…


At the end of last year, I got a call from Huawei. They (finally) agreed to provide me with a server with their enterprise NVMe drives for performance testing.
To say that I was delighted is a huge understatement. It felt like an 8-year-old waiting for Christmas to get his present.

Choosing the right hardware for a database server is always important and challenging. Only if you build a rock-solid, stable and performant base can you deliver a reliable database service with predictable performance. Sounds expensive, and most of the time it is, but NVMe drives can be a game-changer in this field.

After a very nice and friendly guy from Huawei delivered this server to me, I immediately started to inspect this wonderful piece of hardware.

Testserver

Huawei provided me with an H2288 2-socket server with 15x 800 GByte NVMe drives (ES3600P V5 800 GByte PCIe Gen3).

The server would be able to handle 25 NVMe drives, but I think I got every drive that was available at this time 🙂

All drives have a U.2 PCIe connector. The drives are connected over two PCIe Gen3 16x adapter cards with 4 cables on each card (this information will become important later…).

The PCIe cards provide the PCIe lanes for the NVMe drives. These cards are just passthrough cards without any RAID function at all.
So all your RAID configuration has to be done with software RAID (e.g. ASM, LVM, etc.).
The drives are also available in sizes from 1 to 6.4 TByte.
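
As an illustration of the software RAID part, a striped volume across a few of those drives can be built with LVM in a handful of commands (a sketch only; device names, stripe count and stripe size are assumptions, not the configuration used in this test):

# sketch: stripe a set of NVMe drives with LVM; device names and sizes are assumptions
pvcreate /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
vgcreate vgdata /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
# -i = number of stripes (one per drive), -I = stripe size
lvcreate -n lvdata -i 4 -I 1m -l 100%FREE vgdata
mkfs.ext4 /dev/vgdata/lvdata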

Expected performance

As often with SSD/NVMe drives the performance is tied to the size of the drive. We are testing the smallest drive from the series with the lowest specs. But “low” depends on your angle of view.

The 800 GByte drive has a spec performance of
420k IOPS read @ 4KByte blocksize
115k IOPS write @ 4KByte blocksize
3500 IOPS read @ 1MByte blocksize
1000 IOPS write @ 1MByte blocksize

Because we are testing with the Oracle default blocksize we can translate that to
210k IOPS read
57k IOPS write @ 8KByte blocksize PER DRIVE!

The bigger drives have even higher specs as you can see in the picture.

Even more impressive is the specified read and write latency, which is 0.088 ms and 0.014 ms respectively.

Of course, these numbers are a little bit like the fuel usage numbers in a car commercial: Impressive but mostly not true. At least when you exceed the cache size of your drive your performance will drop. How much? The interesting question is: How big is the drop?

Test setup

I used an Oracle Linux 7.7 OS and ext4 filesystem with fio for benchmarking tests as well as Oracle 19c with ASM and calibrate_io to prove the result.

Because I used 1 drive for the OS I had “only” 14 drives for testing the performance.

Please see my other blog post Storage performance benchmarking with FIO for detailed information about the fio tests.
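
For reference, a job of the 8 KByte random read kind used here could be launched like this (a sketch; file name, size, queue depth and runtime are assumptions, not the exact options of the benchmark):

# sketch of an 8 KByte random read test with fio; paths and durations are assumptions
fio --name=randread8k --filename=/u02/fio/testfile --size=100G \
    --rw=randread --bs=8k --ioengine=libaio --direct=1 \
    --iodepth=32 --numjobs=10 --runtime=300 --time_based --group_reporting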

Test results with fio

Because SSD/NVMe drives tend to be faster when empty, all drives were first filled 100% before the performance measuring started.

The test was made with 1 drive and 1 thread up to 14 drives and 10 threads per drive.

8KByte random read/write

For random read, the results of the test start at 12k IOPS for 1 drive and 1 thread and go up to 1.186 million IOPS (roughly 9.5 GByte/s!!) with 14 drives and 10 threads per drive. The latency was between 82 and 117 microseconds.

Random write tests started even more impressively at 64.8k IOPS with 1 drive and 1 thread, going up to 0.956 million IOPS (roughly 7.6 GByte/s!!) at the maximum.
The write latency was between 14 and 145 microseconds, which is absolutely insane, especially compared to the latency you can expect from an all-flash storage system with an FC network in between.

8KByte sequential read/write

The numbers for 8KB sequential read and write are even better but in the life of a DB server a little bit less important.

1MByte sequential read/write

Let us talk about the 1 MByte blocksize results.
Here we start at around 1 GByte/s with 1 drive and 1 thread and go up to 26.9 GByte/s with 14 drives and 10 threads. The maximum value was already reached with 13 drives and 6 threads per drive.
Latency also looks good at 1-5 ms, which is a good value for I/O of this blocksize.

The 2x 16x PCIe Gen3 adapter cards are the bottleneck here. We hit the maximum throughput of the bus. The drives could deliver more performance than the bus, and this with just 14 of the possible 25 drives!

The numbers for sequential writes are not as high as for sequential reads but still quite impressive. We reached 0.656 GByte/s with 1 drive/1 thread, up to a maximum speed of 12 GByte/s.
The latency is also good at 1.5-12 ms.

If you want to compare these numbers with an all-flash SAN storage, have a look at my blog post about the Dorado 6000 V3 benchmark.

Verify the results with a secondary tool

To verify the results of the filesystem test with a secondary tool, I created a database with all 14 drives in one ASM diskgroup with external redundancy.

dbms_resource_manager.calibrate_io supports the fio results with 1.35 million IOPS and 25 GByte/s throughput. Not exactly the same numbers, but close enough to prove the fio results. Not only did we add ASM as an additional layer instead of LVM, but we also tested within the database instead of directly on the OS layer.
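
For completeness, such a calibrate_io run can be started roughly like this (a sketch; the disk count and latency target are assumptions, and the procedure requires timed statistics and asynchronous I/O to be enabled):

sqlplus -s / as sysdba <<'SQL'
set serveroutput on
declare
  l_max_iops       pls_integer;
  l_max_mbps       pls_integer;
  l_actual_latency pls_integer;
begin
  -- 14 physical disks and a 10 ms maximum tolerated latency (values are assumptions)
  dbms_resource_manager.calibrate_io(14, 10, l_max_iops, l_max_mbps, l_actual_latency);
  dbms_output.put_line('max_iops='||l_max_iops||' max_mbps='||l_max_mbps||' latency='||l_actual_latency);
end;
/
SQL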

Even more speed possible?

There is no clear answer:
If we could switch to bigger drives with a higher capacity, we could expect up to 4 times more IOPS at 8 KByte (2x more for the doubled number of drives and 2x more for the better specs of the bigger drives).
BUT the controller will be the limitation. We already hit the limit of the 2x PCIe 16x Gen3 bus with 1 MByte sequential reads in this test and will possibly do so for the other tests when testing with more drives.

The server has at least 4 additional PCIe slots but all with only 8 lanes. So here is a bottleneck we are not able to solve in a server with Intel CPUs.

AMD has already switched to PCIe Gen4 (2x the bandwidth of PCIe Gen3) and many more PCIe lanes with their CPUs, but most enterprise servers are Intel-based, and even where AMD-based servers are available, they are rarely seen at customers' sites (at least at my customers).

Intel also has a roadmap for providing PCIe Gen4, but the CPUs will be available at the end of this year at the earliest ;-(

Conclusion

This setup is freaking fast! Not only are the throughput numbers astonishing, the latency is also very impressive. Even a modern storage system has a hard time providing such numbers and has almost no chance of getting down to this latency level.
Not to mention that even if a storage system can eventually hit these numbers, you need a 200 GBit/s connection to get them down to your server…
With local NVMe drives you can get these numbers out of every box, without sharing with anybody!

As my colleague has written very well in his blog post Make Oracle Database simple again, I could not agree more: do not use SAN when it is not necessary, especially when your other option is local NVMe drives!
As long as you are not using Oracle RAC, you've got everything you need for your high availability with Data Guard or other HA solutions. Combine it with Oracle Multitenant and snapshot standbys, and your high-performance database platform to consolidate all your databases on is ready.

Skip the SAN storage, put the saved money into some additional NVMe drives, and set your database to hyperspeed.
When choosing the right CPUs with the needed number of cores and the best available clock speed, you can build a beast of a database server (not only for Oracle databases…).

Talking about money, the NVMe drives are getting cheaper almost every month and you can get a 6.4 TByte enterprise NVMe drive quite cheap today.

Of course, there are downsides: scalability and RAID.
At the moment your journey ends at 25x 6.4 TByte drives (160 TByte raw / 80 TByte mirrored) of capacity.
If you need more space, you have to go for another solution, at least for the moment. I expect to see NVMe drives with >15 TByte this year, and the switch from PCIe Gen3 to Gen4 will bring even more performance for a single drive.

The other downside is the RAID controller, which just does not exist at the moment (as far as I know). You have to go with software RAID.
Not a big deal if you are using Oracle ASM, even if you can only do a RAID-1 where you are "losing" 50% of your raw capacity. With DM/LVM you could do a software RAID-5; I cannot tell you about the performance impact of that, because I just did not have enough time to test it out.

But stay tuned, I am looking for a server with bigger and more drives for my next tests.

The post NVMe the afterburner for your database appeared first on Blog dbi services.

Running SQL Server on the Oracle Free tier


By Franck Pachot

The Oracle Cloud is not only for Oracle Database. You can create a VM running Oracle Linux with full root access to it, even in the free tier: a free VM that will be always up, never expires, with full ssh connectivity to a sudoer user, where you are able to tunnel any port. Of course, there are some limits that I’ve detailed in a previous post. But that is sufficient to run a database, given that you configure a low memory usage. For Oracle Database XE, Kamil Stawiarski mentions that you can just hack the memory test in the RPM shell script.
But for Microsoft SQL Server, that’s a bit more complex because this test is hardcoded in the sqlservr binary and the solution I propose here is to intercept the call to the sysinfo() system call.

Creating a VM in the Oracle Cloud is very easy, here are the steps in one picture:

I'm connecting to the public IP address with ssh (the public key is uploaded when creating the VM) and I will run everything as root:

ssh opc@129.213.138.34
sudo su -
cat /etc/oracle-release

I install docker engine (version 19.3 there)
yum install -y docker-engine

I start docker

systemctl start docker
docker info


I’ll use the latest SQL Server 2019 image built on RHEL
docker pull mcr.microsoft.com/mssql/rhel/server:2019-latest
docker images

5 minutes to download a 1.5GB image. Now trying to start it.
The nice thing (when I compare to Oracle) is that we don’t have to manually accept the license terms with a click-through process. I just mention that I have read and accepted them with: ACCEPT_EULA=Y 

I try to run it:
docker run \
-e "ACCEPT_EULA=Y" \
-e 'MSSQL_PID=Express' \
-p 1433:1433 \
-e 'SA_PASSWORD=**P455w0rd**' \
--name mssql \
mcr.microsoft.com/mssql/rhel/server:2019-latest

There’s a hardcoded prerequisite verification to check that the system has at least 2000 MB of RAM. And I have less than one GB here in this free tier:


awk '/^Mem/{print $0,$2/1024" MB"}' /proc/meminfo

Fortunately, there's always a nice geek on the internet with an awesome solution: hack the sysinfo() system call with an LD_PRELOAD'ed wrapper: A Slightly Liberated Microsoft SQL Server Docker image

Let’s get it:
git clone https://github.com/justin2004/mssql_server_tiny.git
cd mssql_server_tiny

I changed the FROM to build from the 2019 RHEL image and I preferred to use /etc/ld.so.preload rather than overriding the CMD command with LD_PRELOAD:


FROM oraclelinux:7-slim AS build0
WORKDIR /root
RUN yum update -y && yum install -y binutils gcc
ADD wrapper.c /root/
RUN gcc -shared -ldl -fPIC -o wrapper.so wrapper.c
FROM mcr.microsoft.com/mssql/rhel/server:2019-latest
COPY --from=build0 /root/wrapper.so /root/
ADD wrapper.c /root/
USER root
RUN echo "/root/wrapper.so" > /etc/ld.so.preload
USER mssql

I didn’t change the wrapper for the sysinfo function:
#define _GNU_SOURCE
#include <stdio.h>
#include <dlfcn.h>
#include <sys/sysinfo.h>
int sysinfo(struct sysinfo *info){
// clear it
//dlerror();
void *pt=NULL;
typedef int (*real_sysinfo)(struct sysinfo *info);
// we need the real sysinfo function address
pt = dlsym(RTLD_NEXT,"sysinfo");
//printf("pt: %x\n", *(char *)pt);
// call the real sysinfo system call
int real_return_val=((real_sysinfo)pt)(info);
// but then modify its returned totalram field if necessary
// because sqlserver needs to believe it has "2000 megabytes"
// physical memory
if( info->totalram < 1000l * 1000l * 1000l * 2l ){
info->totalram = 1000l * 1000l * 1000l * 2l ;
}
return real_return_val;
}

I build the image from there:

docker build -t mssql .


I run it:

docker run -d \
-e "ACCEPT_EULA=Y" \
-e 'MSSQL_PID=Express' \
-p 1433:1433 \
-e 'SA_PASSWORD=**P455w0rd**' \
--name mssql \
mssql

I wait until it is ready:

until docker logs mssql | grep -C10 "Recovery is complete." ; do sleep 1 ; done

All is ok and I connect and check the version:

Well… as you can see my first attempt failed. I am running with very low memory here, and then many memory allocation problems can be expected. If you look at the logs after a while, many automatic system tasks fail. But that’s sufficient for a minimal lab and you can tweak some Linux and SQL Server parameters if you need it. Comments are welcome here for feedback and ideas…

The port 1433 is exposed here locally and it can be tunneled through ssh. This is a free lab environment always accessible from everywhere to do small tests in MS SQL, running on the Oracle free tier. Here is how I connect with DBeaver from my laptop, just mentioning the public IP address, private ssh key and connection information:
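
For the record, the tunnel itself is a single ssh command run from the laptop (the public IP address is the one of this lab; the key path is an assumption):

# forward local port 1433 to port 1433 of the container running on the Oracle Cloud VM
ssh -i ~/.ssh/id_rsa -L 1433:localhost:1433 opc@129.213.138.34
# DBeaver (or any other client) can then connect to localhost:1433 with the SA credentials above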

The post Running SQL Server on the Oracle Free tier appeared first on Blog dbi services.

How SQL Server MVCC compares to Oracle and PostgreSQL


By Franck Pachot

.
Microsoft SQL Server implemented MVCC in 2005, which has been proven to be the best approach for transaction isolation (the I in ACID) in OLTP. But are you sure that writers do not block readers with READ_COMMITTED_SNAPSHOT? I'll show here that some reads are still blocked by locked rows, contrary to the precursors of MVCC like PostgreSQL and Oracle.

For this demo, I run the SQL Server 2019 RHEL image on Docker in an Oracle Cloud compute instance running OEL 7.7, as explained in the previous post. If you don't have the memory limit mentioned there, you can simply run:

docker run -d -e "ACCEPT_EULA=Y" -e 'MSSQL_PID=Express' -p 1433:1433 -e 'SA_PASSWORD=**P455w0rd**' --name mssql mcr.microsoft.com/mssql/rhel/server:2019-latest
time until docker logs mssql | grep -C10 "Recovery is complete." ; do sleep 1 ; done

Test scenario description

Here is what I’ll run in a first session:

  1. create a DEMO database
  2. (optional) set MVCC with Read Commited Snapshot isolation level
  3. create a DEMO table with two rows. One with “a”=1 and one with “a”=2
  4. (optional) build an index on column “a”
  5. update the first line where “a”=1


cat > session1.sql <<'SQL'
drop database if exists DEMO;
create database DEMO;
go
use DEMO;
go
-- set MVCC to read snapshot rather than locked current --
-- alter database DEMO set READ_COMMITTED_SNAPSHOT on;
go
drop table if exists DEMO;
create table DEMO(id int primary key, a int not null, b int);
begin transaction;
insert into DEMO values(1,1,1);
insert into DEMO values(2,2,2);
commit;
go
select * from DEMO;
go
-- index to read only rows that we want to modify --
-- create index DEMO_A on DEMO(a);
go
begin transaction;
update DEMO set b=b+1 where a=1;
go
SQL

I’ll run it in the background (you can also run it in another terminal) where it waits 60 seconds before quitting:

( cat session1.sql ; sleep 60 ) | docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e | ts &

[root@instance-20200208-1719 ~]# ( cat session1.sql ; sleep 60 ) | docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e | ts &
[1] 27759
[root@instance-20200208-1719 ~]# Feb 09 17:05:43 drop database if exists DEMO;
Feb 09 17:05:43 create database DEMO;
Feb 09 17:05:43
Feb 09 17:05:43 use DEMO;
Feb 09 17:05:43
Feb 09 17:05:43 Changed database context to 'DEMO'.
Feb 09 17:05:43 -- set MVCC to read snapshot rather than locked current --
Feb 09 17:05:43 -- alter database DEMO set READ_COMMITTED_SNAPSHOT on;
Feb 09 17:05:43
Feb 09 17:05:43 drop table if exists DEMO;
Feb 09 17:05:43 create table DEMO(id int primary key, a int not null, b int);
Feb 09 17:05:43 begin transaction;
Feb 09 17:05:43 insert into DEMO values(1,1,1);
Feb 09 17:05:43 insert into DEMO values(2,2,2);
Feb 09 17:05:43 commit;
Feb 09 17:05:43
Feb 09 17:05:43
Feb 09 17:05:43 (1 rows affected)
Feb 09 17:05:43
Feb 09 17:05:43 (1 rows affected)
Feb 09 17:05:43 select * from DEMO;
Feb 09 17:05:43
Feb 09 17:05:43 id          a           b
Feb 09 17:05:43 ----------- ----------- -----------
Feb 09 17:05:43           1           1           1
Feb 09 17:05:43           2           2           2
Feb 09 17:05:43
Feb 09 17:05:43 (2 rows affected)
Feb 09 17:05:43 -- index to read only rows that we want to modify --
Feb 09 17:05:43 -- create index DEMO_A on DEMO(a);
Feb 09 17:05:43
Feb 09 17:05:43 begin transaction;
Feb 09 17:05:43 update DEMO set b=b+1 where a=1;
Feb 09 17:05:43
Feb 09 17:05:43
Feb 09 17:05:43 (1 rows affected)

SQL Server default

While this session holds a lock on the first row, I'll run the following from a second session, reading a row other than the one that is currently locked by the first transaction:

docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO | ts
-- read access the row that is not locked
select * from DEMO where a=2;
go

This hangs until the first transaction is canceled:

[root@instance-20200208-1719 ~]# docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO | ts
-- read access the row that is not locked
 select * from DEMO where a=2;
go
Feb 09 17:06:42
Feb 09 17:06:42
Feb 09 17:06:42
Feb 09 17:06:42 Sqlcmd: Warning: The last operation was terminated because the user pressed CTRL+C.
Feb 09 17:06:42
Feb 09 17:06:42 -- read access the row that is not locked
Feb 09 17:06:42  select * from DEMO where a=2;
Feb 09 17:06:42
Feb 09 17:06:42 id          a           b
Feb 09 17:06:42 ----------- ----------- -----------
Feb 09 17:06:42           2           2           2
Feb 09 17:06:42
Feb 09 17:06:42 (1 rows affected)

The "Sqlcmd: Warning: The last operation was terminated because the user pressed CTRL+C" message is from the first session, and only then was my foreground session able to continue. This is the worst you can encounter with the default isolation level in SQL Server, where writes and reads block each other even when not touching the same row (I read the a=2 row and only the a=1 one was locked). The reason for this is that I have no index for this predicate and I have to read all rows in order to find mine:

set showplan_text on ;
go
select * from DEMO where a=2;
go

go
Feb 09 17:07:24 set showplan_text on ;
Feb 09 17:07:24
select * from DEMO where a=2;
go
Feb 09 17:07:30 select * from DEMO where a=2;
Feb 09 17:07:30
Feb 09 17:07:30 StmtText
Feb 09 17:07:30 -------------------------------
Feb 09 17:07:30 select * from DEMO where a=2;
Feb 09 17:07:30
Feb 09 17:07:30 (1 rows affected)
Feb 09 17:07:30 StmtText
Feb 09 17:07:30 ---------------------------------------------------------------------------------------------------------------------------------------------------
Feb 09 17:07:30   |--Clustered Index Scan(OBJECT:([DEMO].[dbo].[DEMO].[PK__DEMO__3213E83F2AD8547F]), WHERE:([DEMO].[dbo].[DEMO].[a]=CONVERT_IMPLICIT(int,[@1],0)))
Feb 09 17:07:30
Feb 09 17:07:30 (1 rows affected)

Now, in order to avoid this situation, I'll run the same but with an index on column "a".
It was commented out in the session1.sql script, so I just re-ran everything without those comments:

( sed -e '/create index/s/--//' session1.sql ; sleep 60 ) | docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e | ts &

I'm running the same, now with a 3-second timeout so that I don't have to wait for my background session to terminate:

docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO -t 3 | ts
-- read access the row that is not locked
select * from DEMO where a=2;
go

[root@instance-20200208-1719 ~]# docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO -t 3 | ts
-- read access the row that is not locked
 select * from DEMO where a=2;
 go
Feb 09 17:29:25 -- read access the row that is not locked
Feb 09 17:29:25  select * from DEMO where a=2;
Feb 09 17:29:25
Feb 09 17:29:25 Timeout expired

Here I’m blocked again like in the previous scenario because the index was not used.
I can force the index access with a hint:

-- read access the row that is not locked forcing index access
select * from DEMO WITH (INDEX(DEMO_A)) where a=2;
go

-- read access the row that is not locked forcing index access
 select * from DEMO WITH (INDEX(DEMO_A)) where a=2;
 go
Feb 09 17:29:30 -- read access the row that is not locked forcing index access
Feb 09 17:29:30  select * from DEMO WITH (INDEX(DEMO_A)) where a=2;
Feb 09 17:29:30
Feb 09 17:29:30 id          a           b
Feb 09 17:29:30 ----------- ----------- -----------
Feb 09 17:29:30           2           2           2
Feb 09 17:29:30
Feb 09 17:29:30 (1 rows affected)

This didn’t wait because the index access didn’t have to go to the locked row.

However, when I read the same row that is concurrently locked, I have to wait:

-- read access the row that is locked
select * from DEMO where a=1;
go

 -- read access the row that is locked
 select * from DEMO where a=1;
 go
Feb 09 17:29:34  -- read access the row that is locked
Feb 09 17:29:34  select * from DEMO where a=1;
Feb 09 17:29:34
Feb 09 17:29:34 Timeout expired

Here is the confirmation that the index was used only with the hint:

set showplan_text on ;
go
select * from DEMO where a=2;
go
select * from DEMO WITH (INDEX(DEMO_A)) where a=2;
go

Feb 09 17:29:50 set showplan_text on ;
Feb 09 17:29:50
 select * from DEMO where a=2;
 go
Feb 09 17:29:50  select * from DEMO where a=2;
Feb 09 17:29:50
Feb 09 17:29:50 StmtText
Feb 09 17:29:50 --------------------------------
Feb 09 17:29:50  select * from DEMO where a=2;
Feb 09 17:29:50
Feb 09 17:29:50 (1 rows affected)
Feb 09 17:29:50 StmtText
Feb 09 17:29:50 --------------------------------------------------------------------------------------------------------------------------
Feb 09 17:29:50   |--Clustered Index Scan(OBJECT:([DEMO].[dbo].[DEMO].[PK__DEMO__3213E83F102B4054]), WHERE:([DEMO].[dbo].[DEMO].[a]=(2)))
Feb 09 17:29:50
Feb 09 17:29:50 (1 rows affected)
 select * from DEMO WITH (INDEX(DEMO_A)) where a=2;
 go
Feb 09 17:29:52  select * from DEMO WITH (INDEX(DEMO_A)) where a=2;
Feb 09 17:29:52
Feb 09 17:29:52 StmtText
Feb 09 17:29:52 -----------------------------------------------------
Feb 09 17:29:52  select * from DEMO WITH (INDEX(DEMO_A)) where a=2;
Feb 09 17:29:52
Feb 09 17:29:52 (1 rows affected)
Feb 09 17:29:52 StmtText                                                                                                                                                
Feb 09 17:29:52 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Feb 09 17:29:52   |--Nested Loops(Inner Join, OUTER REFERENCES:([DEMO].[dbo].[DEMO].[id]))                                                                              
Feb 09 17:29:52        |--Index Seek(OBJECT:([DEMO].[dbo].[DEMO].[DEMO_A]), SEEK:([DEMO].[dbo].[DEMO].[a]=(2)) ORDERED FORWARD)                                         
Feb 09 17:29:52        |--Clustered Index Seek(OBJECT:([DEMO].[dbo].[DEMO].[PK__DEMO__3213E83F102B4054]), SEEK:([DEMO].[dbo].[DEMO].[id]=[DEMO].[dbo].[DEMO].[id]) LOOKUP ORDERED FORWARD)
Feb 09 17:29:52
Feb 09 17:29:52 (3 rows affected)

So, with the default isolation level and index access, we can read a row that is not locked. The last query, SELECT * FROM DEMO WHERE A=1, was blocked because we are in the legacy (and default) mode where readers are blocked by writers.

SQL Server MVCC

In order to improve this situation, Microsoft has implemented MVCC. With it, we do not need to read the current version of the rows (which requires waiting when they are concurrently modified) because past versions of the rows are stored in TEMPDB and we can read a past snapshot. Typically, with the READ COMMITTED SNAPSHOT isolation level, we read a snapshot as of the point in time our query began.
In general, we need to read all rows from a consistent point in time. This can be the one where our query started, and then while the query is running, a past version may be reconstructed to remove concurrent changes. Or, when there is no MVCC to rebuild this snapshot, this consistent point can only be the one when our query is completed. This means that while we read rows, we must lock them to be sure that they stay the same until the end of our query. Of course, even with MVCC there are cases where we want to read the latest value and then we will lock with something like a SELECT FOR UPDATE. But that’s not the topic here.
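
As a side note (this query is not part of the original test script), one quick way to check whether a database already uses this MVCC-based read committed mode is to look at sys.databases:

-- 0 = legacy locking reads, 1 = snapshot (MVCC) reads for READ COMMITTED
select name, is_read_committed_snapshot_on from sys.databases where name = 'DEMO';
go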

I’ll run the same test as the first one, but now with the database set to READ_COMMITTED_SNAPSHOT:

( sed -e '/READ_COMMITTED/s/--//' session1.sql ; sleep 120 ) | docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e | ts &

[root@instance-20200208-1719 ~]# ( sed -e '/READ_COMMITTED/s/--//' session1.sql ; sleep 120 ) | docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e | ts &
[1] 38943
[root@instance-20200208-1719 ~]# Feb 09 18:21:19 drop database if exists DEMO;
Feb 09 18:21:19 create database DEMO;
Feb 09 18:21:19
Feb 09 18:21:19 use DEMO;
Feb 09 18:21:19
Feb 09 18:21:19 Changed database context to 'DEMO'.
Feb 09 18:21:19 -- set MVCC to read snapshot rather than locked current --
Feb 09 18:21:19  alter database DEMO set READ_COMMITTED_SNAPSHOT on;
Feb 09 18:21:19
Feb 09 18:21:19 drop table if exists DEMO;
Feb 09 18:21:19 create table DEMO(id int primary key, a int not null, b int);
Feb 09 18:21:19 begin transaction;
Feb 09 18:21:19 insert into DEMO values(1,1,1);
Feb 09 18:21:19 insert into DEMO values(2,2,2);
Feb 09 18:21:19 commit;
Feb 09 18:21:19
Feb 09 18:21:19
Feb 09 18:21:19 (1 rows affected)
Feb 09 18:21:19
Feb 09 18:21:19 (1 rows affected)
Feb 09 18:21:19 select * from DEMO;
Feb 09 18:21:19
Feb 09 18:21:19 id          a           b
Feb 09 18:21:19 ----------- ----------- -----------
Feb 09 18:21:19           1           1           1
Feb 09 18:21:19           2           2           2
Feb 09 18:21:19
Feb 09 18:21:19 (2 rows affected)
Feb 09 18:21:19 -- index to read only rows that we want to modify --
Feb 09 18:21:19 -- create index DEMO_A on DEMO(a);
Feb 09 18:21:19
Feb 09 18:21:19 begin transaction;
Feb 09 18:21:19 update DEMO set b=b+1 where a=1;
Feb 09 18:21:19
Feb 09 18:21:19
Feb 09 18:21:19 (1 rows affected)

And then running the same scenario:

docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO -t 3 | ts
-- read access the row that is not locked
select * from DEMO where a=2;
go
-- read access the row that is locked
select * from DEMO where a=1;
go
-- write access on the row that is not locked
delete from DEMO where a=2;
go

[root@instance-20200208-1719 ~]# docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO -t 3 | ts

-- read access the row that is not locked
select * from DEMO where a=2;
go
Feb 09 18:21:36 -- read access the row that is not locked
Feb 09 18:21:36 select * from DEMO where a=2;
Feb 09 18:21:36
Feb 09 18:21:36 id          a           b
Feb 09 18:21:36 ----------- ----------- -----------
Feb 09 18:21:36           2           2           2
Feb 09 18:21:36
Feb 09 18:21:36 (1 rows affected)

-- read access the row that is locked
select * from DEMO where a=1;
go
Feb 09 18:21:47 -- read access the row that is locked
Feb 09 18:21:47 select * from DEMO where a=1;
Feb 09 18:21:47
Feb 09 18:21:47 id          a           b
Feb 09 18:21:47 ----------- ----------- -----------
Feb 09 18:21:47           1           1           1
Feb 09 18:21:47
Feb 09 18:21:47 (1 rows affected)

-- write access on the row that is not locked
delete from DEMO where a=2;
go
Feb 09 18:22:01 -- write access on the row that is not locked
Feb 09 18:22:01 delete from DEMO where a=2;
Feb 09 18:22:01
Feb 09 18:22:01 Timeout expired

Ok, that’s better. I confirm that readers are not blocked by writers. But the modification on “A”=2 was blocked. This is not a writer-writer situation because we are not modifying the row that is locked by the other session. Here, I have no index on “A”, so the delete statement must first read the table and therefore had to read this locked row. And obviously, this read is blocked. It seems that DML must read the current version of the rows even when MVCC is available. That means that reads can be blocked by writes when those reads are part of a writing statement.

Last test on SQL Server: the same, with MVCC, and the index on “A”

( sed -e '/READ_COMMITTED/s/--//' -e '/create index/s/--//' session1.sql ; sleep 120 ) | docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e | ts &

[root@instance-20200208-1719 ~]# ( sed -e '/READ_COMMITTED/s/--//' -e '/create index/s/--//' session1.sql ; sleep 120 ) | docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e | ts &
[1] 40320
[root@instance-20200208-1719 ~]#
[root@instance-20200208-1719 ~]# Feb 09 18:30:15 drop database if exists DEMO;
Feb 09 18:30:15 create database DEMO;
Feb 09 18:30:15
Feb 09 18:30:15 use DEMO;
Feb 09 18:30:15
Feb 09 18:30:15 Changed database context to 'DEMO'.
Feb 09 18:30:15 -- set MVCC to read snapshot rather than locked current --
Feb 09 18:30:15  alter database DEMO set READ_COMMITTED_SNAPSHOT on;
Feb 09 18:30:15
Feb 09 18:30:15 drop table if exists DEMO;
Feb 09 18:30:15 create table DEMO(id int primary key, a int not null, b int);
Feb 09 18:30:15 begin transaction;
Feb 09 18:30:15 insert into DEMO values(1,1,1);
Feb 09 18:30:15 insert into DEMO values(2,2,2);
Feb 09 18:30:15 commit;
Feb 09 18:30:15
Feb 09 18:30:15
Feb 09 18:30:15 (1 rows affected)
Feb 09 18:30:15
Feb 09 18:30:15 (1 rows affected)
Feb 09 18:30:15 select * from DEMO;
Feb 09 18:30:15
Feb 09 18:30:15 id          a           b
Feb 09 18:30:15 ----------- ----------- -----------
Feb 09 18:30:15           1           1           1
Feb 09 18:30:15           2           2           2
Feb 09 18:30:15
Feb 09 18:30:15 (2 rows affected)
Feb 09 18:30:15 -- index to read only rows that we want to modify --
Feb 09 18:30:15  create index DEMO_A on DEMO(a);
Feb 09 18:30:15
Feb 09 18:30:15 begin transaction;
Feb 09 18:30:15 update DEMO set b=b+1 where a=1;
Feb 09 18:30:15
Feb 09 18:30:15
Feb 09 18:30:15 (1 rows affected)

Here is my full scenario to see where it blocks:

docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO -t 3
-- read access the row that is not locked
select * from DEMO where a=2;
go
-- read access the row that is locked
select * from DEMO where a=1;
go
-- write access on the row that is not locked
delete from DEMO where a=2;
go
-- write access on the row that is locked
delete from DEMO where a=1;
go

docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO -t 3
-- read access the row that is not locked
select * from DEMO where a=2;
go
-- read access the row that is not locked
select * from DEMO where a=2;

id          a           b
----------- ----------- -----------
          2           2           2

(1 rows affected)

-- read access the row that is locked
select * from DEMO where a=1;
go
-- read access the row that is locked
select * from DEMO where a=1;

id          a           b
----------- ----------- -----------
          1           1           1

(1 rows affected)

-- write access on the row that is not locked
delete from DEMO where a=2;
go
-- write access on the row that is not locked
delete from DEMO where a=2;


(1 rows affected)

-- write access on the row that is locked
delete from DEMO where a=1;
go
-- write access on the row that is locked
delete from DEMO where a=1;

Timeout expired

Finally, the only blocking situation here is when I want to write on the same row. The index access reduces the risk of being blocked.

In summary, we can achieve the best concurrency with the READ_COMMITTED_SNAPSHOT isolation level, by ensuring that we read only the rows we will update, with proper indexing and maybe hinting. This is, in my opinion, very important to know because we rarely cover those situations during integration tests. But they can happen quickly in production under high load.

PostgreSQL

Let’s do the same with PostgreSQL which is natively MVCC:

cat > session1.sql <<'SQL'
drop database if exists DEMO;
create database DEMO;
\c demo
drop table if exists DEMO;
create table DEMO(id int primary key, a int not null, b int);
begin transaction;
insert into DEMO values(1,1,1);
insert into DEMO values(2,2,2);
commit;
select * from DEMO;
begin transaction;
update DEMO set b=b+1 where a=1;
SQL

No specific settings, and no index created here.

( cat session1.sql ; sleep 120 ; echo "commit;") | psql -e | ts &

-bash-4.2$ ( cat session1.sql ; sleep 120 ; echo "commit;") | psql -e | ts &
[1] 31125
-bash-4.2$
-bash-4.2$ Feb 09 18:42:48 drop database if exists DEMO;
Feb 09 18:42:48 DROP DATABASE
Feb 09 18:42:48 create database DEMO;
Feb 09 18:42:49 CREATE DATABASE
Feb 09 18:42:49 You are now connected to database "demo" as user "postgres".
Feb 09 18:42:49 drop table if exists DEMO;
NOTICE:  table "demo" does not exist, skipping
Feb 09 18:42:49 DROP TABLE
Feb 09 18:42:49 create table DEMO(id int primary key, a int not null, b int);
Feb 09 18:42:49 CREATE TABLE
Feb 09 18:42:49 begin transaction;
Feb 09 18:42:49 BEGIN
Feb 09 18:42:49 insert into DEMO values(1,1,1);
Feb 09 18:42:49 INSERT 0 1
Feb 09 18:42:49 insert into DEMO values(2,2,2);
Feb 09 18:42:49 INSERT 0 1
Feb 09 18:42:49 commit;
Feb 09 18:42:49 COMMIT
Feb 09 18:42:49 select * from DEMO;
Feb 09 18:42:49  id | a | b
Feb 09 18:42:49 ----+---+---
Feb 09 18:42:49   1 | 1 | 1
Feb 09 18:42:49   2 | 2 | 2
Feb 09 18:42:49 (2 rows)
Feb 09 18:42:49
Feb 09 18:42:49 begin transaction;
Feb 09 18:42:49 BEGIN
Feb 09 18:42:49 update DEMO set b=b+1 where a=1;
Feb 09 18:42:49 UPDATE 1

While the transaction updating the first row is in the background, I run the following readers and writers:

psql demo | ts
set statement_timeout=3000;
-- read access the row that is not locked
select * from DEMO where a=2;
-- read access the row that is locked
select * from DEMO where a=1;
-- write access on the row that is not locked
delete from DEMO where a=2;
-- write access on the row that is locked
delete from DEMO where a=1;

-bash-4.2$ psql demo | ts
set statement_timeout=3000;
Feb 09 18:43:00 SET
-- read access the row that is not locked
select * from DEMO where a=2;
Feb 09 18:43:08  id | a | b
Feb 09 18:43:08 ----+---+---
Feb 09 18:43:08   2 | 2 | 2
Feb 09 18:43:08 (1 row)
Feb 09 18:43:08
-- read access the row that is locked
select * from DEMO where a=1;
Feb 09 18:43:16  id | a | b
Feb 09 18:43:16 ----+---+---
Feb 09 18:43:16   1 | 1 | 1
Feb 09 18:43:16 (1 row)
Feb 09 18:43:16
-- write access on the row that is not locked
delete from DEMO where a=2;
Feb 09 18:43:24 DELETE 1
-- write access on the row that is locked
delete from DEMO where a=1;

ERROR:  canceling statement due to statement timeout
CONTEXT:  while deleting tuple (0,1) in relation "demo"

Nothing is blocked except, of course, when modifying the row that is locked.

Oracle Database

One of the many things I’ve learned from Tom Kyte when I was reading AskTom regularly is how to build the simplest test cases. And with Oracle there is no need to run multiple sessions to observe the concurrency of multiple transactions. I can do it with an autonomous transaction in one session, and one advantage is that I can share a dbfiddle example:
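
The dbfiddle itself is not reproduced here, but a minimal sketch of such a single-session test case could look like the following (this is my reconstruction on the same DEMO table as above, not the exact fiddle, so line numbers will differ):

create table DEMO(id int primary key, a int not null, b int);
insert into DEMO values(1,1,1);
insert into DEMO values(2,2,2);
commit;

-- main transaction: lock the a=1 row, as in the previous tests
update DEMO set b=b+1 where a=1;

declare
 pragma autonomous_transaction;  -- behaves like an independent second session
 l_cnt number;
begin
 select count(*) into l_cnt from DEMO where a=2;  -- non-blocking read
 select count(*) into l_cnt from DEMO where a=1;  -- non-blocking read (MVCC)
 delete from DEMO where a=2;   -- row not locked by the parent: succeeds
 delete from DEMO where a=1;   -- row locked by the parent: ORA-00060 deadlock detected
 rollback;                     -- never reached in this scenario
end;
/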

Here, the deadlock reported at line 14 of the dbfiddle means that only the “delete where a=1” encountered a blocking situation with “update where a=1”. All previous statements, selects on any row and updates of other rows, were executed without conflict.

A DML statement has two phases: one to find the rows and a second one to modify them. A DELETE or UPDATE in Oracle and PostgreSQL runs the first phase in snapshot mode: non-blocking MVCC. The second must, of course, modify the current version. This is a very complex mechanism because it may require a retry (restart) when the current version does not match the consistent snapshot that was used for filtering. Both PostgreSQL and Oracle can ensure this write consistency without the need to block the reads. SQL Server has implemented MVCC more recently and provides non-blocking reads only for SELECT reads. But a read can still be in a blocking situation for the query phase of an update statement.

Cet article How SQL Server MVCC compares to Oracle and PostgreSQL est apparu en premier sur Blog dbi services.

Control-M : Import/Export Calendar


Control-M: Export/Import Calendars

What is a Control M Calendar?

A Control-M calendar is used to schedule the Control-M jobs that are assigned to it:

For example, the job named JOB_TEST will be executed depending on the days and rules defined in the calendar named CALENDAR_EXAMPLE_1:

You can see that the job is scheduled following the calendar’s rules (by selecting View schedule).

Also you can list the calendars available in the tools tab of the GUI:

You can find your defined calendars by going to Tools and selecting the Calendars tab; then you can perform some actions like duplicating or updating them, but sadly there is no way to import or export them from here:

For that we have to use another way:

We will have to connect to a shell and use the emdef utility.

Below is a quick description of the possible actions on calendar definitions using the emdef utility (you can use the help option of the Control-M GUI to get more information)

Here we are:

The purpose is to perform an export of an existing calendar in order to back it up or to use it on another Control-M server (it must be imported there; this will be explained further down).

  • Exporting a calendar:

In the following example we want to export a calendar named CALENDAR_EXAMPLE_1 from my Control-M server named CTMSRVCENTOS into a file named /tmp/CALENDAR_EXAMPLE_1_exported.xml. By the way, the datacenter is also named CTMSRVCENTOS.

You have to open a command prompt and proceed as below:

Tip on Unix: you can get the datacenter name by typing this command:

[emuser@CTMSRVCENTOS tmp]$ ps -eaf | grep gtw
emuser 21889 1 0 févr.09 ? 00:00:00 /bin/sh /home/emuser/ctm_em/bin/em gtw -dc CTMSRVCENTOS
emuser 21944 21889 0 févr.09 ? 00:00:27 emgtw -dc CTMSRVCENTOS
emuser 28556 26207 0 13:51 pts/4 00:00:00 grep --color=auto gtw
[emuser@CTMSRVCENTOS tmp]$

First step: create a .xml file containing the information below:

cat /tmp/caltmp.xml
<TERMS>
<TERM>
<PARAM NAME="DATACENTER" OP="EQ" VALUE="DATACENTER_NAME"/>
<PARAM NAME="CALENDAR" OP="EQ" VALUE=" CALENDAR_EXAMPLE_1 "/>
</TERM>
</TERMS>

Second step: use the emdef utility to export the file:

emdef exportdefcal -u controlm_username -p "controlm_username_passwd" -s ctmserver_name -arg "/tmp/caltmp.xml" -out "/tmp/CALENDAR_NAME_exported.xml"

Example:

[emuser@CTMSRVCENTOS tmp]$ emdef exportdefcal -u emuser -p "N********DBI" -s CTMSRVCENTOS  -arg "/tmp/caltmp.xml" -out "/tmp/CALENDAR_EXAMPLE_1_exported.xml"
1 calendars were exported.
Writing calendars to file /tmp/CALENDAR_EXAMPLE_1_exported.xml

As a result you have a new generated file named CALENDAR_EXAMPLE_1_exported.xml; you can verify that it corresponds to the calendar we wanted to export:

[emuser@CTMSRVCENTOS tmp]$ cat /tmp/CALENDAR_EXAMPLE_1_exported.xml
<?xml version='1.0' encoding='ISO-8859-1' ?>
<!DOCTYPE DEFCAL SYSTEM "defcal.dtd">
<DEFCAL >
 <CALENDAR DATACENTER="CTMSRVCENTOS" NAME="CALENDAR_EXAMPLE_1" TYPE="Regular">
  <YEAR DAYS="NNNNNNNNNYNNNNNNNNNNNNNNNNNNNNNNNNYNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNYYYNNNNNNNNNNNNNNNNNYYNNNNNYYNNNNNYYNNNNNNNNNNNNNNNNNNNNNNNNNNNYNNNNNNYNNNNNNNNNNNNNYNNNNNNNNNYNNNNNNNNNNNNNNNNNNNYNNNNNNNNNNNNNNNNNNNNNNNNNNYNNNNNNYNNNNNNYNNNNNNNNNNNNNNNNNNNNNNYNNNNNNNNNNNYNNNNNNNNNNNNNNYNNNNNYNNNNNNYNNNNNNNNNNNNNNNNNNNYYYNNNNYYYNNNNNNNNNNNNNNNNNNNYYYNNNNYYYNNNNYYYNNNNN" DESCRIPTION="example of regular calendar to be exported to another control M server" NAME="2020"/>
 </CALENDAR>
</DEFCAL>
[emuser@CTMSRVCENTOS tmp]$
  • Importing the calendar to another Control-M server

Now that we have exported the calendar, we can use it on another Control-M server to schedule jobs.

Below we describe the tasks to do and the prerequisites to check:

  • Delete or rename the calendar if it already exists
  • Copy the calendar exported in the previous part (CALENDAR_EXAMPLE_1_exported.xml) to /tmp
  • Check if DATACENTER is correct in the xml
<!DOCTYPE DEFCAL SYSTEM "defcal.dtd">
<DEFCAL >
 <CALENDAR DATACENTER="CTMSRVCENTOS" NAME="CALENDAR_EXAMPLE_1" TYPE="Regular">

After these checks we can connect, through a Linux shell, to the other Control-M server where we want to import this calendar, and use the emdef utility again:

emdef defcal -u controlm_username -p [password] -HOST [control M_server] -SRC_FILE "./[filename].xml"

Get the xml file we have transferred:

[root@serveurctmv919 ~]# su - emuser
serveurctmv919% cd /tmp

We can launch the command to import the calendar and then check in the GUI that it appears:

serveurctmv919% emdef defcal -u emuser -p Sofiane2020 -HOST serveurctmv919 -SRC_FILE "/tmp/CALENDAR_EXAMPLE_1_exported.xml"
File /tmp/CALENDAR_EXAMPLE_1_exported.xml is missing.
serveurctmv919% cat /tmp/CALENDAR_EXAMPLE_1_exported.xml
cat: /tmp/CALENDAR_EXAMPLE_1_exported.xml: acces denied

We have to set the correct rights to grant emuser access to this file.

You must connect with user root and grant access:

[root@serveurctmv919 ~]# chown emuser:emgrp /tmp/CALENDAR_EXAMPLE_1_exported.xml
[root@serveurctmv919 ~]#
serveurctmv919% ls -lart C*
-rwxrwx---. 1 emuser emgrp 676 10 févr. 12:58 CALENDAR_EXAMPLE_1_exported.xml

And then restart the command:
serveurctmv919% emdef defcal -u emuser -p S*****0 -HOST serveurctmv919 -SRC_FILE "/tmp/CALENDAR_EXAMPLE_1_exported.xml"
serveurctmv919% cd /tmp
serveurctmv919% emdef defcal -u emuser -p S*****0 -HOST serveurctmv919 -SRC_FILE "/tmp/CALENDAR_EXAMPLE_1_exported.xml"
2020: Description's length must not exceed 50 characters.
Error parsing XML input file. Exiting...
serveurctmv919%

We get an error about the description length (which was obviously not the real problem).

The solution is to check the prerequisites again, especially the one concerning the datacenter name:

In the exported file we had:

<!DOCTYPE DEFCAL SYSTEM "defcal.dtd">
<DEFCAL >
<CALENDAR DATACENTER="CTMSRVCENTOS" NAME="CALENDAR_EXAMPLE_1" TYPE="Regular">

Whereas our Control-M server datacenter is named:

serveurctmv919% ps -eaf | grep gtw
emuser    4907     1  0 14:47 pts/0    00:00:00 /bin/sh /home/emuser/ctm_em/bin/em gtw -dc serveurctmv919

We have to update the xml file with the right datacenter and restart the command:
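
One possible way to do that update (a quick sketch, assuming only the DATACENTER attribute has to change and that the file is writable by the current user):

sed -i 's/DATACENTER="CTMSRVCENTOS"/DATACENTER="serveurctmv919"/' /tmp/CALENDAR_EXAMPLE_1_exported.xml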


serveurctmv919% emdef defcal -u emuser -p S*****0 -HOST serveurctmv919 -SRC_FILE "/tmp/CALENDAR_EXAMPLE_1_exported.xml"
Year 2020 is valid
Calendar CALENDAR_EXAMPLE_1 of DataCenter serveurctmv919 is valid
1 calendars were written to CONTROL-M/EM database.
serveurctmv919%

Now we will connect to the GUI of the Control-M server where we did the calendar import to check that the calendar is available.


Conclusion:

The calendar export and import were done successfully! This can be useful when you work on many environments that must be identical, with the same schedule, but for different clients.

Cet article Control-M : Import/Export Calendar est apparu en premier sur Blog dbi services.

Control-M/EM – emdef utility – Calendars


emdef is a command line utility used to make various modifications to calendar definitions in the Control-M/EM database. It can be used to extract data as a backup, or for migration from one environment to another.

emdef needs some parameters, which vary according to the method to be executed; the calendar-related methods are:

  • copydefcal: Creates a new calendar definition identical to an existing one
  • exportdefcal: Exports calendar definitions to a file for use as input to other utilities
  • defcal: Imports calendar definitions from a file

At a customer, this tool is widely used to automate some daily actions, e.g. exporting calendars and integrating them into another environment.

Basic parameters definition

To run emdef there are some fixed parameters whatever the method used; the syntax looks like this:

emdef <METHOD_TO_USE> [-USERNAME <userName> [-PASSWORD <password>] | -PASSWORD_FILE <passwordFile>] -HOST <guiServerName> ...
  • userName: The Control-M/EM user name
  • password: The control-M/EM user password
  • passwordFile: Here you have the choice to put the user and password in the command line, or use the passwordFile which is a file containing an unencrypted user name and password on separate lines in the below format:
    user=userName
    password=password

    I personally recommend this practice to avoid putting the user/password on the command line; on the other hand, you have to set the file permissions correctly (see the short example after this list)!

  • guiServerName: Control-M/EM GUI server host name or IP address
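
For example, assuming the password file is called credentials.txt as in the examples below, restricting it to its owner is as simple as:

chmod 600 credentials.txt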

Let’s move to methods to better understand this utility.

Methods definition

copydefcal

The copydefcal method creates new calendar definition(s) in the Control-M/EM database identical to existing calendar definition(s). Calendars can be copied and saved under different names in the same data center, or can be copied to a different data center and saved under the same or different names.
As you will notice, one or multiple calendars can be handled at the same time using the * wildcard character.

To run the copydefcal utility, you need to specify the basic parameters shown before plus the one below:

-arg <argFileName>

=> Path and name of the arguments file containing the copydefcal specifications

Example: Copy all calendars from one datacenter to another

  1. Create an argument file copycal.xml like the following:
    <COPYCAL>
      <DATACENTER FROM="DC_A" TO="DC_B"/>
      <CALENDAR FROM="*" TO="*"/>
    </COPYCAL>
    
  2. Create password file credentials.txt:
    user=emuser
    password=MyEmUserPass
    
  3. Execute the command:
    emdef copydefcal -pf <passwordFile> -s <guiServerName> -arg <argFileName>

    Which correspond to:

    emdef copydefcal -pf credentials.txt -s guiserverhostname -arg copycal.xml

exportdefcal

The exportdefcal utility exports calendar definitions from the Control-M/EM database to an output file for use as input to other utilities.
In fact, when the exportdefcal utility is invoked, an arguments file that you prepare is processed. This arguments file contains statements that specify an existing calendar or group of calendars. The calendars that you specified in the arguments file are exported to an output file. You can modify the exported calendars in the output file and re-import them into the Control-M/EM database using the defcal utility.
To run the exportdefcal utility, you need to specify the basic parameters shown before plus the ones below:

-arg <argFileName> -out <outFileName>

=> The argument file you prepared is processed when exportdefcal is invoked; this file should contain statements that specify an existing calendar or group of calendars.
=> The calendar(s) will be exported to the output file you specified as a parameter; these definitions can then be imported into Control-M/EM using the defcal utility, for example.

Example: Export all calendars except one, from a datacenter

  1. Create an argument file exportcal.xml like the following:
    <TERMS>
    	<TERM>
    		<PARAM NAME="DATACENTER" OP="EQ" VALUE="DC_0"/>
    		<PARAM NAME="CALENDAR" OP="NEQ" VALUE="ALL_DAYS_CAL"/>
    		<PARAM NAME="TYPE" OP="EQ" VALUE="Periodic"/>
    	</TERM>
    </TERMS>
    

    Here I export all Periodic calendars in the data center DC_0 whose name is not ALL_DAYS_CAL. See below for the syntax to be respected.

  2. Use the same credentials.txt created above.
  3. Execute the command:
    emdef exportdefcal -pf <passwordfile> -s <guiservername> -arg <argfilename> -out <outfilename>

    Which correspond to:

    emdef exportdefcal -pf credentials.txt -s guiserverhostname -arg exportcal.xml -out DC_0_CAL_Def.xml

It is important to mention here the syntax to create the argument file:

  • TERMS
    These tags indicate the start and end of the TERMS file. Only criteria that are located between the tags are considered to be part of the argument!
  • TERM
    The TERM tags indicate the start and the end of a group of selection criteria used to specify a calendar or calendars that are to be exported.
  • Relationship between terms
    You can use AND and OR to join two terms.
  • Parameter name
    Parameter names are fixed:
    – DATACENTER
    – CALENDAR
    – TYPE
  • Relationship between the NAME and the VALUE
    Valid values:
    – EQ
    – NEQ
    – NOTIN
    – LIKE
  • Value
    Of course, it should correspond to the parameter:
    -For DATACENTER: name of the Control-M installation
    -For CALENDAR: calendar name
    -For TYPE: Relative, Regular, Periodic, Rule_Based
defcal

The defcal utility imports a calendar definition into the Control-M/EM database. It reads calendar definitions directly from a plain text input file in XML format, possibly the one generated in the previous step 🙂
To run the defcal utility, you need to specify the basic parameters shown before plus the one below:

-src <srcFileName>

=> The source file contains the calendar definition(s) in XML format

Example: Import the calendar definitions exported in the previous step

  1. Use the same credentials.txt created above.
  2. Execute the command:
    emdef defcal -pf <passwordFile> -s <guiServerName> -src <srcFileName>

    Which correspond to:

    emdef defcal -pf credentials.txt -s guiserverhostname -src DC_0_CAL_Def.xml

As you saw, this utility can be very useful to automate your work and standardize your calendars across all environments. My colleague Nabil talked about calendar management in this blog. In the next blog I will talk about the emdef utility with folders. And you? Did you already use it? Don’t hesitate to share your experience and ask questions 😉

Cet article Control-M/EM – emdef utility – Calendars est apparu en premier sur Blog dbi services.


Control-M/EM – emdef utility – Folders


In a previous blog I talked about the emdef utility and how it can help to manage and standardize calendars across multiple environments. In this blog we will deal with folder definitions in the Control-M/EM database. This can be used to extract data as a backup, or for migration from one environment to another.

emdef needs some parameters, which vary according to the method to be executed; the folder-related methods are:

  • exportdeffolder: Exports folders and SMART Folders
  • deffolder: Imports folders and SMART Folders
  • updatedef: Updates specified parameter

Basic parameters definition

Basic parameters do not change between the calendars and folders methods, so to run emdef there are some fixed parameters whatever the method used; the syntax looks like this:

emdef <METHOD_TO_USE> [-USERNAME <userName> [-PASSWORD <password>] | -PASSWORD_FILE <passwordFile>] -HOST <guiServerName> ...
  • userName: The Control-M/EM user name
  • password: The control-M/EM user password
  • passwordFile: Here you have the choice to put the user and password in the command line, or use the passwordFile which is a file containing an unencrypted user name and password on separate lines in the below format:
    user=userName
    password=password

    I personally recommend this practice to avoid putting the user/password on the command line; on the other hand, you have to set the file permissions correctly!

  • guiServerName: Control-M/EM GUI server host name or IP address

Let’s move to methods to better understand this utility.

Methods definition

exportdeffolder

The exportdeffolder utility exports folders from the Control-M/EM database to a file. To do so, an arguments file should be prepared, containing statements that specify existing folders, SMART Folders and Sub-folders. The specified folders are exported to an output file. Output files created with the exportdeffolder utility can then be used as import files with the deffolder utility.

For example, on DEV you can define, test, and validate the job configuration, then export the job definitions to an output file using exportdeffolder, make the modifications needed for the TEST environment, and use the modified file as the input file when running deffolder on TEST. In this way, you can be sure that jobs are identical between the DEV and TEST environments.

To run the exportdeffolder utility, you need to specify the basic parameters shown before plus the ones below:

-arg <argFileName> -out <outputFileName>

=> Path and name of the arguments file containing the exportdeffolder specifications, and the output file you want.
Example: Export the configuration of all folders whose name matches DEV*

  1. Create an argument file “exportALLdevfolder.xml” like the following:
    <TERMS>
      <TERM>
        <PARAM NAME="FOLDER_NAME" OP="LIKE" VALUE="DEV*"/>
      </TERM>
    </TERMS>
    
  2. Create password file credentials.txt:
    user=emuser
    password=MyEmUserPass
    
  3. Execute the command:
    emdef exportdeffolder -pf <passwordFile> -s <guiServerName> -arg <argFileName> -out <outputFileName>

    Which correspond to:

    emdef exportdeffolder -pf credentials.txt -s guiserverhostname -arg exportALLdevfolder.xml -out AllDevFolders.xml 

deffolder

The deffolder utility imports folders and SMART Folders into the Control-M/EM database. The input file is usually a file generated by exportdeffolder or from the GUI; in any case it should contain statements that specify existing folders or new folders (valid for SMART Folders).
So, if the folders don’t exist, the utility creates them. If the folders already exist, a message is issued indicating that they already exist; the /o option described below allows overwriting the existing folders.

To run the deffolder utility, you need to specify the basic parameters shown before plus the ones below:

-src <srcFileName> [/a] [/o] [/t] [/v]

=> -src: to specify the input xml file that contains the folder definitions
=> /a: directs the utility to automatically reset the “Created By” parameter to the current Control-M/EM user when these two values do not match – OPTIONAL
=> /o: directs the utility to overwrite any existing folders – OPTIONAL
=> /t: Operate on a single folder at a time, to reduce process memory – OPTIONAL
=> /v: Used to receive verbose messages – OPTIONAL

Example: Import the folder configuration that was exported with the previous method

  1. Use the same file credentials.txt created before
  2. Execute the command:
    emdef deffolder -pf <passwordFile> -s <guiServerName> -src <srcFileName>  [/a] [/o] [/t] [/v]

    Which correspond to:

    emdef deffolder -pf credentials.txt -s guiserverhostname -src AllDevFolders.xml /o
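
Putting the two methods together, the DEV to TEST workflow mentioned earlier could look like the following sketch (the GUI server names are placeholders; the argument and password files are the ones created above):

emdef exportdeffolder -pf credentials.txt -s dev-guiserver -arg exportALLdevfolder.xml -out AllDevFolders.xml
# review/adjust AllDevFolders.xml if needed (datacenter name, variables, ...)
emdef deffolder -pf credentials.txt -s test-guiserver -src AllDevFolders.xml /o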

updatedef

The updatedef utility updates specified parameter values in the Control-M/EM database, e.g.:
– Folder definitions
– SMART Folder definitions
– Sub-folder definitions

To run the updatedef utility, you need to specify the basic parameters shown before plus the ones below:

-arg <argFileName> [/a]

=> -arg: arguments file containing the updatedef specifications. The syntax of this xml file is not complex but covers a lot of cases, so don’t hesitate to ask if you have any question!
=> [/a]: directs the utility to automatically reset the “Created By” parameter to the current Control-M/EM user when these two values do not match – OPTIONAL

Example: Modify the job name of jobs in folder APPLICATION_1 and datacenter DC_0:

  1. Create an argument file “UpdateJobs.xml” like the following:
    <UPDATE>
    	<JOB>
    		<FOLDER_NAME FROM="APPLICATION_1"/>
    			<DATACENTER FROM="DC_0"/>
    			<JOBNAME FROM="src*" TO="dbi*"/>
    ...
  2. Use the same file credentials.txt created before
  3. Execute the command:
    emdef updatedef -pf <passwordFile> -s <guiServerName> -arg <argFileName> [/a]

    Which correspond to:

    emdef updatedef -pf credentials.txt -s guiserverhostname -arg UpdateJobs.xml

Now you are able to manage folders and calendars using the emdef utility; this tool will help you to easily and automatically export, migrate, and make bulk updates to Control-M definitions.

Cet article Control-M/EM – emdef utility – Folders est apparu en premier sur Blog dbi services.

Control-M/EM – Update ctmuser & emuser DBO password


Introduction:

For security purposes we sometimes must update the password of our Control M architecture.

This task seems to be easy, but you must be careful to update the passwords wherever they are used.

Today we will check how to update the password of our Control M infrastructure including application user and database owner for each of our databases.

 

Prerequisites:

Be sure to have the old password (not mandatory, but it makes the update easier).

Be careful about credentials stored in other application files or passed as arguments in scripts; changing the password would generate side effects there.

Control M version compatibility:

Every version of Control-M (be careful about the location of the encrypted password file)

There are some steps to follow in order to update the passwords everywhere they are used; a password can be:

  • used from the Control-M GUI/CCM
  • used from a Control-M utility
  • used from the dedicated database of each user, ctmuser and emuser.

The following method will show you how to update the password on an Oracle and a PostgreSQL database.

You can ask for help from a database administrator if you encounter issues during this update.

The control M version we will use today is 9.0.00:

Don’t hesitate to connect to the BMC site to check the documentation and cases concerning password updates in Control-M.

https://bmcsites.force.com/casemgmt/sc_KnowledgeArticle?sfdcid=kA014000000dotuCAA&type=FAQ

Update emuser password

1) From the Control-M Configuration Manager (CCM) 9.0.00 and higher, go to the Security menu -> Authorizations and choose the user whose password you want to change:

 

Once done you can connect to a shell and do the EM DBO Update:

2) Stop all EM processes except the EM database processes (via root_menu).

 

serveurctmv919% root_menu
Enter a CONTROL-M/EM DBO Username : emuser
 
Enter the Password :

Enter the old password (the one that will be updated later).

 

CONTROLM/EM Root Menu
-----------------------
 
Select one of the following options:
 
1 - Activation Menu
2 - Troubleshooting Menu
3 - New Window
4 - Clean Database Schema
5 - Database Maintenance Menu
6 - Users Administration Menu
 
q - Quit
 
Enter option number --->
 
Enter option number --->  1
Activation Menu
---------------
Select one of the following options:
 
1  - Check All
 
2  - Start All
3  - Start Database Server
4  - Start CORBA Naming Service
5  - Start CONTROL-M/EM Configuration Agent
6  - Start CONTROL-M Configuration Server
 
7  - Stop All
8  - Stop Database Server
9  - Stop CORBA Naming Service
10  - Stop CONTROL-M/EM Configuration Agent
11  - Stop CONTROL-M Configuration Server
 
p  - Previous Menu
q  - Quit
 
Enter option number

—> select options 9, 10 and 11 to stop these services

 

Indeed, the database must be up to perform the emuser password update.

 

Stopping the CONTROL-M Configuration Server will prevent you from centrally managing CONTROL-M components!
Are you sure you want to stop the CONTROL-M Configuration Server ? [y/n] :
 
Shutting down the CONTROL-M/EM Configuration Agent will cause shutdown of all local CONTROLM/EM server components!
Are you sure you want to continue ?? [y/n] : y
***********

 

Shutdown command has been issued to Config_Agent on serveurctmv919
 
***************************************
Naming service status: Stopped, serveurctmv919:13075
 
Shutting down the server ...
PostgreSQL server stopped
 
Press Enter to continue

Once the EM server components are stopped, we can connect to the database and update the password.

(In this example the database is PostgreSQL 9.2.8.)

 

3) Actions depending on your database:

Oracle:

use the sqlplus command and log in as the SYSTEM account

PostgreSQL:

Connect to PostgreSQL with the admin/postgres user:

Example for UNIX ==> em psql -U postgres -P <password>

 

4) Execute the following SQL statements to change the EM DBO password:

Oracle:

SQL> alter user <EM_DBO> identified by <new_pass>;

example ====>

 alter user emuser identified by NabilDBI12345;

Note: If you don’t have access to a DBA admin account to update the password, you can use the interactive SQL command in the root_menu option.

You can use this command

alter user emuser identified by <NewPWD> replace <OldPWD>;

where NewPWD is the new password you want to give to emuser.

(Many thanks to my colleague Nabil LAABID for sharing this, because DBA admin access can be restricted depending on your access level in your work environment.)

In any case, if you are stuck you can contact your database administrator to make the update.

PostgreSQL:

1>alter user <EM_DBO> with password '<new_pass>';

example ====>

em900=#alter user emuser with password 'NabilDBI12345' ;

 

Example for my configuration :

serveurctmv919% uname -a
Linux serveurctmv919 4.18.0-147.3.1.el8_1.x86_64 #1 SMP Fri Jan 3 23:55:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
serveurctmv919% psql --version
psql (PostgreSQL) 9.2.8
serveurctmv919%
serveurctmv919% em psql -U postgres
Password for user postgres:
psql (9.2.8)
Type "help" for help.
em900=#
em900=# alter user emuser with password 'NabilDBI12345' ;
ALTER ROLE
em900=#
 

Note: No need to commit the action.

5) Log in to the SQL prompt as the EM DBO with the new password to verify that the change has been made.

[root@serveurctmv919 ~]# su - emuser
serveurctmv919% em psql -U emuser
Password for user emuser:      ==>(with the old password)
psql: FATAL:  password authentication failed for user "emuser"
serveurctmv919% em psql -U emuser
Password for user emuser:      ==>(with updated password)
psql (9.2.8)
Type "help" for help.
em900=>

 

Now the EM DBO password is updated in the database.

If you get any issue with the database password update, please check it with your database administrator; you can also let him do the update and get the new password from him afterwards.

6) After having updated the EM DBO password, you have to encrypt it in a file used by the EM server to check and run the services:

It is the mcs.ini file.

Update the new EM DBO password in mcs.ini on all EM installations that have a database client. This file exists on EM server(s) and on EM client installations with the Reporting Facility.

*Make a backup of the mcs.ini file; you can also make a “diff” between the old and the updated (encrypted) mcs.ini files to see if the change was effective, as shown in the sketch below.

% ecs cryptocli <username> <newpassword> $HOME/ini/mcs.ini

Example:

ecs cryptocli emuser NabilDBI12345 /home/emuser/ctm_em/ini/mcs.ini
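
For example, a possible backup-and-compare sequence (a sketch reusing the password and path from this walkthrough):

cp /home/emuser/ctm_em/ini/mcs.ini /home/emuser/ctm_em/ini/mcs.ini.bak
ecs cryptocli emuser NabilDBI12345 /home/emuser/ctm_em/ini/mcs.ini
diff /home/emuser/ctm_em/ini/mcs.ini.bak /home/emuser/ctm_em/ini/mcs.ini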

7) Restart the Control-M/Enterprise Manager components.

You can start them from the root_menu utility or with start_all from the emuser home directory.

Restart CCM GUI to use the new password.

 

Update ctmuser password

 

If you know the current password for the Control-M/Server DBO user, then you can use the ‘ctmpasswd‘ utility to update the password in the database and the encrypted password that is saved.

Example:

serveurctmv919% ctmpasswd

Please enter the old password for ctmuser account:

Please enter the new password for ctmuser account:

Enter again to verify:

Password was successfully changed.

serveurctmv919% psql -U ctmuser

Password for user ctmuser:

psql (9.2.8)

Type “help” for help.

If you do NOT know the current password, then you can use the following steps:

1) stop the Control-M/Server configuration agent and the Control-M/Server.

You can use the ctm_menu utility, or shut down the configuration agent and Control-M with the commands shut_ctm and shut_ca (which are used by ctm_menu anyway).

serveurctmv919% shut_ctm
------------------------
Shutting down CONTROL-M.
------------------------
Waiting ...
serveurctmv919% shut_ca
---------------------------------------------------
Shutting down CONTROL-M/Server Configuration Agent
---------------------------------------------------
Waiting ...
CONTROL-M/Server Configuration Agent is down

2) log into the database as the database Administrator (DBA) account:

-Oracle:

Use the sqlplus / as sysdba command, or log in as the SYSTEM account

-PostgreSQL:

Use the command: psql

Example :

serveurctmv919% psql -U postgres

3) Next, run the following SQL commands:

-Oracle:

SQL> alter user {USERNAME.EN_US} identified by {new_pass};

where:

{USERNAME.EN_US} is the userid of the DBO account that you want to change its password (default: ctmuser)

{new_pass} is the new password value.

For example:

SQL> alter user ctmuser identified by newpassword;

-PostgresSQL:

1>ALTER USER {CTMUSER.EN_US} WITH PASSWORD '{new_pass}';

ALTER USER controlm WITH PASSWORD 'NabilDBI12345';

Oracle:

===> sqlplus /as sysdba

SQL> alter user ctmuser identified by <new_pass>;

PostgreSQL:

1>alter user ctmuser with password '<new_pass>';

alter user ctmuser with password 'NabilDBI12345';

Example:

ctrlm900=# alter user ctmuser with password 'NabilDBI12345';
ALTER ROLE
ctrlm900-# \q
serveurctmv919% psql -U ctmuser
Password for user ctmuser:
psql (9.2.8)
Type "help" for help.
 
ctrlm900=> \q
serveurctmv919%

Note: no commit is needed.

4) Exit the sql prompt (type “exit”), and try to log back into SQL with the DBO account and its new password, to check if the change has worked.

serveurctmv919% psql -U ctmuser
Password for user ctmuser:
psql (9.2.8)
Type "help" for help.
 
ctrlm900=>

Seems that the password update worked!

5) For the Control-M DBO there is also an encrypted copy of the password, stored in different files depending on the Control-M version:

Change the saved encrypted password that the Control-M processes use to connect to the database:

  • For 9.0.18 and lower: it is saved in the file .controlm located here: /home/controlm/.controlm
  • For 9.0.19 and higher: it is saved in .ctmpdo/.ctmpda located here: /home/controlm/ctm_server/data/keys/

The ‘ctminterface’ command saves the encrypted password to the above file(s) for the Control-M/Server utilities to use.
Executing ctminterface does NOT change the password in the database. It only updates the encrypted password in the .controlm or .ctmpdo/.ctmpda files.

Run the command below:

ctminterface -SET_PASSWORD PRIMARY <DBO_Password>

example :

ctminterface -SET_PASSWORD PRIMARY NabilDBI12345

We can check that the encryption is effective by consulting the file before running the command.

As I am on Control-M v9.0.00, the file concerned is the hidden file named .controlm:

[root@serveurctmv919 ~]# su - controlm
serveurctmv919% cat /home/controlm/.controlm
serveurctmv919% 50-85-1050-350-14-1110-280-890-8900960-500118008501170-8300-3

Then we can do the command:

serveurctmv919% ctminterface -SET_PASSWORD PRIMARY NabilDBI12345

As a result we can see that the encrypted string was updated:

serveurctmv919% cat /home/controlm/.controlm
serveurctmv919% 700220-250127-123-1020-9500-700-200270-59007000640115-1210078

Note: There will not be any confirmation that what you typed was correct or incorrect or that it was accepted.

Try to start the CONTROL-M/Server and if it doesn’t start, the password entered was likely incorrect.

6) Restart the Control-M/Server

serveurctmv919% start_ctm
Starting CONTROL-M/Server at mar. fvr. 11 09:30:28 CET 2020
CONTROL-M/Server Pause mode is N
serveurctmv919% start_ca
 
Starting CONTROL-M/Server Configuration Agent at mar. fvr. 11 09:30:36 CET 2020
serveurctmv919%

Conclusion:

You have now updated your Control-M credentials concerning the applicative users and the Control-M/EM database owners.

Remember that you must be careful about updating these passwords if they are used by other scripts or stored in variable files that you must also update!

Troubleshooting:

During this update, and sometimes in other cases, you may get a weird message concerning the mcs.ini file when restarting the Control-M/EM services.

Sometimes you can get this message when you run the “root_menu” utility:

The message displays:

wrong number of parms in mcs.ini 1

To get rid of this issue you must edit the mcs.ini file:

Indeed, there are one or more hidden lines that generate this message when performing the EM services check or launch.

You have to delete the empty line(s) at the bottom of the file located at this path:

/home/emuser/ctm_em/ini/mcs.ini

Then, when you run the “check_all” utility again, you will see that the issue is gone.

Cet article Control-M/EM – Update ctmuser & emuser DBO password est apparu en premier sur Blog dbi services.

How can Docker help a MariaDB cluster for Disaster/Recovery


Mistakes or accidental data deletions can sometimes happen on a productive MariaDB Galera Cluster and this can be disastrous.
There are so many cases I have heard from customers; hereafter are some of the most common:
– dropping one column of a table
– dropping one table
– updating a big table without a where clause
What if it was possible to restore online a subset of data without downtime?

Restoring & recovering using either mysqldump or mariabackup is not satisfying when you just have to recover a subset of data.
In both cases, there will be downtime and it will be a very long process.
– with mysqldump: you will first have to stop the application, restore the latest full backup and then apply all the binary logs until the guilty command.
– with mariabackup: you will also have to stop the application, but additionally the cluster, then restore the latest full backup on one node, restart the cluster (galera_new_cluster), apply all the binary logs and finally restart the other members so they resynchronize.
Having a test server where you can restore/recover a logical backup can help: you then just have to export the needed data, copy them to the productive host and reload them, but still, it will take a lot of time.

MariaDB Galera Cluster preparation

“On a souvent besoin d’un plus petit que soi” (“We often need someone smaller than ourselves”), as Jean de La Fontaine said in his fable Le lion et le rat.
Let’s use this proverb to implement our solution: we will need some help from Docker.
Using a fourth node (the helper), we will deploy 3 containers with a delayed replication of 15 minutes, 1 hour and 6 hours, but of course you can choose your own lags.
Here is an overview of my environment.
MariaDB Galera Cluster
As it is mandatory for the replication (Master/Slave), the first step is to activate the binary logs (binlogs) in the option file (my.cnf) on every master node.
Why?
Because they are not activated by default on a Galera Cluster, writesets (transactions) are written to a Galera cache (Gcache) and in case of recovery/resynchronisation, Galera will perform an Incremental State Transfer by using the Gcache.

[mysqld]
# REPLICATION SPECIFIC
server_id=3
binlog_format=ROW
log_bin = /var/lib/mysql/binlog
log_slave_updates = ON
expire_logs_days = 7

$ sudo systemctl restart mariadb

The second step is to create a dedicated user with the “REPLICATION SLAVE” privilege for the replication.
We only need to create it once as it is automatically replicated to the other members.

MariaDB > create user rpl_user@'192.168.56.%' IDENTIFIED BY 'manager';
MariaDB > GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%';
MariaDB > show grants for 'rpl_user'@'192.168.56.%';
+-------------------------------------------------------------------------------------+
| Grants for rpl_user@%                                                                |
+-------------------------------------------------------------------------------------+
| GRANT REPLICATION SLAVE ON *.* TO 'rpl_user'@'192.168.56.%' IDENTIFIED               |
+-------------------------------------------------------------------------------------+

The third step is to create a full backup of one of the master nodes and transfer it to the Docker host.

$  mysqldump -uroot -p \
--flush-privileges --hex-blob --single-transaction \
--triggers --routines --events \
--all-databases | gzip -6 -c > mysqldump_complete.sql.gz
$  scp mysqldump_complete.sql.gz root@192.168.56.205:/tmp

Docker Host preparation

Now on the Docker host, we create 3 directories for each container:
– mariadb.conf.d: stores the MariaDB configuration file, mapped into the container under /etc/mysql/mariadb.conf.d
– datadir: stores the MariaDB data, mapped to /var/lib/mysql
– sql: backup file (dump) & script files to automate the replication configuration and startup

$ for val in 15m 1h 6h
$ do
$  sudo mkdir -p /storage/mariadb-slave-${val}/mariadb.conf.d
$  sudo mkdir -p /storage/mariadb-slave-${val}/datadir
$  sudo mkdir -p /storage/mariadb-slave-${val}/sql
$ done

We prepare the MariaDB configuration for every slave and insert the following parameters (shown here for the 15-minute slave; see the note after the file for the other two):

$ vi /storage/mariadb-slave-15m/mariadb.conf.d/my.cnf
[mysqld]
server_id=10015
binlog_format=ROW
log_bin=binlog
log_slave_updates=1
relay_log=relay-bin
expire_logs_days=7
read_only=ON
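
The my.cnf of the two other slaves is identical except for server_id, which must be unique per slave (the values below are arbitrary examples, any unused ids work):

# /storage/mariadb-slave-1h/mariadb.conf.d/my.cnf  -> server_id=10060
# /storage/mariadb-slave-6h/mariadb.conf.d/my.cnf  -> server_id=10360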

For the next step, we need to create two scripts for every delayed slave.
The first one (1_Reset_master.sql) only contains the following statement:

RESET MASTER;

This deletes all old binary log files & starts a new binary log file sequence with a suffix of “.000001”.
The second one (3_setup_slave.sql) sets up & starts the replication. The most important parameter is “MASTER_DELAY” as it determines the amount of time in seconds the slave should lag behind the master.

CHANGE MASTER TO MASTER_HOST = '192.168.56.203',
MASTER_USER = 'rpl_user', MASTER_PASSWORD = 'manager',
MASTER_DELAY=900;
START SLAVE;
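
For instance, both files could be generated for the three slaves with a small loop (a sketch: 900, 3600 and 21600 seconds correspond to the 15m, 1h and 6h lags, and the master address and credentials are the ones defined above):

declare -A delay=( [15m]=900 [1h]=3600 [6h]=21600 )
for val in 15m 1h 6h
do
  echo "RESET MASTER;" | sudo tee /storage/mariadb-slave-${val}/sql/1_Reset_master.sql
  sudo tee /storage/mariadb-slave-${val}/sql/3_setup_slave.sql <<EOF
CHANGE MASTER TO MASTER_HOST = '192.168.56.203',
MASTER_USER = 'rpl_user', MASTER_PASSWORD = 'manager',
MASTER_DELAY=${delay[$val]};
START SLAVE;
EOF
done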

Last, we copy the backup (dump) into every slave “sql” directory and rename it 2_mysqldump.sql.gz

$ cp ~/mysqldump_complete.sql.gz /storage/mariadb-slave-15m/sql/2_mysqldump.sql.gz
$ cp ~/mysqldump_complete.sql.gz /storage/mariadb-slave-1h/sql/2_mysqldump.sql.gz
$ cp ~/mysqldump_complete.sql.gz /storage/mariadb-slave-6h/sql/2_mysqldump.sql.gz

The final content of every slave “sql” directory has to be as follows:

$ ll /storage/mariadb-slave-{15m,1h,6h}/sql
-rw-r--r-- 1 mysql mysql     14 Feb 13 14:35 1_Reset_master.sql
-rw-r--r-- 1 mysql mysql 843584 Feb 13 14:44 2_mysqldump.sql.gz
-rw-r--r-- 1 mysql mysql    134 Feb 13 14:35 3_setup_slave.sql

-rw-r--r-- 1 mysql mysql     14 Feb 13 14:35 1_Reset_master.sql
-rw-r--r-- 1 mysql mysql 843584 Feb 13 14:44 2_mysqldump.sql.gz
-rw-r--r-- 1 mysql mysql    134 Feb 13 14:35 3_setup_slave.sql

-rw-r--r-- 1 mysql mysql     14 Feb 13 14:35 1_Reset_master.sql
-rw-r--r-- 1 mysql mysql 843584 Feb 13 14:44 2_mysqldump.sql.gz
-rw-r--r-- 1 mysql mysql    134 Feb 13 14:35 3_setup_slave.sql

We prefix the SQL scripts with an integer because it determines the execution order when Docker initializes the MariaDB container.

Containers deployment

Everything is in place; let’s start and run the 3 MariaDB containers. One last thing: MYSQL_ROOT_PASSWORD must be the same as the MariaDB root password on the master.

$ for val in 15m 1h 6h
$ do
$  docker run -d \
--name mariadb-slave-$val \
--hostname mariadb$val \
-e MYSQL_ROOT_PASSWORD=manager \
-v /storage/mariadb-slave-$val/datadir:/var/lib/mysql \
-v /storage/mariadb-slave-$val/mariadb.conf.d:/etc/mysql/mariadb.conf.d \
-v /storage/mariadb-slave-$val/sql:/docker-entrypoint-initdb.d  \
mariadb
$ done

We first check if all containers are started.

$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
f0f3d674c2f5        mariadb             "docker-entrypoint.s…"   3 minutes ago       Up 3 minutes        3306/tcp            mariadb-slave-15m
393145021a84        mariadb             "docker-entrypoint.s…"   3 minutes ago       Up 3 minutes        3306/tcp            mariadb-slave-1h
9d3bc9bd214c        mariadb             "docker-entrypoint.s…"   3 minutes ago       Up 3 minutes        3306/tcp            mariadb-slave-6h

We then check the log file to see if the slave is connected.

$ docker logs -f mariadb-slave-15m
...
2019-03-11 10:01:08 0 [Note] mysqld: ready for connections.
Slave I/O thread: connected to master 'rpl_user@192.168.56.203:3306',replication started in log 'FIRST' at position 4

Finally we check the replication status and the delay.

$ docker exec -it mariadb-slave-15m mysql -uroot -p -e 'show slave status\G'
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.56.203
...
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

SQL_Delay: 900
...
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it

Conclusion

So, the setup is now complete. We will see in another blog post, with a typical case, how we can restore efficiently and online from such a situation.

Cet article How can Docker help a MariaDB cluster for Disaster/Recovery est apparu en premier sur Blog dbi services.

Control-M/EM – emdef utility – Jobs


emdef is a command line utility used to make various modifications to job definitions in the Control-M/EM database. We saw how emdef can help to manage calendars and folders; today it is the jobs’ turn, so let’s dive in 😉

The emdef utility needs some parameters which vary according to the method to be executed. The methods related to Jobs management are:

  • exportdefjob: Exports job processing definitions to an output file
  • defjob: Imports job processing definitions
  • copydefjob: Creates a new job definition similar to a specified existing definition
  • duplicatedefjob: Makes a copy of an existing job definition in the same data center
  • deldefjob: Deletes specified job processing definitions

Basic parameters definition

The basic parameters are the same as for the Calendars and Folders methods: to run emdef there are some fixed parameters whatever the method used, and the syntax looks like:

emdef <METHOD_TO_USE> [-USERNAME <userName> [-PASSWORD <password>] | -PASSWORD_FILE <passwordFile>] -HOST <guiServerName> ...
  • userName: The Control-M/EM user name
  • password: The control-M/EM user password
  • passwordFile: Here you have the choice to put the user and password in the command line, or use the passwordFile which is a file containing an unencrypted user name and password on separate lines in the below format:
    user=userName
    password=password

    I personally recommend this practice to avoid putting the user/password on the command line; on the other hand, you have to set the file permissions correctly (see the example right after this list)!

  • guiServerName: Control-M/EM GUI server host name or IP address
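
For example, the password file can be created and protected like this — a minimal sketch where the user name, password and file name simply reuse the values from the defjob example later in this post, and chmod 600 restricts access to the owner since the content is unencrypted:

$ cat > credentials.txt <<EOF
user=emuser
password=MyEmUserPass
EOF
$ chmod 600 credentials.txt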

Let’s move to methods to better understand this utility.

Methods definition

exportdefjob

The exportdefjob utility exports job processing definitions from a folder in the Control-M/EM database to an output file. It processes a specified file of arguments in XML format which contains statements that identify existing job processing definitions. The identified definitions are exported from the Control-M/EM database to an output file.
Then the output file can be used as an input file in the same datacenter or in another one, depending on your needs.

To run the exportdefjob method, you need to specify the basic parameters shown before plus the two below:

-arg <argFileName> -out <outputFileName>

=> -arg: specifies the XML argument file that contains the selection criteria identifying the jobs to export
=> -out: path and name of the output file that will contain the exported job definitions.

It is important to mention here the syntax used to create the argument file (a sample file is shown right after this list):

  • TERMS
    These tags indicate the start and end of the selection criteria. Only criteria that are located between the tags are considered to be part of the argument!
  • TERM
    The TERM tags indicate the start and the end of a group of selection criteria used to specify the job or jobs that are to be exported.
  • Relationship between terms
    You can use AND and OR to join two terms.
  • Parameter name
    Parameters name are fixed:
    – DATACENTER
    – JOBNAME
    – CYCLIC
    Many other parameters are possible
  • Relationship between the NAME and the VALUE
    Valid values:
    – EQ
    – NEQ
    – LIKE
  • Value
    The value that corresponds to the parameter (DATACENTER, JOBNAME, and so on)
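
Putting it together, a minimal argument file and the corresponding exportdefjob call could look like the sketch below. The data center name CTRLM_DC, the job name pattern JOB_A* and the argument file name are purely illustrative, and the exact tag and attribute names should be checked against your Control-M/EM version:

<TERMS>
	<TERM>
		<PARAM NAME="DATACENTER" OP="EQ" VALUE="CTRLM_DC"/>
		<PARAM NAME="JOBNAME" OP="LIKE" VALUE="JOB_A*"/>
	</TERM>
</TERMS>

emdef exportdefjob -pf credentials.txt -s guiserverhostname -arg JobsToExport.xml -out JobsExport.xml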

defjob

The defjob method reads job processing definitions from a plain text input file written in XML format, then imports them directly into the Control-M/EM database.

To run the defjob utility, you need to specify basic parameters shown before plus the below:

-src <srcFileName> [/a]

=> -src: specifies the input XML file that contains the job definitions
=> /a: directs the utility to automatically reset the “Created By” parameter to the current Control-M/EM user when these two values do not match – OPTIONAL

Example: Import jobs from an XML input file

  1. Create a source file:
    You can imagine the number of parameters and possibilities when creating such a file… That's why I never create it from scratch: usually I export a job similar to the one to be created using the exportdefjob method, then I update its content. As you know, each job processing definition has a JOBISN field that contains a job ID number. In fact, the JOBISN is the JOB_ID field in def_job; many jobs can have the same JOBISN number, but it must be unique within each Folder.
    So, here we need to be careful! You can use the Control-M Workload Automation folder manager to determine the job ID: double-click the required folder to get the job list, including the job IDs.
    If a job that is being imported contains a JOBISN number that already exists in the Control-M/EM database, defjob overwrites the existing job processing definition with the new job processing definition. If a JOBISN value is not specified, defjob imports the job processing definition as a new job.
    In my case I will simply use the file exported using the exportdefjob method.
  2. Create password file credentials.txt:
    user=emuser
    password=MyEmUserPass
    
  3. Execute the command:
    emdef defjob -pf <passwordFile> -s <guiServerName> -src <srcFileName>

    Which corresponds to:

    emdef defjob -pf credentials.txt -s guiserverhostname -src JobsExport.xml 

copydefjob

The copydefjob method of the emdef utility creates a new job definition in the Control-M/EM database that is similar to a specified existing definition. The original job and the copy must be in different data centers or SMART Folders. Multiple jobs can be selected and copied using the * wildcard character.

To run the copydefjob utility, you need to specify basic parameters shown before plus the below:

-arg <argFileName> [/a]

=> -arg: specifies the XML argument file that describes which jobs to copy and the changes to apply
=> /a: directs the utility to automatically reset the “Created By” parameter to the current Control-M/EM user when these two values do not match – OPTIONAL

Below I listed some argument file examples to better understand the content:
Example 1: Copy job processing definitions in the FOL1 folder where FOLDER_DSN is Lib1 and JOBNAME is Job1. In the copy, the FOLDER_DSN value is changed to Lib1_COPY.

<COPYJOB>
	<FOLDER_NAME FROM="FOL1"/>
	<FOLDER_DSN FROM="Lib1" TO="Lib1_COPY"/>
	<JOBNAME FROM="Job1"/>
</COPYJOB>

Example 2: Copy all jobs in the FOL1 folder to the FOL3 folder.

<COPYJOB>
	<FOLDER_NAME FROM="FOL1" TO="FOL3"/>
</COPYJOB>

Example 3: Copy all cyclic jobs in the GR1 group that have a job name beginning with the string JOBa from FOLDER_DSN Lib1 to FOLDER_DSN Lib1_DEST.

<COPYJOB>
	<FOLDER_DSN FROM="Lib1" TO="Lib1_DEST"/>
	<SUBAPPLICATION FROM="GR1"/>
	<CYCLIC FROM="1"/>
	<JOBNAME FROM="JOBa*"/>
</COPYJOB>
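
To actually run the copy, the call follows the same pattern as defjob; for instance, with the credentials file and GUI server from the earlier example and one of the argument files above saved as CopyJobs.xml (a file name chosen for this sketch):

emdef copydefjob -pf credentials.txt -s guiserverhostname -arg CopyJobs.xml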

duplicatedefjob

At first sight, the duplicatedefjob method seems to provide the same functionality as copydefjob. In fact, the only difference between the two is that duplicatedefjob makes a copy of an existing job definition in the same data center and Folder, which is not the case for copydefjob.
The rest remains the same for both methods.

deldefjob

The deldefjob method deletes specified job processing definitions in the Control-M/EM database. Like the previous methods, this one takes an XML arguments file as input; all jobs that match the arguments file are deleted when the method is executed.
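
The call itself follows the usual pattern; in the sketch below, JobsToDelete.xml is a hypothetical argument file built with the same TERMS selection syntax shown for exportdefjob. Be careful: every job matching the criteria is deleted as soon as the command runs.

emdef deldefjob -pf credentials.txt -s guiserverhostname -arg JobsToDelete.xml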

I am not going to explain all the methods in detail; I think the principle is now crystal clear regarding the benefits and how it can be done. It's your turn to try it, if you have not done so already, and don't hesitate to share your experience with us 🙂

This article Control-M/EM – emdef utility – Jobs appeared first on Blog dbi services.

odacli create-database extremely slow on ODA X8-2 with 19.5

Introduction

ODA X8-2, in its S, M or HA flavour, is the new database appliance from Oracle. But as with any brand new product, you can experience trouble due to a lack of maturity. This time, it concerns database creation, which can be extremely slow on these servers if you are using ODA 19.5.

Deployment, appliance creation and core configuration

X8-2 reimaging is fine, and appliance creation is done without any problem. Don't hesitate to create a DBTEST database during appliance creation: it's good practice to check that everything is OK after deployment. Once appliance creation is done, you should find these pmon processes running on the system:

[root@oda-cr-test ~]# ps -ef | grep pmon
oracle 17600 1 0 11:49 ? 00:00:00 asm_pmon_+ASM1
oracle 20604 1 0 11:50 ? 00:00:00 apx_pmon_+APX1
oracle 21651 1 0 11:50 ? 00:00:00 ora_pmon_DBTEST
root 81984 37658 0 13:44 pts/0 00:00:00 grep --color=auto pmon

If you deployed your ODA for Enterprise Edition, you should now apply the core configuration, meaning disabling the unlicensed cores. It's better to do that immediately after deployment to avoid using cores you didn't pay for. One EE license is 2 cores, 2 licenses are 4 cores, etc.:

odacli update-cpucore -c 2

Core configuration is done within a few seconds; you can check with this command:

odacli describe-cpucore
Node Cores Modified Job Status
----- ------ ------------------------------ ---------------
0 2 January 27, 2020 9:56:37 AM CET Configured

Starting from now, you can create your own databases.

Database creation: extremely slow and eventually failing

Without additional steps, you can create your very first DB. I'm using the basic shape for all my databases, as I fine-tune everything later using a SQL script (SGA and PGA targets, undo_retention, archive_lag_target, redo log configuration, options, etc.):

[root@oda-cr-test ~]# odacli list-dbhomes
ID Name DB Version Home Location Status
---------------------------------------- -------------------- ---------------------------------------- --------------------------------------------- ----------
ae442886-6bcc-497c-8e16-2e8b4e55157e OraDB19000_home1 19.5.0.0.191015 /u01/app/oracle/product/19.0.0.0/dbhome_1 Configured
odacli create-database -m MAnager_2020_dbi -cs AL32UTF8 -no-c -u DBSUS -dh 'ae442886-6bcc-497c-8e16-2e8b4e55157e' -n DBSUS -s odb1s -l AMERICAN -dt AMERICA -no-co -r asm

ASM is a better solution than ACFS for me: no need to have “real” filesystems, and optimal storage usage.

Creating a database normally takes less than 10 minutes, but not this time:


odacli describe-job -i 94726b11-ea0c-46b0-ab55-6d709ef747d3
Job details
----------------------------------------------------------------
ID: 94726b11-ea0c-46b0-ab55-6d709ef747d3
Description: Database service creation with db name: DBSUS
Status: Failure
Created: February 13, 2020 11:24:53 AM CET
Message: DCS-10001:Internal error encountered: Fail to create User tablespace .
Task Name Start Time End Time Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Database Service creation February 13, 2020 11:24:53 AM CET February 13, 2020 6:23:16 PM CET Failure
Database Service creation February 13, 2020 11:24:53 AM CET February 13, 2020 6:23:16 PM CET Failure
Setting up ssh equivalance February 13, 2020 11:26:26 AM CET February 13, 2020 11:26:36 AM CET Success
Database Service creation February 13, 2020 11:26:36 AM CET February 13, 2020 6:23:11 PM CET Success
Database Creation February 13, 2020 11:26:36 AM CET February 13, 2020 6:21:37 PM CET Success
Change permission for xdb wallet files February 13, 2020 6:21:37 PM CET February 13, 2020 6:21:38 PM CET Success
Place SnapshotCtrlFile in sharedLoc February 13, 2020 6:21:38 PM CET February 13, 2020 6:22:06 PM CET Success
SqlPatch upgrade February 13, 2020 6:23:02 PM CET February 13, 2020 6:23:05 PM CET Success
Running dbms_stats init_package February 13, 2020 6:23:05 PM CET February 13, 2020 6:23:07 PM CET Success
updating the Database version February 13, 2020 6:23:07 PM CET February 13, 2020 6:23:11 PM CET Success
create Users tablespace February 13, 2020 6:23:11 PM CET February 13, 2020 6:23:16 PM CET Failure
Creating Users tablespace February 13, 2020 6:23:11 PM CET February 13, 2020 6:23:16 PM CET Failure

It took 7 hours and finished with a failure!

The failure message can be slightly different, for example:
DCS-10001:Internal error encountered: configure snapshot control file for databaseDBSUS.

Troubleshooting

Creation of the DBTEST database was fine at appliance creation. So what changed between deployment and database creation? The cpu-core configuration. Something is probably wrong with this core configuration.

The dbca log is not very interesting regarding our problem:

cat /u01/app/oracle/cfgtoollogs/dbca/DBSUS/DBSUS.log
[ 2020-02-13 11:37:46.492 CET ] Prepare for db operation
DBCA_PROGRESS : 10%
[ 2020-02-13 11:38:22.441 CET ] Registering database with Oracle Restart
DBCA_PROGRESS : 14%
[ 2020-02-13 11:38:51.365 CET ] Copying database files
DBCA_PROGRESS : 43%
[ 2020-02-13 12:09:44.044 CET ] Creating and starting Oracle instance
DBCA_PROGRESS : 45%
DBCA_PROGRESS : 49%
DBCA_PROGRESS : 53%
DBCA_PROGRESS : 56%
DBCA_PROGRESS : 62%
[ 2020-02-13 15:44:47.163 CET ] Completing Database Creation
DBCA_PROGRESS : 68%
[ 2020-02-13 17:48:28.024 CET ] [WARNING] ORA-13516: AWR Operation failed: AWR Schema not initialized
ORA-06512: at "SYS.DBMS_SWRF_INTERNAL", line 356
ORA-06512: at "SYS.DBMS_SWRF_INTERNAL", line 389
ORA-06512: at line 1
DBCA_PROGRESS : 71%
[ 2020-02-13 18:20:49.119 CET ] Executing Post Configuration Actions
DBCA_PROGRESS : 100%
[ 2020-02-13 18:20:49.123 CET ] Database creation complete. For details check the logfiles at:
/u01/app/oracle/cfgtoollogs/dbca/DBSUS.
Database Information:
Global Database Name:DBSUS.salt.ch
System Identifier(SID):DBSUS

Let’s check the ASM alert.log:

tail /u01/app/oracle/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
2020-02-13T14:35:24.880987+01:00
LGWR (ospid: 17755) waits for event 'kfk: async disk IO' for 44 secs.
2020-02-13T14:35:24.881028+01:00
LGWR (ospid: 17755) is hung in an acceptable location (inwait 0x1.ffff).
2020-02-13T14:35:30.039917+01:00
LGWR (ospid: 17755) waits for event 'kfk: async disk IO' for 59 secs.
2020-02-13T14:35:40.280979+01:00
LGWR (ospid: 17755) waits for event 'kfk: async disk IO' for 75 secs.
2020-02-13T14:37:13.180477+01:00
LGWR (ospid: 17755) waits for event 'kfk: async disk IO' for 39 secs.
2020-02-13T14:37:13.180536+01:00
LGWR (ospid: 17755) is hung in an acceptable location (inwait 0x1.ffff).

There is a problem with the disks.

What does the system tell us?

tail /var/log/messages
...
Feb 13 14:34:49 oda-cr-test kernel: nvme nvme1: I/O 615 QID 1 timeout, completion polled
Feb 13 14:35:09 oda-cr-test kernel: nvme nvme0: I/O 348 QID 1 timeout, completion polled
Feb 13 14:35:12 oda-cr-test kernel: nvme nvme1: I/O 623 QID 1 timeout, completion polled
Feb 13 14:35:12 oda-cr-test kernel: nvme nvme1: I/O 624 QID 1 timeout, completion polled
Feb 13 14:35:16 oda-cr-test kernel: nvme nvme0: I/O 349 QID 1 timeout, completion polled
Feb 13 14:35:40 oda-cr-test kernel: nvme nvme0: I/O 348 QID 1 timeout, completion polled
Feb 13 14:35:46 oda-cr-test kernel: nvme nvme0: I/O 349 QID 1 timeout, completion polled
Feb 13 14:35:46 oda-cr-test kernel: nvme nvme0: I/O 351 QID 1 timeout, completion polled
Feb 13 14:35:46 oda-cr-test kernel: nvme nvme0: I/O 352 QID 1 timeout, completion polled
Feb 13 14:35:46 oda-cr-test kernel: nvme nvme0: I/O 353 QID 1 timeout, completion polled
Feb 13 14:35:46 oda-cr-test kernel: nvme nvme0: I/O 354 QID 1 timeout, completion polled
...

Let’s try to create a database with more cores:

odacli update-cpucore -c 16
reboot
odacli create-database -m MAnager_2020_dbi -cs AL32UTF8 -no-c -u DBSUP -dh '2d147842-4f42-468a-93c9-112ce9c23ee7' -n DBSUP -s odb1s -l AMERICAN -dt AMERICA -no-co -r asm
odacli describe-job -i 204fcd53-a9f1-416e-953b-c50448207fc1

Job details
----------------------------------------------------------------
ID: 204fcd53-a9f1-416e-953b-c50448207fc1
Description: Database service creation with db name: DBSUP
Status: Success
Created: February 7, 2020 3:41:54 PM CET
Message:
Task Name Start Time End Time Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Setting up ssh equivalance February 7, 2020 3:41:55 PM CET February 7, 2020 3:41:55 PM CET Success
Database Service creation February 7, 2020 3:41:56 PM CET February 7, 2020 3:50:07 PM CET Success
Database Creation February 7, 2020 3:41:56 PM CET February 7, 2020 3:48:16 PM CET Success
Change permission for xdb wallet files February 7, 2020 3:48:16 PM CET February 7, 2020 3:48:16 PM CET Success
Place SnapshotCtrlFile in sharedLoc February 7, 2020 3:48:16 PM CET February 7, 2020 3:48:19 PM CET Success
SqlPatch upgrade February 7, 2020 3:49:52 PM CET February 7, 2020 3:50:01 PM CET Success
Running dbms_stats init_package February 7, 2020 3:50:01 PM CET February 7, 2020 3:50:05 PM CET Success
updating the Database version February 7, 2020 3:50:05 PM CET February 7, 2020 3:50:07 PM CET Success
create Users tablespace February 7, 2020 3:50:07 PM CET February 7, 2020 3:50:12 PM CET Success
Clear all listeners from Databse {8a6b0534-26be-4a9a-90fd-f2167f57fded} February 7, 2020 3:50:12 PM CET February 7, 2020 3:50:14 PM CET Success

Success in 9 minutes.

odacli update-cpucore -c 8 --force
reboot
odacli create-database -m MAnager_2020_dbi -cs AL32UTF8 -no-c -u DBSUQ -dh '2d147842-4f42-468a-93c9-112ce9c23ee7' -n DBSUQ -s odb1s -l AMERICAN -dt AMERICA -no-co -r asm
odacli describe-job -i 210e1bd7-fe87-4d60-adce-84b121448c2b

Job details
----------------------------------------------------------------
ID: 210e1bd7-fe87-4d60-adce-84b121448c2b
Description: Database service creation with db name: DBSUQ
Status: Success
Created: February 7, 2020 4:05:48 PM CET
Message:
Task Name Start Time End Time Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Setting up ssh equivalance February 7, 2020 4:05:49 PM CET February 7, 2020 4:05:49 PM CET Success
Database Service creation February 7, 2020 4:05:49 PM CET February 7, 2020 5:53:09 PM CET Success
Database Creation February 7, 2020 4:05:49 PM CET February 7, 2020 5:46:50 PM CET Success
Change permission for xdb wallet files February 7, 2020 5:46:50 PM CET February 7, 2020 5:46:50 PM CET Success
Place SnapshotCtrlFile in sharedLoc February 7, 2020 5:46:50 PM CET February 7, 2020 5:48:55 PM CET Success
SqlPatch upgrade February 7, 2020 5:50:28 PM CET February 7, 2020 5:53:02 PM CET Success
Running dbms_stats init_package February 7, 2020 5:53:02 PM CET February 7, 2020 5:53:06 PM CET Success
updating the Database version February 7, 2020 5:53:06 PM CET February 7, 2020 5:53:09 PM CET Success
create Users tablespace February 7, 2020 5:53:09 PM CET February 7, 2020 5:53:45 PM CET Success
Clear all listeners from Databse {f0900b63-baf8-4896-8572-a4120770a362} February 7, 2020 5:53:45 PM CET February 7, 2020 5:53:47 PM CET Success

Success in 1h50.


odacli update-cpucore -c 4 --force
reboot
odacli create-database -m MAnager_2020_dbi -cs AL32UTF8 -no-c -u DBSUR -dh '2d147842-4f42-468a-93c9-112ce9c23ee7' -n DBSUR -s odb1s -l AMERICAN -dt AMERICA -no-co -r asm
odacli describe-job -i 1931b118-3407-413b-babc-ff9a832fab59

Job details
----------------------------------------------------------------
ID: 1931b118-3407-413b-babc-ff9a832fab59
Description: Database service creation with db name: DBSUR
Status: Success
Created: February 7, 2020 6:25:37 PM CET
Message:
Task Name Start Time End Time Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Setting up ssh equivalance February 7, 2020 6:25:38 PM CET February 7, 2020 6:25:38 PM CET Success
Database Service creation February 7, 2020 6:25:38 PM CET February 7, 2020 10:09:17 PM CET Success
Database Creation February 7, 2020 6:25:38 PM CET February 7, 2020 9:30:40 PM CET Success
Change permission for xdb wallet files February 7, 2020 9:30:40 PM CET February 7, 2020 9:30:40 PM CET Success
Place SnapshotCtrlFile in sharedLoc February 7, 2020 9:30:40 PM CET February 7, 2020 9:35:21 PM CET Success
SqlPatch upgrade February 7, 2020 9:51:49 PM CET February 7, 2020 10:08:17 PM CET Success
Running dbms_stats init_package February 7, 2020 10:08:17 PM CET February 7, 2020 10:09:14 PM CET Success
updating the Database version February 7, 2020 10:09:14 PM CET February 7, 2020 10:09:17 PM CET Success
create Users tablespace February 7, 2020 10:09:17 PM CET February 7, 2020 10:39:01 PM CET Success
Clear all listeners from Databse {8cb507cf-3c84-4ad9-8302-844005965f6b} February 7, 2020 10:39:01 PM CET February 7, 2020 10:41:49 PM CET Success

Success in 4h15.

That's it: decreasing the number of cores dramatically decreases the I/O performance and makes our ODA unusable.

And this problem is not limited to database creation: don't expect a database to run correctly with this bug. Even a small datafile creation will take long minutes.
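
If you want to measure the impact yourself, a quick test is to time a small datafile creation from SQL*Plus — a simple sketch, where the tablespace name is arbitrary and +DATA is the usual data diskgroup on an ODA:

set timing on
create tablespace IOTEST datafile '+DATA' size 1G;
drop tablespace IOTEST including contents and datafiles;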

Are you concerned?

This problem does not concern everybody. If you use Standard Edition 2 databases, you don't need to decrease the cores on your ODA, so you won't experience this problem. If you have enough Enterprise Edition licenses (at least to enable half of the total cores), you also won't have this problem. It only impacts those who have a limited number of licenses.
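
If you are not sure whether your configuration is affected, compare the cores enabled by odacli with what the OS reports — a quick check, with the exact figures obviously depending on your hardware and licenses:

odacli describe-cpucore
lscpu | grep -iE '^CPU\(s\)|socket'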

All my tests were done with ODA 19.5, so it's probably limited to this specific version. An ODA X8-2 running 18.7 shouldn't have this bug.

Is there a workaround?

For now, there is no workaround, but it seems that an updated Linux kernel could solve the problem. The current kernel provided with 19.5 is 4.14.35-1902.5.2.el7uek.x86_64. Oracle will probably provide something soon.
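
You can check which kernel your ODA is currently running with uname; on 19.5 it should return the version mentioned above:

[root@oda-cr-test ~]# uname -r
4.14.35-1902.5.2.el7uek.x86_64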

Conclusion

One of the main advantages of the ODA is the core configuration that lets the hardware fit your license. But for now, it doesn't work out of the box with ODA X8-2 and 19.5. If you can keep 11g/12c/18c, waiting for release 19.6 is probably better.

This article odacli create-database extremely slow on ODA X8-2 with 19.5 appeared first on Blog dbi services.
