This article is part of a series that includes SQLite, Postgresql, Firebird, Oracle RDBMS, Microsoft SQL Server, HSQLDB, Excel, and MariaDB. The goal is to set up a self-standing environment for testing an ODBC extension for gawk presented here to be completed. Refer to SQLite for installing the required ODBC Driver Manager.
The test system is a debian v11 (bullseye).
MongoDB is a noSQL database but the ODBC API will hide this fact and allow to access its data using SQL statements. This is to show that with the appropriate ODBC drivers doing the translation, any data source can be accessed using SQL. Not all native operations are rendered correctly of course (notably SELECT’s JOIN clauses, see below) but the main ones, the CRUD, such as INSERT, SELECT, UPDATE and DELETE are all available.
As root, install MongoDB Community Edition and its command-line tool from the official package repositories:
# apt install mongodb # apt install mongodb-mongosh
Follow the instructions here to configure and start mongodb.
A systemd service has been created. Still as root, start the service as shown below:
# ulimit -n 64000 # chown -R mongodb:mongodb /var/lib/mongodb # systemctl stop mongod # systemctl start mongod # systemctl status mongod ● mongod.service - MongoDB Database Server Loaded: loaded (/lib/systemd/system/mongod.service; disabled; vendor preset: enabled) Active: active (running) since Tue 2021-09-21 09:27:47 CEST; 3s ago Docs: https://docs.mongodb.org/manual Main PID: 38200 (mongod) Memory: 9.8M CPU: 26ms CGroup: /system.slice/mongod.service └─38200 /usr/bin/mongod --config /etc/mongod.conf
Check the MongoDB server availability by connecting to the server through the mongosh tool documented here:
# mongosh Current Mongosh Log ID: 6149ba4e6d2a951c4e869dac Connecting to: mongodb://127.0.0.1:27017/?directConnection=true&serverSelectionTimeoutMS=2000 Using MongoDB: 5.0.3 Using Mongosh: 1.0.6 test> quit
The data sample site does not provide scripts for creating and populating MongoDB collections. There are 2 ways to work around this:
1. use isql and ODBC to feed SQL statements to MongoDB;
2. translate SQL statements to javascripts ones suitable for MongoDB;
At this stage, ODBC drivers are not installed yet and therefore isql cannot work so we will select the 2nd alternative. The script below will do just that. As a Mongo database is schema-less, there is no need to create collections beforehand, they will be implicitely created when populated with documents. Any sample populating script will do so we’ll take the one for mysql here and save it to the file data4mongodb.sql. The script pop-mongodb.awk will convert SQL insert statements into MongoDB statements. Here it is:
# format of the input lines, e.g: # INSERT INTO regions(region_id,region_name) VALUES (1,'Europe'); # or: # INSERT INTO locations(location_id,street_address,postal_code,city,state_province,country_id) VALUES (2500,'Magdalen Centre, The Oxford Science Park','OX9 9ZB','Oxford','Oxford','UK'); # Output: # db.regions.insert({ # region_id: 1, # region_name: "Europe" # }) # or: # db.locations.insert({ # location_id: 2500, # street_address: "Magdalen Centre, The Oxford Science Park", # postal_code: "OX9 9ZB", # city: "Oxford", # state_province: "Oxford", # country_id: "UK" # }) # Usage: # gawk -v Apostrophe="'" -f pop-mongodb.awk data4mongodb.sql > data4mongodb.json BEGIN { ApoRE="(([^,]+)|(" Apostrophe "([^" Apostrophe "]+)" Apostrophe ")),?" # i.e. (([^,]+)|('([^']+)')),? print "use sampledb" } { if (!$0 || match($0, /^\/\*/)) next match($0, /INSERT INTO ([^(]+)\(([^)]+)\)/, stmt) table_name = stmt[1] if (!bemptied[table_name]) { print "db." table_name ".deleteMany({})" bemptied[table_name] = 1 } nb_columns = split(stmt[2], columns, ",") nb = match($0, /VALUES \(([^)]+)\)/, stmt) S = stmt[1] nb_values = 0 while (match(S, ApoRE, res)) { values[++nb_values] = res[1] S = substr(S, RLENGTH + 1) } print "db." table_name ".insert({" for (i = 1; i <= nb_columns; i++) { if ("NULL" == values[i]) values[i] = "null" gsub(Apostrophe, "\"", values[i]) print columns[i] ": " values[i] (i < nb_columns ? "," : "") } print "})" printf "\n" }
Invoke it:
gawk -v Apostrophe="'" -f pop-mongodb.awk data4mongodb.sql > data4mongodb.json
Example of input:
INSERT INTO regions(region_id,region_name) VALUES (1,'Europe'); ... INSERT INTO countries(country_id,country_name,region_id) VALUES ('AR','Argentina',2); ... INSERT INTO locations(location_id,street_address,postal_code,city,state_province,country_id) VALUES (1400,'2014 Jabberwocky Rd','26192','Southlake','Texas','US'); ... INSERT INTO jobs(job_id,job_title,min_salary,max_salary) VALUES (1,'Public Accountant',4200.00,9000.00); ... INSERT INTO departments(department_id,department_name,location_id) VALUES (1,'Administration',1700); ... INSERT INTO employees(employee_id,first_name,last_name,email,phone_number,hire_date,job_id,salary,manager_id,department_id) VALUES (100,'Steven','King','steven.king@sqltutorial.org','515.123.4567','1987-06-17',4,24000.00,NULL,9); ... INSERT INTO dependents(dependent_id,first_name,last_name,relationship,employee_id) VALUES (1,'Penelope','Gietz','Child',206); ...
and their corresponding generated json statements:
use sampledb db.regions.deleteMany({}) db.regions.insert({ region_id: 1, region_name: "Europe", }) ... db.countries.deleteMany({}) db.countries.insert({ country_id: "AR", country_name: "Argentina", region_id: 2, }) ... db.locations.deleteMany({}) db.locations.insert({ location_id: 1400, street_address: "2014 Jabberwocky Rd", postal_code: "26192", city: "Southlake", state_province: "Texas", country_id: "US", }) ... db.jobs.deleteMany({}) db.jobs.insert({ job_id: 1, job_title: "Public Accountant", min_salary: 4200.00, max_salary: 9000.00, }) ... db.departments.deleteMany({}) db.departments.insert({ department_id: 1, department_name: "Administration", location_id: 1700, }) ... db.employees.deleteMany({}) db.employees.insert({ employee_id: 100, first_name: "Steven", last_name: "King", email: "steven.king@sqltutorial.org", phone_number: "515.123.4567", hire_date: "1987-06-17", job_id: 4, salary: 24000.00, manager_id: null, department_id: 9, }) ... db.dependents.deleteMany({}) db.dependents.insert({ dependent_id: 1, first_name: "Penelope", last_name: "Gietz", relationship: "Child", employee_id: 206, })
To be sure the populating step is idempotent, the collections are emptied each time the script is run.
From mongosh, populate the tables as shown:
mongosh < data4mongodb.isql
Back in mongosh, check the data:
mongosh show databases admin 41 kB config 111 kB local 81.9 kB sampledb 516 kB test> use sampledb switched to db sampledb sampledb> show collections countries departments dependents employees jobs locations regions sampledb> db.regions.find() [ { _id: ObjectId("616eef8e230e4e4893edd45f"), region_id: 1, region_name: 'Europe' }, ... sampledb> db.countries.find() [ { _id: ObjectId("616eef8f230e4e4893edd463"), country_id: 'AR', country_name: 'Argentina', region_id: 2 }, ... sampledb> db.locations.find() [ { _id: ObjectId("616eef91230e4e4893edd47c"), location_id: 1400, street_address: '2014 Jabberwocky Rd', postal_code: '26192', city: 'Southlake', state_province: 'Texas', country_id: 'US' }, ... sampledb> db.jobs.find() [ { _id: ObjectId("616eef92230e4e4893edd483"), job_id: 1, job_title: 'Public Accountant', min_salary: 4200, max_salary: 9000 }, ... sampledb> db.departments.find() [ { _id: ObjectId("616eef94230e4e4893edd496"), department_id: 1, department_name: 'Administration', location_id: 1700 }, ... sampledb> db.employees.find() [ { _id: ObjectId("616eef95230e4e4893edd4a1"), employee_id: 100, first_name: 'Steven', last_name: 'King', email: 'steven.king@sqltutorial.org', phone_number: '515.123.4567', hire_date: '1987-06-17', job_id: 4, salary: 24000, manager_id: null, department_id: 9 }, ... sampledb> db.dependents.find() [ { _id: ObjectId("616eef9c230e4e4893edd4c9"), dependent_id: 1, first_name: 'Penelope', last_name: 'Gietz', relationship: 'Child', employee_id: 206 }, ... sampledb> quit
The MongoDB ODBC drivers are available here. They are a modified version of MySQL ODBC driver. However, they don’t work on my test machine running Debian Linux v11 because of a missing openssl v1.0.2 library, which was predictable because those drivers are several years old and my test machine runs the latest Debian linux. For this reason, a commercial replacement from Devart has been installed; it comes with a one-month evaluation period. Once registered, it can be downloaded and installed as root as follows:
# wget https://www.devart.com/odbc/mongodb/devartodbcmongo_amd64.deb # apt install /home/debian/Downloads/devartodbcmongo_amd64.deb
Check the system-wide installed drivers:
# odbcinst -q -d ... [MongoDB Unicode] [MongoDB ANSI] ... [Devart ODBC Driver for MongoDB] # odbcinst -q -d -n "Devart ODBC Driver for MongoDB" [Devart ODBC Driver for MongoDB] Driver=/usr/local/lib/libdevartodbcmongo.so
Check the system-wide DSN in /etc/odbc.ini:
# odbcinst -q -s ... [DEVART_MONGODB] # odbcinst -q -s -n "DEVART_MONGODB" [DEVART_MONGODB] Description=My MongoDB sample database Driver=Devart ODBC Driver for MongoDB Data Source= Port=27017 Database=sampledb User ID= Password= Client Library= BSON Library= Additional Servers= Connection Options=
Copy the newly inserted DSN into debian’s ~/.odbc.ini file:
$ vi ~/.odbc.ini ... [DEVART_MONGODB] Description=My MongoDB sample database Driver=Devart ODBC Driver for MongoDB Data Source= Port=27017 Database=sampledb User ID= Password= Client Library= BSON Library= Additional Servers= Connection Options=
As debian, check its DSN so far:
$ odbcinst -q -s ... [DEVART_MONGODB] ... $ odbcinst -q -s -n DEVART_MONGODB [DEVART_MONGODB] Description=My MongoDB sample database Driver=Devart ODBC Driver for MongoDB Data Source= Port=27017 Database=sampledb User ID= Password= Client Library= BSON Library= Additional Servers= Connection Options=
Try a connection to the MongoDB database via ODBC:
isql -v DEVART_MONGODB +---------------------------------------+ | Connected! | | | | sql-statement | | help [tablename] | | quit | | | +---------------------------------------+ SQL> help +-----------+-------------+-------------+------------+---------+ | TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | +-----------+-------------+-------------+------------+---------+ | sampledb | | countries | TABLE | | | sampledb | | dependents | TABLE | | | sampledb | | regions | TABLE | | | sampledb | | locations | TABLE | | | sampledb | | departments | TABLE | | | sampledb | | jobs | TABLE | | | sampledb | | employees | TABLE | | +-----------+-------------+-------------+------------+---------+ SQLRowCount returns -1 7 rows fetched
It looks good. Run the test query now:
SQL> SELECT c.country_name, c.country_id, l.country_id, l.street_address, l.city FROM countries c LEFT JOIN locations l ON l.country_id = c.country_id WHERE c.country_id IN ('US', 'UK', 'CN'); Output: +--------------+-----------------+-----------------+----------------+------+ | country_name | country_id | country_id_1 | street_address | city | +--------------+-----------------+-----------------+----------------+------+ | China | CN | | | | +--------------+-----------------+-----------------+----------------+------+ SQLRowCount returns -1 1 rows fetched
Only one row is returned because the left join is not implemented in the combo MongoDB/ODBC driver. In effect, joins in relational database are a necessity due to the data normalization; as MongoDB does not care about data duplication and works with complete, self-standing documents, no joins are necessary although inner queries can be simulated with aggregate() and lookup() as illustrated by the following equivalent query:
db.countries.aggregate([ { $match:{$or:[{"country_id" : "US"}, {"country_id" : "UK"}, {"country_id" : "CN"}]} }, { $lookup: { from: "locations", localField: "country_id", foreignField: "country_id", as: "r" } } ])
with its result:
[ { _id: ObjectId("616eef90230e4e4893edd469"), country_id: 'CN', country_name: 'China', region_id: 3, r: [] }, { _id: ObjectId("616eef91230e4e4893edd478"), country_id: 'UK', country_name: 'United Kingdom', region_id: 1, r: [ { _id: ObjectId("616eef92230e4e4893edd480"), location_id: 2400, street_address: '8204 Arthur St', postal_code: null, city: 'London', state_province: null, country_id: 'UK' }, { _id: ObjectId("616eef92230e4e4893edd481"), location_id: 2500, street_address: 'Magdalen Centre, The Oxford Science Park', postal_code: 'OX9 9ZB', city: 'Oxford', state_province: 'Oxford', country_id: 'UK' } ] }, { _id: ObjectId("616eef91230e4e4893edd479"), country_id: 'US', country_name: 'United States of America', region_id: 2, r: [ { _id: ObjectId("616eef91230e4e4893edd47c"), location_id: 1400, street_address: '2014 Jabberwocky Rd', postal_code: '26192', city: 'Southlake', state_province: 'Texas', country_id: 'US' }, { _id: ObjectId("616eef91230e4e4893edd47d"), location_id: 1500, street_address: '2011 Interiors Blvd', postal_code: '99236', city: 'South San Francisco', state_province: 'California', country_id: 'US' }, { _id: ObjectId("616eef91230e4e4893edd47e"), location_id: 1700, street_address: '2004 Charade Rd', postal_code: '98199', city: 'Seattle', state_province: 'Washington', country_id: 'US' } ] } ]
To easy up the comparaison, we save that output to the file mongodb.out and run the following commands to reformat it into a tabular presentation.
First, convert mongodb.out to correct json syntax:
$ gawk -v Apo="'" '{ gsub(Apo, "\"", $0) if (match($1, /^_id/)) next print gensub(/^( +)([^:]+)(:.+$)/, "\\1\"\\2\"\\3", "g", $0) }' mongodb.out > mongodb.json
Now, save the following python script into the python script mongodb2tab.py:
$ cat - <<eop mongodb2tab.py import os import json with open("./mongodb.json") as json_file: mongodb_out = json.load(json_file) widths = [25, 10, 10, 42, 10] print(f"{'country_name': <{widths[0]}} {'country_id': <{widths[1]}} {'country_id': <{widths[2]}} \ {'street_address': <{widths[3]}} {'city': <{widths[4]}}") for row in mongodb_out: if row['r']: for l in row['r']: print(f"{row['country_name']: <{widths[0]}} {row['country_id']: <{widths[1]}} {l['country_id']: <{widths[2]}} {l['street_address']: <{widths[3]}} {l['city']: <{widths[4]}}") else: print(f"{row['country_name']: <{widths[0]}} {row['country_id']: <{widths[1]}}") eop
Finally, execute it:
$ python3 mongodb2tab.py Output: country_name country_id country_id street_address city China CN United Kingdom UK UK 8204 Arthur St London United Kingdom UK UK Magdalen Centre, The Oxford Science Park Oxford United States of America US US 2014 Jabberwocky Rd Southlake United States of America US US 2011 Interiors Blvd South San Francisco United States of America US US 2004 Charade Rd Seattle
Which shows that the output of the MongoDb query to simulate the left outer join was correct.
Let’s now test the DSN with pyodbc:
$ python3 import pyodbc cnxn = pyodbc.connect(DSN='mymongodb_Devart') cursor = cnxn.cursor() cursor.execute("""SELECT c.country_name, c.country_id, l.country_id, l.street_address, l.city FROM countries c LEFT JOIN locations l ON l.country_id = c.country_id WHERE c.country_id IN ('US', 'UK', 'CN')""") row = cursor.fetchone() while row: print (row) row = cursor.fetchone() Output: ('China', 'CN', None, None, None)
Here too, there are missing values as expected.
MongoDB is now accessible from ODBC under the debian account, albeit not all SQL statements are fully supported, which is understandable with a NoSQL database.
Instructions for the other data sources can be accessed through the following links:
SQLite
Firebird
HSQLDB
MariaDB
PostgreSQL
Oracle
Microsoft SQLServer for Linux
Excel
Cet article Installing the ODBC drivers for MongoDB est apparu en premier sur Blog dbi services.