GuidesIntegrationsMotherduck

How does Geobase Postgres access MotherDuck

In his guide we will get you started with MotherDuck and how to access it from Geobase Postgres. By the end of it you will be able to:

  1. Run SQL queries on MotherDuck from within Geobase
  2. Query data stored in MotherDuck via Geobase’s REST API from frontend applications
  3. Visualise data stored in MotherDuck using Geobase’s Map Tile Server

By the end you will have the following running:

Geobase

ℹ️

Note: This guide is meant to be followed in the order presented.

What is MotherDuck?

skip if you already know

TLD;R MotherDuck is a cloud-based analytics service built on top of DuckDB.

MotherDuck is a cloud-based analytics service built on top of DuckDB. DuckDB is an open-source embedded OLAP database developed and maintained by the DuckDB community and DuckDB Labs. MotherDuck is an independent organization closely collaborating with DuckDB Labs to build a cloud-based analytics platform based on DuckDB.

Some features of MotherDuck include:

  • Cloud database storage: Manage your DuckDB database directories in the cloud for easy scalability and collaboration.
  • SQL analytics engine: Efficiently run the same SQL queries on data stored locally, in MotherDuck, or in your data lake.
  • Simplified database sharing: Create a shareable data snapshot that your colleagues can easily attach in MotherDuck.
  • Hybrid query execution: Utilize the idle computing power on your laptop in coordination with the cloud to improve speed and reduce costs.
  • Notebook-style UI: Browse data directories, write SQL, filter and sort results, and share data using a web interface.
  • Powerful DuckDB ecosystem: Import, orchestrate, and perform business intelligence with over 25 modern tools.

MotherDuck is suitable for data applications that don’t require petabyte-scale data warehousing and can also query your data in your data lake. MotherDuck can also serve as a backend service for data applications based on DuckDB.

Preparation

TLD;R: Get a MotherDuck account and a service token.

First you need to have a MotherDuck account register MotherDuck is currently free to use, visit the Sign Up page, Choose a method you like to register an account. Then go to settings page to get your service token.

Enable duckdb_fdw in geobase

In geobase go to Database > Extensions > duckdb_fdw and toggle the switch to “Enable” the extension.

Create foreign data wrapper server

In geobase go to SQL Editor and run the following SQL to create a foreign server.

ℹ️

Note: Replace YOUR_MOTHERDUCK_TOKEN in the SQL with the service token copied earlier.

-- Clean up (optional)
-- DROP EXTENSION if exists duckdb_fdw CASCADE;
-- CREATE EXTENSION IF NOT EXISTS duckdb_fdw; WITH SCHEMA extensions CASCADE;
 
-- Create the foreign server
DO $$
DECLARE
  motherduck_token TEXT;
BEGIN
  -- Assign the value of the configuration setting to the variable
  motherduck_token := 'YOUR_MOTHERDUCK_TOKEN';
 
  -- Check if the token matches the JWT pattern
  IF NOT (motherduck_token ~ '^[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+$') THEN
    RAISE EXCEPTION 'motherduck_token variable is not set. Please replace YOUR_MOTHERDUCK_TOKEN with a valid value';
  END IF;
 
  -- Create the foreign server using the token
  EXECUTE format(
    'CREATE SERVER duckdb_server FOREIGN DATA WRAPPER duckdb_fdw OPTIONS (database ''md:?motherduck_token=%s&saas_mode=true'')',
    motherduck_token
  );
END $$;
🎉

Woho! You have successfully created a foreign server for MotherDuck. Which means you can run queries on the MotherDuck from with-in geobase! 🚀

Run SQL on MotherDuck

Create a view for easy inspection of which databases and tables are in motherduck.

-- execute sql on duckdb server hosted by motherduck to create a view
SELECT duckdb_execute('duckdb_server',
  'CREATE VIEW IF NOT EXISTS tables_duckdb AS SELECT * FROM information_schema.tables'
);

Now we can import the view we just created into geobase and query it.

-- import tables and views from motherduck public shcema into public schema on geobase
IMPORT FOREIGN SCHEMA public FROM SERVER duckdb_server INTO public;
 
 
-- query the view we just created on motherduck
SELECT * FROM tables_duckdb;

MotherDuck tables are now visible in geobase. 🎉

Import from other motherduck databases

TLD;R: Import tables from other databases in MotherDuck.

MotherDuck comes preloaded with some sample data. This is stored in the sample_data database. Suppose now I want to import all tables in the database sample_data in MotherDuck to geobase. For this we need to update the search path to include the sample_data database and the we can import the tables from the sample_data database which are in various schemas.

-- on motherduck set the search path to the schema you want to import
SELECT duckdb_execute('duckdb_server','set search_path TO ''sample_data,my_db'';');
-- 
IMPORT FOREIGN SCHEMA nyc  FROM SERVER DuckDB_server INTO public;
IMPORT FOREIGN SCHEMA hn   FROM SERVER DuckDB_server INTO public;
IMPORT FOREIGN SCHEMA who  FROM SERVER DuckDB_server INTO public;
SELECT *  FROM information_schema.tables WHERE table_schema='public' AND table_type='FOREIGN';

Query Schema

Here the sample_data and my_db are the schema names in the MotherDuck database.

MotherDuck Schemas

Next let’s query the data in the hacker_news schema.

SELECT
  domain,
  count(*) AS count
FROM (
  SELECT
    (regexp_matches(url, 'http[s]?://([^/]+)/'))[1] AS domain
  FROM hacker_news
  WHERE url IS NOT NULL
) subquery
WHERE domain != ''
GROUP BY domain
ORDER BY count DESC
LIMIT 20;
ℹ️

Note: PostgreSQL does not have a regexp_extract function and needs to be replaced by regexp_matches.

Query Result

Upload data to MotherDuck

We are going to upload the chicago_crime_dataset_pa.parquet table to MotherDuck. File can be downloaded from here.

For this on the MotherDuck UI, click on the ADD FILES button to upload a new dataset. This will create a cell with SQL to import the dataset. Run the cell to import the dataset.

Upload Dataset

IMPORT FOREIGN SCHEMA public LIMIT TO(chicago_crime_dataset_pa) FROM SERVER duckdb_server INTO public;
SELECT * FROM chicago_crime_dataset_pa LIMIT 5;

Map Visualization

For us to visualise the data on a map we need to create a view with a PostGIS column. This allows geobase to automatically detect the geometry column and create an end point for the map tile server.

Create a view with a PostGIS column

Next let’s create a view with a postgis column.

CREATE OR REPLACE VIEW chicago_crime_data_view AS
SELECT
  unique_key,
  date,
  description,
  latitude,
  longitude,
  year,
  h3_7,
  year_half,
  ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)::geometry(Point, 4326) AS geopoint
FROM chicago_crime_dataset_pa
LIMIT 100;

Fronted Map via Tile Server

Now we can access the data via the Geobase Map Tile Server.

Geobase

Query via REST API

You can also use the REST API to query the data stored in MotherDuck.

# TEST API
curl 'https://[GEOBASE_PROJECT_ID].geobase.app/rest/v1/chicago_crime_data_view' \
  -H "apikey: [GEOBASE_ANON_KEY]" \
  -H "Authorization: Bearer [GEOBASE_ANON_KEY]"

Conclusion

Build something amazing with MotherDuck and Geobase! 🐥🚀