GuidesSRAISimple Embeddings

Creating and Storing Geo Embeddings in Geobase

This tutorial demonstrates how to create geospatial embeddings with data from OpenStreetMap (OSM) and store them in your geobase project database. We will use the following Python libraries: srai (version 0.7.5), geopandas, and sqlalchemy.

What are Geospatial Embeddings?

Geospatial embeddings are vector representations of geographical regions that capture the spatial relationships and characteristics of the features present within those regions. These embeddings can be used for various applications

How are geospatial embeddings created?

The creation of geospatial embeddings involves several steps:

  1. select a geographical area of interest
  2. load OpenStreetMap (OSM) data for that area
  3. regionalize the area into hexagonal regions
  4. join the regions with the OSM features
  5. create embeddings for the regions based on the features
  6. store the embeddings in a geobase database

This is a high-level overview of the process:

Hex2Vec Embedder Flow Diagram

In the rest of this tutorial, we will walk through each of these steps in detail.

Installing Required Packages

To begin, you need to install several Python packages. These packages include libraries for working with geospatial data, embedding models, and Geobase.

!pip install srai[all]
!pip install folium matplotlib mapclassify
!pip install pgvector
!pip install geoalchemy2

Import Necessary Libraries

from srai.embedders import CountEmbedder
from srai.joiners import IntersectionJoiner
from srai.loaders import OSMOnlineLoader
from srai.plotting import plot_regions, plot_numeric_data
from srai.regionalizers import H3Regionalizer, geocode_to_region_gdf
import pandas as pd
from sqlalchemy import create_engine, Table, Column, Integer, Float, MetaData
import geopandas as gpd

Install geoalchemy2 for working with geospatial data in SQL databases:

!pip install geoalchemy2

Load OSM Data and Regionalize

  • Load OSM Data: Use OSMOnlineLoader to load features for a specific area and query.
  • Regionalize Data: Use H3Regionalizer to divide the area into hexagonal regions.
loader = OSMOnlineLoader()
regionalizer = H3Regionalizer(resolution=9)
joiner = IntersectionJoiner()
 
query = {"amenity": "bicycle_parking"}
area = geocode_to_region_gdf("Malmö, Sweden")
features = loader.load(area, query)
regions = regionalizer.transform(area)
joint = joiner.transform(regions, features)
 

Create Embeddings

Use CountEmbedder to create embeddings for the regions based on the loaded features.

embedder = CountEmbedder()
embeddings = embedder.transform(regions, features, joint)

Visualize the Results

Use srai.plotting to visualize the regions and the embeddings.

folium_map = plot_regions(area, colormap=["rgba(0,0,0,0.1)"], tiles_style="CartoDB positron")
plot_numeric_data(regions, "amenity_bicycle_parking", embeddings, map=folium_map)
 

Store Embeddings in Geobase

For this you will need to have an account with https://geobase.app and create a project with pgvector enabled.

  • Create a Connection to the Database: Use sqlalchemy to create an engine for the geobase database.
  • Prepare Data for Storage: Convert the embeddings to a DataFrame and then to a GeoDataFrame.
  • Define the Table Schema: Define the metadata and table structure.
  • Store Data: Use the to_postgis method to store the GeoDataFrame in the geobase.
# Create a connection to the database
engine = create_engine('postgresql://username:password@host_address:port/database')

We need to create the pgvector extension in the database before storing the embeddings.

add_pgvector_extension = """
BEGIN;
CREATE EXTENSION IF NOT EXISTS pgvector WITH SCHEMA extensions;
COMMIT;
"""
 
with engine.connect() as connection:
    connection.execute(text(add_pgvector_extension))
    connection.commit()

You can also do it via the geobase studio dashboard by going to the Database > Extensions tab and enabling the pgvector extension.

# Example of DataFrame structure
embeddings_df = pd.DataFrame({
    'region': regions['geometry'],  # Assuming regions DataFrame contains geometries
    'embedding': embeddings['amenity_bicycle_parking']  # Assuming embeddings DataFrame contains embeddings
})
 
# Define the table metadata
metadata = MetaData()
 
# Define the table
malmo_embeddings = Table(
    'malmo_embeddings', metadata,
    Column('id', Integer, primary_key=True),
    Column('region', Geometry('GEOMETRY', srid=4326)),
    Column('embedding', Float, nullable=False)
)
 
metadata.create_all(engine)
 
# Convert embeddings_df to GeoDataFrame
embeddings_gdf = gpd.GeoDataFrame(embeddings_df, geometry='region')
 
# Store the GeoDataFrame in geobase
embeddings_gdf.to_postgis('malmo_embeddings', engine, if_exists='replace')
 
# Verify data storage
gdf = gpd.read_postgis("malmo_embeddings", engine, geom_col='region')

Malmo Bicycle Parking Embeddings