Creating and Storing Geo Embeddings in Geobase
This tutorial demonstrates how to create geospatial embeddings with data from OpenStreetMap (OSM) and store them in your geobase project database. We will use the following Python libraries: srai
(version 0.7.5
), geopandas
, and sqlalchemy
.
What are Geospatial Embeddings?
Geospatial embeddings are vector representations of geographical regions that capture the spatial relationships and characteristics of the features present within those regions. These embeddings can be used for various applications
How are geospatial embeddings created?
The creation of geospatial embeddings involves several steps:
- select a geographical area of interest
- load OpenStreetMap (OSM) data for that area
- regionalize the area into hexagonal regions
- join the regions with the OSM features
- create embeddings for the regions based on the features
- store the embeddings in a geobase database
This is a high-level overview of the process:
In the rest of this tutorial, we will walk through each of these steps in detail.
Installing Required Packages
To begin, you need to install several Python packages. These packages include libraries for working with geospatial data, embedding models, and Geobase.
!pip install srai[all]
!pip install folium matplotlib mapclassify
!pip install pgvector
!pip install geoalchemy2
Import Necessary Libraries
from srai.embedders import CountEmbedder
from srai.joiners import IntersectionJoiner
from srai.loaders import OSMOnlineLoader
from srai.plotting import plot_regions, plot_numeric_data
from srai.regionalizers import H3Regionalizer, geocode_to_region_gdf
import pandas as pd
from sqlalchemy import create_engine, Table, Column, Integer, Float, MetaData
import geopandas as gpd
Install geoalchemy2 for working with geospatial data in SQL databases:
!pip install geoalchemy2
Load OSM Data and Regionalize
- Load OSM Data: Use OSMOnlineLoader to load features for a specific area and query.
- Regionalize Data: Use H3Regionalizer to divide the area into hexagonal regions.
loader = OSMOnlineLoader()
regionalizer = H3Regionalizer(resolution=9)
joiner = IntersectionJoiner()
query = {"amenity": "bicycle_parking"}
area = geocode_to_region_gdf("Malmö, Sweden")
features = loader.load(area, query)
regions = regionalizer.transform(area)
joint = joiner.transform(regions, features)
Create Embeddings
Use CountEmbedder
to create embeddings for the regions based on the loaded features.
embedder = CountEmbedder()
embeddings = embedder.transform(regions, features, joint)
Visualize the Results
Use srai.plotting
to visualize the regions and the embeddings.
folium_map = plot_regions(area, colormap=["rgba(0,0,0,0.1)"], tiles_style="CartoDB positron")
plot_numeric_data(regions, "amenity_bicycle_parking", embeddings, map=folium_map)
Store Embeddings in Geobase
For this you will need to have an account with https://geobase.app and create a project with pgvector
enabled.
- Create a Connection to the Database: Use sqlalchemy to create an engine for the geobase database.
- Prepare Data for Storage: Convert the embeddings to a DataFrame and then to a GeoDataFrame.
- Define the Table Schema: Define the metadata and table structure.
- Store Data: Use the to_postgis method to store the GeoDataFrame in the geobase.
# Create a connection to the database
engine = create_engine('postgresql://username:password@host_address:port/database')
We need to create the pgvector extension in the database before storing the embeddings.
add_pgvector_extension = """
BEGIN;
CREATE EXTENSION IF NOT EXISTS pgvector WITH SCHEMA extensions;
COMMIT;
"""
with engine.connect() as connection:
connection.execute(text(add_pgvector_extension))
connection.commit()
You can also do it via the geobase studio dashboard by going to the Database > Extensions
tab and enabling the pgvector
extension.
# Example of DataFrame structure
embeddings_df = pd.DataFrame({
'region': regions['geometry'], # Assuming regions DataFrame contains geometries
'embedding': embeddings['amenity_bicycle_parking'] # Assuming embeddings DataFrame contains embeddings
})
# Define the table metadata
metadata = MetaData()
# Define the table
malmo_embeddings = Table(
'malmo_embeddings', metadata,
Column('id', Integer, primary_key=True),
Column('region', Geometry('GEOMETRY', srid=4326)),
Column('embedding', Float, nullable=False)
)
metadata.create_all(engine)
# Convert embeddings_df to GeoDataFrame
embeddings_gdf = gpd.GeoDataFrame(embeddings_df, geometry='region')
# Store the GeoDataFrame in geobase
embeddings_gdf.to_postgis('malmo_embeddings', engine, if_exists='replace')
# Verify data storage
gdf = gpd.read_postgis("malmo_embeddings", engine, geom_col='region')