This is a basic description of the standard procedure to keep the local PostgreSQL/PostGIS database in sync with OpenStreetMap as new data is progressively uploaded to OSM. The information provided here is general and not necessarily comprehensive, since the main scope of this site is to provide tutorials for setting up a development environment of OpenStreetMap Carto and to offer recommendations for editing the style.
After the initial load of a PBF extract (or of the whole “planet”) into the PostgreSQL/PostGIS database, the data has to be kept up to date. For this purpose, OpenStreetMap offers minutely, hourly and daily change files in compressed XML format, also called diff files, replication diffs or osmChange files. They periodically collect the uploaded and closed changesets, where each changeset groups single edits such as additions, updates and deletions of features.
A number of tools have been developed to get, analyze and process incremental OpenStreetMap update files into a PostGIS database. Among them, some notable ones are:
For the replication process, we assume that the database is created and that an initial import has already been made through Osm2pgsql, as described in “Get an OpenStreetMap data extract”. (Notice that the -s or --slim option is needed for the initial import to allow subsequent updates through -a or --append.)
The traditional method to keep the tile server and the PostgreSQL/PostGIS database up-to-date with the latest OSM data is based on the Osmosis/osm2pgsql chain.
Osmosis is a general-purpose command-line Java-based OSM data tool which provides many capabilities including the function to replicate OpenStreetMap data in OSC (Open Street Map Change) change file format via periodic syncs. The replication tasks are described in a specific section of the related usage page. This software is maintained in its GitHub repository.
To automate the process of downloading the replication diffs and importing the changes into the PostgreSQL/PostGIS database, Osmosis is generally activated through a scheduled script that links its output to the osm2pgsql input through a pipe; the script can be scheduled via cron.
The following command installs Osmosis from package:
sudo apt-get -y install osmosis
After completing the installation, you can go on with other setup steps.
As an alternative to installing the software from package, the following procedure compiles Osmosis from sources.
Update Ubuntu and install essential tools:
sudo apt-get update
sudo apt-get -y upgrade
sudo apt-get -y install curl unzip gdal-bin tar wget bzip2 build-essential clang
Install Java JRE, JDK and Gradle:
sudo apt-get -y install default-jre default-jdk gradle
Clone ‘osmosis’ repository:
cd ~/src
git clone https://github.com/openstreetmap/osmosis
Compile Osmosis:
cd osmosis
./gradlew assemble
Test Osmosis following wiki examples:
curl https://download.geofabrik.de/europe/liechtenstein-latest.osm.pbf --output ~/data.osm.pbf
package/bin/osmosis --read-pbf ~/data.osm.pbf --node-key-value keyValueList="highway.speed_camera" --write-xml radar.osm
package/bin/osmosis --read-pbf ~/data.osm.pbf --tf accept-ways highway=* --used-node --write-xml highways.osm
Current version of Osmosis when compiled from sources at the time of writing:
package/bin/osmosis --version 2>&1 | grep "Osmosis Version"
INFO: Osmosis Version 0.46-SNAPSHOT
Link the osmosis command to /usr/bin:
sudo ln -s "$PWD"/package/bin/osmosis /usr/bin/osmosis
Final check:
cd
osmosis --version 2>&1 | grep "Osmosis Version"
As Osmosis is a Java program, you may need to specify appropriate options to the JVM like the memory usage (e.g., increasing it with the -Xmx option) and the temporary directory (e.g., something other than /tmp/). You can set for instance the JAVACMD_OPTIONS environment variable like in the following example:
export JAVACMD_OPTIONS="-Xmx2G -Djava.io.tmpdir=/some/other/path/than/tmp/"
You can also add the above command to the .osmosis file in your home directory (or to C:\Users\<user name>\osmosis.bat on Windows, replacing export with set).
Create a working directory to be used by Osmosis:
WORKOSM_DIR=/home/tileserver/osmosisworkingdir
sudo mkdir -p $WORKOSM_DIR ; cd $WORKOSM_DIR
This command simplifies a test installation but should be avoided in a production environment to protect security:
sudo chmod a+w .
Two configuration files are needed: state.txt and configuration.txt.
To create a default configuration file for osmosis, first remove any previous existing configuration file:
rm -f configuration.txt
then run the following command (--rrii is the short form of the option):
osmosis --read-replication-interval-init workingDirectory=$WORKOSM_DIR
A default configuration.txt file and another one named download.lock will be created. download.lock can be ignored (it is used by Osmosis to ensure that only one copy is running at a time).
An issue with past versions of Osmosis is that the default configuration they create includes an http baseUrl instead of an https one. Fortunately, this can be fixed by manually editing configuration.txt and substituting http with https; all updates will then use the new URL. A correct baseUrl will be: baseUrl=https://planet.openstreetmap.org/... The following command can be run to fix the URL:
sed -i 's!baseUrl=http://planet.openstreetmap.org/!baseUrl=https://planet.openstreetmap.org/!' configuration.txt
You might also need to change the replication period, which is specified within the baseUrl. By default, this points to minutely diffs (baseUrl=https://planet.openstreetmap.org/replication/minute). If you want hourly or daily diffs, edit the file so that it references the related replication URL: https://planet.openstreetmap.org/replication/hour/ or https://planet.openstreetmap.org/replication/day/.
configuration.txt shall at least include baseUrl and maxInterval. A Java exception occurs if either of the two is missing (e.g., SEVERE: Thread for task 1-read-replication-interval failed).
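Since a missing key only surfaces as a Java exception at run time, it can help to check the file before starting Osmosis. The following is a minimal sketch under stated assumptions: check_osmosis_config is a hypothetical helper name (it is not part of Osmosis), and the check only looks for the presence of the two keys, not for valid values.

```shell
# Sketch: verify that an Osmosis configuration.txt defines the two mandatory keys.
# check_osmosis_config is a hypothetical helper, not an Osmosis command.
check_osmosis_config() {
    local file="$1" key missing=0
    for key in baseUrl maxInterval; do
        # Accept both "key=value" and "key = value"; commented lines do not match.
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
            echo "missing ${key} in ${file}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Possible usage before the first run: check_osmosis_config "$WORKOSM_DIR/configuration.txt" && echo "configuration looks complete".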
Keeping the data up-to-date can be resource intensive. maxInterval controls how much data Osmosis will download in a single invocation and is set by default to 3600, meaning one hour of changes at a time (even if you are using minutely updates). After the download, Osm2pgsql applies the changes to the database. Depending on the size of the area, on the number of changes and on how complex they are, one run can be immediate, take many seconds, a few minutes, or more than one hour. For testing, it is worthwhile to change the maxInterval value to something lower (e.g., 60 seconds, to download just one minute at a time, which should only take a few seconds to apply). If you instead have a lot of changes to catch up on, you can tune it to a higher value (e.g., maxInterval = 21600 for six hours, 86400 for one day, 604800 for one week). Setting it to 0 disables this feature.1
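Changing maxInterval by hand is easy, but a small helper makes it repeatable when switching between test and catch-up settings. This is only a sketch: set_max_interval is a hypothetical name, GNU sed is assumed (as with the sed -i command used earlier), and the key is assumed to appear at most once in the file.

```shell
# Sketch: set (or add) the maxInterval entry of an Osmosis configuration.txt.
# set_max_interval is a hypothetical helper; GNU sed is assumed for "-i".
set_max_interval() {
    local file="$1" seconds="$2"
    # Only plain non-negative integers are accepted (0 disables the limit).
    case "$seconds" in
        ''|*[!0-9]*) echo "invalid interval: ${seconds}" >&2; return 2 ;;
    esac
    if grep -Eq '^[[:space:]]*maxInterval[[:space:]]*=' "$file"; then
        # Replace the existing entry, whatever spacing it uses around "=".
        sed -i -E "s|^[[:space:]]*maxInterval[[:space:]]*=.*|maxInterval = ${seconds}|" "$file"
    else
        printf 'maxInterval = %s\n' "$seconds" >> "$file"
    fi
}
```

For instance, set_max_interval "$WORKOSM_DIR/configuration.txt" 60 would select one-minute batches for testing, and 21600 six-hour batches for catching up.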
The state.txt file contains information about the version (sequenceNumber) and the timestamp of the osm/pbf file and you need to get the state.txt file that corresponds to the dataset you downloaded.
Once the process finishes you can check the state.txt file. The timestamp and sequence number should have changed to reflect the applied update.
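To see how far the local data is behind, the timestamp stored in state.txt can be compared with the current time. The sketch below makes a few assumptions: replication_lag_seconds is a hypothetical helper name, state.txt contains a line such as timestamp=2020-01-01T00\:00\:00Z (with colons escaped by a backslash), and GNU date is available for parsing the timestamp.

```shell
# Sketch: print how many seconds the timestamp in state.txt lies behind "now".
# replication_lag_seconds STATE_FILE [NOW_EPOCH]   (hypothetical helper, GNU date assumed)
replication_lag_seconds() {
    local file="$1" now="${2:-$(date -u +%s)}" ts
    # Extract the value and drop the backslashes that escape the colons.
    ts=$(sed -n 's/^timestamp=//p' "$file" | tr -d '\\')
    [ -n "$ts" ] || { echo "no timestamp in ${file}" >&2; return 1; }
    echo $(( now - $(date -u -d "$ts" +%s) ))
}
```

Calling replication_lag_seconds "$WORKOSM_DIR/state.txt" after each run should show a value that shrinks as updates are applied. The optional second argument exists only to make the function testable with a fixed "now".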
Example of command to get the state.txt file:
wget https://planet.openstreetmap.org/replication/hour/000/003/000.state.txt -O "$WORKOSM_DIR/state.txt"
To find the appropriate sequence number by timestamp you can either look through the diff files, or use Peter Körner’s website tool. You can replace the URL of the above command with the one returned by this utility, or simply create a state.txt file including the whole output.
Alternative command which sets the date directly within the command-line:
wget "https://replicate-sequences.osm.mazdermind.de/?"`date -u +"%Y-%m-%d"`"T00:00:00Z" -O $WORKOSM_DIR/state.txt
Resetting the sequenceNumber to an earlier state can always be done by changing the sequenceNumber entry. Applying a change twice is fine. It just updates the data to the same values it had before.1
Each call to osmosis will compare the local state.txt with the current one on the service.
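When resetting to an earlier sequence number, the matching state file can be fetched from the replication server, whose directory layout splits the nine-digit, zero-padded number into three groups (sequence 3000 corresponds to 000/003/000.state.txt, as in the wget example above). A minimal sketch of that mapping follows; replication_state_path is a hypothetical helper name, and the input is assumed to be a plain decimal number without leading zeros, as found in state.txt.

```shell
# Sketch: map a replication sequence number to its state file path on the server.
# replication_state_path is a hypothetical helper; input must be plain decimal.
replication_state_path() {
    local seq="$1"
    # Millions / thousands / units become the three directory levels.
    printf '%03d/%03d/%03d.state.txt\n' \
        $(( seq / 1000000 )) $(( seq / 1000 % 1000 )) $(( seq % 1000 ))
}
```

For example, wget "https://planet.openstreetmap.org/replication/hour/$(replication_state_path 3000)" -O "$WORKOSM_DIR/state.txt" reproduces the download command shown earlier.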
Once configuration.txt and state.txt are correctly created, a sequence of commands including a processing pipeline can be tested to keep the DB updated with osmosis and osm2pgsql:
export PGHOST=localhost
export PGPORT=5432
export PGUSER=postgres
export PGPASSWORD=postgres_007%
cd ~/src/openstreetmap-carto
osmosis --read-replication-interval workingDirectory="${WORKOSM_DIR}" --simplify-change --write-xml-change - | \
osm2pgsql --append -r xml -s -C 300 -G --hstore --style openstreetmap-carto.style --tag-transform-script openstreetmap-carto.lua -d gis -H $PGHOST -U $PGUSER -
Its related execution diagram is the following:
[Diagram: OpenStreetMap changes, configuration.txt and state.txt -> Osmosis (which updates state.txt) -> Osm2pgsql -> PostgreSQL/PostGIS]
To schedule the update procedure, the osmosis pipeline can be packaged in a script and put into a minutely or hourly cron job. It is safe to call osmosis every minute, as it puts a lock on the download.lock file and if the previous run is still executing, the next one will exit immediately without doing anything. You might want to redirect stdout and stderr to either a log file or /dev/null to avoid cron sending out emails every time the script runs.
Create a file named osmosis-update.sh
cd /home/tileserver/osmosisworkingdir
vi osmosis-update.sh
Include the following content:
#!/usr/bin/env bash
test "$1" || exec 2>/dev/null
set -euf -o pipefail
WORKOSM_DIR=/home/tileserver/osmosisworkingdir
export PGHOST=localhost
export PGPORT=5432
export PGUSER=postgres
export PGPASSWORD=postgres_007%
cd ~/src/openstreetmap-carto
osmosis --read-replication-interval workingDirectory="${WORKOSM_DIR}" --simplify-change --write-xml-change - | \
osm2pgsql --append -r xml -s -C 300 -G --hstore --style openstreetmap-carto.style --tag-transform-script openstreetmap-carto.lua -d gis -H $PGHOST -U $PGUSER -
Save and test it:
chmod +x /home/tileserver/osmosisworkingdir/osmosis-update.sh
/home/tileserver/osmosisworkingdir/osmosis-update.sh display
A sample of cron job is the following:
* * * * * /home/tileserver/osmosisworkingdir/osmosis-update.sh >> /home/tileserver/osmosisworkingdir/osmosis.log 2>&1
Processing the whole planet needs CPU, RAM and disk capacity; the PostgreSQL database grows significantly as updates are applied to it.
A Python script named trim_osc.py, developed by Zverik within his repository of Tools for OSM regional extract support, makes it possible to trim the osmChange files to a bounding box or a polygon covering just the area that we are interested in. The GitHub README describes its usage. It is recommended to increase the update interval to 5-10 minutes, so that changes accumulate and ways can be filtered more effectively.
To install it:
cd ~/src
git clone https://github.com/zverik/regional
chmod u+x ~/src/regional/trim_osc.py
sudo apt-get install -y python-psycopg2 python-shapely python-lxml
The toolchain to process a trimmed download is the following:
[Diagram: OpenStreetMap changes, configuration.txt and state.txt -> Osmosis (which updates state.txt) -> trim_osc.py -> Osm2pgsql -> PostgreSQL/PostGIS]
Usage of the script:
trim_osc.py [-h] [-d DBNAME] [--host HOST] [--port PORT] [--user USER]
            [--password] [-p POLY] [-b Xmin Ymin Xmax Ymax] [-z] [-v]
            osc output
Trim osmChange file to a polygon and a database data
positional arguments:
  osc                   input osc file, "-" for stdin
  output                output osc file, "-" for stdout
optional arguments:
  -h, --help            show this help message and exit
  -d DBNAME             database name
  --host HOST           database host
  --port PORT           database port
  --user USER           user name for db
  --password            ask for password
  -p POLY, --poly POLY  osmosis polygon file
  -b Xmin Ymin Xmax Ymax, --bbox Xmin Ymin Xmax Ymax
                        Bounding box
  -z, --gzip            source and output files are gzipped
  -v                    display debug information
Sample:
~/src/regional/trim_osc.py -d gis -p /path/to/region.poly -z input output
trim_osc.py accepts a two-dimensional bounding box with the -b option and a polygon file with the -p option. The Geofabrik download server provides the .poly files used to generate its country/region extracts. For instance, the .poly file for Liechtenstein that describes the extent of this region can be downloaded with the following command:
cd $WORKOSM_DIR
wget https://download.geofabrik.de/europe/liechtenstein.poly
Then, the related argument to add to trim_osc.py is the following: -p "${WORKOSM_DIR}/liechtenstein.poly"
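If a bounding box is preferred to the polygon, the four values expected by the -b option can be derived from the same .poly file. The following sketch assumes the usual Osmosis polygon layout (a name line, section headers, coordinate lines with longitude and latitude, and END markers); poly_bbox is a hypothetical helper name and is not part of the "regional" tools.

```shell
# Sketch: print "Xmin Ymin Xmax Ymax" for an Osmosis .poly file.
# poly_bbox is a hypothetical helper; it only reads coordinate lines.
poly_bbox() {
    awk '
        # Coordinate lines hold exactly two numeric fields: longitude, latitude.
        NF == 2 && $1 ~ /^[-+]?[0-9.]/ && $2 ~ /^[-+]?[0-9.]/ {
            lon = $1 + 0; lat = $2 + 0
            if (!seen || lon < xmin) xmin = lon
            if (!seen || lon > xmax) xmax = lon
            if (!seen || lat < ymin) ymin = lat
            if (!seen || lat > ymax) ymax = lat
            seen = 1
        }
        END {
            if (!seen) exit 1
            printf "%.6f %.6f %.6f %.6f\n", xmin, ymin, xmax, ymax
        }
    ' "$1"
}
```

A possible invocation, leaving the command substitution unquoted so that it expands to four separate arguments: ~/src/regional/trim_osc.py -d gis -b $(poly_bbox "${WORKOSM_DIR}/liechtenstein.poly") osmChange osmChange. A bounding box is coarser than the polygon, so more data is kept.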
Script implementing the whole toolchain (we use osmChange as temporary file):
#!/usr/bin/env bash
test "$1" || exec 2>/dev/null
set -euf -o pipefail
WORKOSM_DIR=/home/tileserver/osmosisworkingdir
export PGHOST=localhost
export PGPORT=5432
export PGUSER=postgres
export PGPASSWORD=postgres_007%
cd ~/src/openstreetmap-carto
osmosis --read-replication-interval workingDirectory="${WORKOSM_DIR}" --simplify-change --write-xml-change osmChange
~/src/regional/trim_osc.py -d gis --user $PGUSER --host $PGHOST --port $PGPORT --password -p "${WORKOSM_DIR}/liechtenstein.poly" osmChange osmChange
osm2pgsql --append -r xml -s -C 300 -G --hstore --style openstreetmap-carto.style --tag-transform-script openstreetmap-carto.lua -d gis -H $PGHOST -U $PGUSER osmChange
rm osmChange
When updating the DB, we also need to inform the portal that the tiles impacted by the changes shall be marked as expired and so re-rendered. The process of identifying which tiles have to be scheduled for re-rendering is complex. Even if tools have been available for a long time, the related methods and algorithms are still under discussion for optimization and bug fixing (at least at the moment of writing this tutorial). One possibility is to use osm2pgsql with the -e (or --expire-tiles) and -o (or --expire-output) options to generate a list of changed tiles, which then need to be set as expired on the portal. The list can then be passed to the render_expired program (from the mod_tile project), the same tool we already mentioned to update tiles after a change in the stylesheet. Nevertheless, the OSMF servers don’t use osm2pgsql expiry, but instead exploit expire.rb.2 The German tile server uses expiremeta.pl, which is part of the Tirex Tile Rendering System.3
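Before handing the expiry list to a renderer, it can be useful to see how many tiles an update has touched. The sketch below assumes that the list written by osm2pgsql holds one tile per line in zoom/x/y form; count_expired_tiles is a hypothetical helper name.

```shell
# Sketch: summarise an osm2pgsql tile expiry list as "zoom count" lines.
# count_expired_tiles is a hypothetical helper; input lines look like 15/17234/11590.
count_expired_tiles() {
    awk -F/ 'NF == 3 { n[$1]++ } END { for (z in n) print z, n[z] }' "$1" | sort -n
}
```

A large count for a small update may indicate that the expiry zoom level is too detailed for the available rendering capacity.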
We will now update the previous script in order to also manage the tile expiration process by adding the -e$EXPIRY_METAZOOM:$EXPIRY_METAZOOM and -o "$EXPIRY_FILE.$$" options to osm2pgsql and including render_expired at the end of the toolchain:
#!/usr/bin/env bash
test "$1" || exec 2>/dev/null
set -euf -o pipefail
WORKOSM_DIR=/home/tileserver/osmosisworkingdir
export PGHOST=localhost
export PGPORT=5432
export PGUSER=postgres
export PGPASSWORD=postgres_007%
EXPIRY_MINZOOM=10
EXPIRY_MAXZOOM=18
EXPIRY_METAZOOM=15
EXPIRY_FILE=dirty_tiles
cd ~/src/openstreetmap-carto
osmosis --read-replication-interval workingDirectory="${WORKOSM_DIR}" --simplify-change --write-xml-change osmChange
~/src/regional/trim_osc.py -d gis --user $PGUSER --host $PGHOST --port $PGPORT --password -p "${WORKOSM_DIR}/liechtenstein.poly" osmChange osmChange
osm2pgsql --append -r xml -s -e$EXPIRY_METAZOOM:$EXPIRY_METAZOOM -o "$EXPIRY_FILE.$$" -C 300 -G --hstore --style openstreetmap-carto.style --tag-transform-script openstreetmap-carto.lua -d gis -H $PGHOST -U $PGUSER osmChange
render_expired --min-zoom=$EXPIRY_MINZOOM --max-zoom=$EXPIRY_MAXZOOM --touch-from=$EXPIRY_MINZOOM -s /var/run/renderd.sock < "$EXPIRY_FILE.$$"
rm osmChange
The revised toolchain is the following:
[Diagram: osmChange data, configuration.txt and state.txt -> Osmosis (which updates state.txt) -> trim_osc.py -> Osm2pgsql -> PostgreSQL; Osm2pgsql -> render_expired]
Scripts are available to support the update process. mod_tile includes a helper script named openstreetmap-tiles-update-expire, which performs the following steps:
The script, which for normal operation is invoked without arguments, also provides an initialization option of the replication system when executed with the single argument YYYY-MM-DD.
The date of the planet file used for a previously performed osm2pgsql import can be used as the argument (the command to get today's date would be: date -u +"%Y-%m-%dT%H:%M:%SZ"). The initialization includes the following steps:
Its man page describes the usage.
Install the script (as well as the auxiliary small script osmosis-db_replag):
wget https://raw.githubusercontent.com/openstreetmap/mod_tile/master/openstreetmap-tiles-update-expire
chmod a+x ./openstreetmap-tiles-update-expire
sudo mv ./openstreetmap-tiles-update-expire /usr/bin
wget https://raw.githubusercontent.com/openstreetmap/mod_tile/master/osmosis-db_replag
chmod a+x ./osmosis-db_replag
sudo mv ./osmosis-db_replag /usr/bin
The openstreetmap-tiles-update-expire script must be edited to modify the following configuration settings:
WORKOSM_DIR=/home/tileserver/osmosisworkingdir
export PGHOST=localhost
export PGPORT=5432
export PGUSER=postgres
export PGPASSWORD=postgres_007%
"-C 300 -G --hstore --style openstreetmap-carto.style --tag-transform-script openstreetmap-carto.lua -d gis -H $PGHOST -U $PGUSER"
Add also the full pathname of each openstreetmap-carto file.
-e option to create a tile expiry list) and by render_expired.
Notice that the script invokes http://osm.personalwerk.de/replicate-sequences/?"$1"T00:00:00Z (sources on GitHub) to download state.txt; http://osm.personalwerk.de/replicate-sequences redirects to https://replicate-sequences.osm.mazdermind.de, already mentioned before.
Edit also the osmosis-db_replag script and check STATE=/var/lib/mod_tile/.osmosis/state.txt by setting the actual path of state.txt (which should be $WORKOSM_DIR/state.txt). After creating state.txt, test osmosis-db_replag with osmosis-db_replag -h.
To prepare the script for the first execution4:
psql -d gis <<\eof
REVOKE CONNECT ON DATABASE gis FROM PUBLIC;
GRANT CONNECT ON DATABASE gis TO "www-data";
GRANT CONNECT ON DATABASE gis TO "tileserver";
eof
Initialise the osmosis replication stack to the date of your data import. Choose the date of the planet data, as this is the date from which the diffs will start.
sudo -u tileserver /usr/bin/openstreetmap-tiles-update-expire `date -u +"%Y-%m-%d"`
You will next need to update the default configuration of osmosis. In configuration.txt, change the baseUrl to https://planet.openstreetmap.org/replication/minute/ (notice the usage of https). The following command will do the job:
test ! -f $WORKOSM_DIR/configuration_orig.txt -a -f $WORKOSM_DIR/configuration.txt && mv $WORKOSM_DIR/configuration.txt $WORKOSM_DIR/configuration_orig.txt && sed 's!baseUrl=http://planet.openstreetmap.org/!baseUrl=https://planet.openstreetmap.org/!' $WORKOSM_DIR/configuration_orig.txt > $WORKOSM_DIR/configuration.txt
Update your tileserver by up to an hour and expire the corresponding rendered tiles:
sudo -u tileserver /usr/bin/openstreetmap-tiles-update-expire
If your tile server is more than an hour behind, you will need to call the openstreetmap-tiles-update-expire script multiple times.
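The number of calls needed follows from the lag and from maxInterval, since each invocation processes at most maxInterval seconds of changes. A small arithmetic sketch, where runs_needed is a hypothetical helper name and both values are expressed in seconds:

```shell
# Sketch: estimate how many update runs are needed to catch up.
# runs_needed LAG_SECONDS MAX_INTERVAL   (hypothetical helper)
runs_needed() {
    local lag="$1" max_interval="$2"
    # maxInterval=0 disables the limit, so a single run fetches everything.
    if [ "$max_interval" -le 0 ]; then
        echo 1
        return
    fi
    # Ceiling division: a partial interval still needs a full run.
    echo $(( (lag + max_interval - 1) / max_interval ))
}
```

For instance, a server that is 9000 seconds behind with maxInterval set to 3600 needs three runs. New changes keep arriving while catching up, so the real number can be slightly higher.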
If you want to continuously keep your server up to date, you need to add the openstreetmap-tiles-update-expire script to your crontab.
From time to time it may make sense to replace the preprocessed shapefiles with new versions:
cd ~/src
cd openstreetmap-carto
scripts/get-shapefiles.py
The previously mentioned trim_osc.py Python script can be added to install-postgis-osm-user.sh in order to trim the input to a bounding box or a polygon, so that the PostgreSQL database doesn’t grow significantly as updates are applied to it. The openstreetmap-tiles-update-expire script by junichim modifies the original one to include trim_osc.py from Zverik’s “regional” scripts.
In his blog, SomeoneElse reports the modifications and includes the appropriate cron scheduling. TRIM_REGION_OPTIONS shall be updated to reflect the actual region boundaries.
Geofabrik provides an updated version of the planet dataset from the latest OpenStreetMap data, already split into a number of pre-defined regions.
When using Geofabrik to download the initial dataset to the local database, data can be kept updated by applying diff update files (differences between the new extract and the previous one) that Geofabrik also computes each time a new extract is produced for a region. Using diff files of the same region of the initial download, users can continuously update their own regional extract instead of having to download the full file. Applying updates from Geofabrik rather than the whole planet minimizes the size of the database and the amount of data fetched from the remote server.
By default, the baseUrl parameter in configuration.txt points to the whole planet. If you are using an extract downloaded from Geofabrik, you should change the URL based on the metadata of the related PBF file. Osmium is a tool which, among other things, allows the analysis of a PBF file; it can be installed from package through:
sudo apt-get -y install osmium-tool
You can alternatively install the latest version from sources:
sudo apt-get remove -y osmium-tool
sudo apt-get install -y cmake pandoc cppcheck iwyu clang-tidy
cd ~/src
git clone https://github.com/mapbox/protozero
git clone https://github.com/osmcode/libosmium
git clone https://github.com/osmcode/osmium-tool
cd osmium-tool
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make
sudo make install
sudo ln -s /usr/local/bin/osmium /usr/bin/osmium
Once installed, the command to analyze a PBF file is osmium fileinfo:
PBF_FILE=liechtenstein-latest.osm.pbf
osmium fileinfo -e $PBF_FILE
The “osmosis_replication_” properties returned by this command can be used to configure osmosis. Values have to be transformed and this can be automatically done through a simple parser that generates a working configuration.txt5:
PBF_FILE=liechtenstein-latest.osm.pbf
REPLICATION_BASE_URL="$(osmium fileinfo -g 'header.option.osmosis_replication_base_url' "${PBF_FILE}")"
echo -e "baseUrl=${REPLICATION_BASE_URL}\nmaxInterval=3600" > "${WORKOSM_DIR}/configuration.txt"
So, the above commands can be used to produce configuration.txt from a PBF file downloaded from Geofabrik.
Subsequently, the command to generate state.txt is the following5:
REPLICATION_SEQUENCE_NUMBER="$( printf "%09d" "$(osmium fileinfo -g 'header.option.osmosis_replication_sequence_number' "${PBF_FILE}")" | sed ':a;s@\B[0-9]\{3\}\>@/&@;ta' )"
curl -s -L -o "${WORKOSM_DIR}/state.txt" "${REPLICATION_BASE_URL}/${REPLICATION_SEQUENCE_NUMBER}.state.txt"
The osmosis/osm2pgsql pipeline to keep the DB updated with Geofabrik is the following:
WORKOSM_DIR=/home/tileserver/osmosisworkingdir
export PGHOST=localhost
export PGPORT=5432
export PGUSER=postgres
export PGPASSWORD=postgres_007%
cd ~/src/openstreetmap-carto
HOSTNAME=localhost # set it to the actual ip address or host name
osmosis --read-replication-interval workingDirectory=${WORKOSM_DIR} --simplify-change --write-xml-change - | \
osm2pgsql --append -r xml -s -C 300 -G --hstore --style openstreetmap-carto.style --tag-transform-script openstreetmap-carto.lua -d gis -H $PGHOST -U $PGUSER -
Osmosis and Osm2pgsql are old tools. Even if Osmosis can perform all steps including the DB import, the Osm2pgsql chain is recommended (with the options mentioned before) to get a data model fully compatible with the one needed by openstreetmap-carto and to ensure that the Lua processing is correctly executed.
Other tools are available, capable of better performance, additional integration and more advanced processing, transformation and filtering.
Anyway, we need to verify that they do not introduce downsides and that the result is exactly the same as that of the osmosis/osm2pgsql toolchain, which is beyond the scope of this tutorial.
For instance, the logic used by imposm might be different from the one used by osm2pgsql. If using that tool, you have to make it behave exactly like the standard toolchain.6
For most use cases, periodic loading of diff updates is not really needed. Especially for very small areas or when monthly updates are enough, it is worthwhile to perform a full DB re-import each time, rather than implementing the osmosis/osm2pgsql toolchain. Full imports of small areas are generally fast enough and take less disk space; you simply need to download the new version of the extract, drop the database and re-import everything, getting rid of the data needed for updates by not using the -s or --slim option. This will also avoid DB maintenance procedures.
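The import step of such a periodic full re-import could look like the sketch below. It is only an illustration under stated assumptions: full_reimport is a hypothetical helper name, the osm2pgsql options are the ones used in this tutorial minus --append and -s, the command is expected to run from the openstreetmap-carto directory, and downloading the extract and dropping or recreating the database are left out. By default the function only prints the command; setting DRY_RUN=0 executes it.

```shell
# Sketch: full (non-slim) re-import of an extract, printed unless DRY_RUN=0.
# full_reimport is a hypothetical helper; run it from ~/src/openstreetmap-carto.
full_reimport() {
    # Build the command line in the positional parameters of the function.
    set -- osm2pgsql --create -G --hstore \
        --style openstreetmap-carto.style \
        --tag-transform-script openstreetmap-carto.lua \
        -d gis -H "${PGHOST:-localhost}" -U "${PGUSER:-postgres}" "$1"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"      # show what would be executed
    else
        "$@"
    fi
}
```

For example, full_reimport ~/liechtenstein-latest.osm.pbf prints the command for review, and DRY_RUN=0 full_reimport ~/liechtenstein-latest.osm.pbf runs it.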
When needing diff updates, it is more efficient to update less often and a periodicity of one day (or better one week) is suggested.
Jochen Topf in his blog reports that Osmium can provide expeditious and exact extracts.
A note within the openstreetmap-carto repository mentions websites providing direct extracts as well as other tools that can be possibly used.
Performance tuning: