diff --git a/tools/vectorosm-tilecreator/setup/README.md b/tools/vectorosm-tilecreator/setup/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..a64f6ef05c47a6e207dec6090ed64569db1d164f
--- /dev/null
+++ b/tools/vectorosm-tilecreator/setup/README.md
@@ -0,0 +1,121 @@
+# Server-side setup notes
+
+Note: so far this is the prototype setup for a proposed dynamic generation of high-z tiles, not what is
+actually deployed as of mid 2020.
+
+## Overview
+
+The outside interface for this is the standard [Slippy map](https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames),
+under `/earth/vectorosm/v1/`. Zoom levels 1, 3, 5, 7, 9, 11, 13, 15 and 17 are offered. The individual tiles are served
+as [o5m](https://wiki.openstreetmap.org/wiki/O5m) encoded files.
+
+Internally this consists of the following parts:
+* Statically generated low-resolution tiles for zoom levels 1, 3, 5, 7 and 9, based on the [Natural Earth](https://www.naturalearthdata.com/)
+data set. Those tiles exist in o5m format on disk in the exact layout in which the webserver serves them.
+* Dynamically generated high-resolution tiles for zoom levels 11, 13, 15 and 17. Creation and expiry of those are managed
+by Tirex, and they are stored in the [metatile](https://wiki.openstreetmap.org/wiki/Tirex/Internals#Metatile_file_structure)
+format (8x8 tiles in a single binary file in a 5 layer hashed folder structure). mod_tile takes care of translating that
+to the outside interface.
+* Input data for the dynamic generation: This is provided via an [OSMX](https://github.com/protomaps/OSMExpress)
+database, which allows for fast spatial queries and efficient incremental updates.
+
+## Dependencies
+
+The following components are assumed to be on the server:
+* Apache2
+* Python 3
+* For tile generation in general
+  * osmctools - https://gitlab.com/osm-c-tools/osmctools
+* For the static/low-z tile generation (could be done on a different machine if needed):
+  * ogr2ogr from gdal (?)
+  * ne_tilegenerator.py
+  * marble-vectorosm-tilecreator
+* For the dynamic/high-z tile generation:
+  * mod_tile - https://wiki.openstreetmap.org/wiki/Mod_tile
+  * Tirex - https://wiki.openstreetmap.org/wiki/Tirex
+  * osmx and osmx-update - https://github.com/protomaps/OSMExpress (static binary of osmx available there, osmx-update is a Python script)
+  * marble-vectorosm-tirex-backend
+
+## Setup
+
+See the configuration files in the etc/ subdirectory.
+
+### Static low-z tile generation
+
+Run ne_tilegenerator.py from ../natural-earth-vector-tiling.
+
+```
+mkdir -p /k/osm/htdocs/earth/vectorosm/v1/
+mkdir -p /k/osm/cache/natural_earth
+./ne_tilegenerator.py -z 1,3,5,7,9 -f `pwd`/level_info.txt -o /k/osm/htdocs/earth/vectorosm/v1/ -i /k/osm/cache/natural_earth/ -c /k/osm/cache/natural_earth/ -r 30 -ow
+```
+
+TODO: this still generates files in its source dir, so probably this is better run inside the cache directory instead?
+
+The source data updates infrequently, so a low-frequency cron job is an option.
+
+### Dynamic high-z tile generation
+
+Prepare the land polygon input data by running:
+`marble-vectorosm-process-land-polygons -c /k/osm/cache`
+
+Prepare the OSMX database:
+
+* Download the latest full planet data dump (in PBF format!) from a mirror listed here: https://wiki.openstreetmap.org/wiki/Planet.osm
+* Run `osmx expand planet.osm.pbf /k/osm/cache/planet.osmx` to create the OSMX database.
+* The downloaded data dump can be discarded afterwards to free some disk space.
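For orientation, the tile coordinates used in this setup follow the Slippy map scheme referenced in the Overview, and the dynamic tiles are grouped into 8x8 metatiles. The following is a minimal Python sketch of both calculations. The function names are illustrative only and are not part of Marble, Tirex or mod_tile, and the Berlin coordinate is just an example input.

```python
import math


def tile_index(lon_deg: float, lat_deg: float, zoom: int):
    """Return the Slippy map (x, y) tile containing a WGS84 coordinate."""
    n = 2 ** zoom  # number of tiles per axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp to the valid range to guard against coordinates on the map edge.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def metatile_origin(x: int, y: int, size: int = 8):
    """Return the top-left tile of the size x size metatile containing (x, y)."""
    return x - x % size, y - y % size


if __name__ == "__main__":
    # Example input (assumption): a point in Berlin at zoom level 11.
    x, y = tile_index(13.4, 52.52, 11)
    print("tile:", (x, y), "metatile origin:", metatile_origin(x, y))
```

A calculation like this can be used to sanity-check x/y ranges passed to `tirex-batch` for a given region and zoom level.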
+
+Initial pre-generation of level 11 tiles:
+
+```
+# North America
+tirex-batch -f not-exists map=vectorosm x=310-680 y=660-940 z=11
+# South America
+tirex-batch -f not-exists map=vectorosm x=560-824 y=1024-1400 z=11
+# North Africa, Asia, Europe
+tirex-batch -f not-exists map=vectorosm x=920-2047 y=432-1000 z=11
+# South Africa
+tirex-batch -f not-exists map=vectorosm x=1072-1312 y=1000-1232 z=11
+# Australia
+tirex-batch -f not-exists map=vectorosm x=1560-2032 y=1000-1320 z=11
+```
+
+This enqueues batch jobs for generating all level 11 tiles that don't exist yet. Due to the existence filter this can be re-run,
+for example after every server restart, without causing extra generation cost.
+
+## Incremental Updates
+
+Run the following command as a daily cron job (for server locations outside of central Europe pick a different mirror):
+
+`osmx-update /planet.osmx https://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/replication/day/`
+
+## Resource Requirements
+
+For the static low-z tiles:
+* 1.2GB disk space, 265k files, 700 directories, 260k inodes for the generated data
+* Generation takes about 60-90min (single core), needs about 2GB of temporary disk space, a few 100MB download volume, and ~6GB RAM peak
+
+For the dynamic high-z tiles (estimates and bounds, exact prediction is not possible here):
+* Low- to medium-density metatiles (batches of 64 tiles) generate in 100ms or less.
+* High-density metatiles take ~15s - this is addressed by pre-generating the level 11 tiles initially.
+* The number of parallel processes used for generation can be adjusted in the Tirex config, each process only uses a single core.
+* RAM peak should remain well below 1GB per generation process, exact amount varies with the level of detail of the processed tile.
+* Disk space requirement for the generator output varies with access patterns:
+  * Access stats from mid 2020 show 44k distinct tiles being used in a two-week period.
+  * Metatiles of high-density areas are up to 1.5MB in size, about a tenth of that for lower-density areas.
+  * Multiplying these (44k tiles times 1.5MB) gives 66GB and 44k files, however that assumes only distinct high-z tiles are requested.
+  * The full world OSM data in o5m format is around 60GB as well, so that is a sensible upper bound for volume.
+  * The theoretical upper bound for z17 metatile files is 2^(2*17 - 6) = 268M, however even the
+    [OSM access statistics](https://wiki.openstreetmap.org/wiki/Tile_disk_usage) only show about 2.5% of z17 tiles actually being loaded.
+    It can further be assumed that tile access is not random but clustered, which further reduces the number of metatiles needed.
+  * 10k to 1M files would therefore seem like the best guess for this.
+
+For input data updates:
+* Initial download of a full OSM dataset is about 60GB (available on several fast mirrors).
+* Initial creation of the OSMX database takes 6h, needs 8GB RAM and generates 700GB on disk in a single file.
+* Incremental updates: 100MB download and about 20s CPU time per day, and 6GB RAM peak during that.
+* Land polygons:
+  * 600MB download
+  * 600MB disk space, 16k inodes
+  * and an additional 1.5GB temporary disk use during generation
+  * generation takes 2-3 minutes and needs 4.5GB RAM
diff --git a/tools/vectorosm-tilecreator/setup/etc/apache2/mod_tile.conf b/tools/vectorosm-tilecreator/setup/etc/apache2/mod_tile.conf
new file mode 100644
index 0000000000000000000000000000000000000000..47852169bbb9d512a4cc84d48aca262e5a59c910
--- /dev/null
+++ b/tools/vectorosm-tilecreator/setup/etc/apache2/mod_tile.conf
@@ -0,0 +1,145 @@
+# Based on https://github.com/openstreetmap/mod_tile/blob/master/mod_tile.conf
+
+# Specify the default base storage path for where tiles live. A number of different storage backends
+# are available that can be used for storing tiles. Currently these are a file based storage, a memcached
+# based storage and a RADOS based storage.
+# The file based storage uses a simple file path as its storage path ( /path/to/tiledir )
+# The RADOS based storage takes a location to the rados config file and a pool name ( rados://poolname/path/to/ceph.conf )
+# The memcached based storage currently has no configuration options and always connects to memcached on localhost ( memcached:// )
+#
+# The storage path can be overridden on a style by style basis from the style TileConfigFile
+ModTileTileDir /k/osm/tirex/tiles
+
+# You can either manually configure each tile set with the default png extension and mimetype
+#AddTileConfig /folder/ TileSetName
+
+# or manually configure each tile set, specifying the file extension
+#AddTileMimeConfig /folder/ TileSetName js
+
+# or load all the tile sets defined in the configuration file into this virtual host.
+# Some tile set specific configuration parameters can only be specified via the configuration file option
+LoadTileConfigFile /etc/tirex/mod_tile.conf
+
+# Specify if mod_tile should keep tile delivery stats, which can be accessed from the URL /mod_tile
+# The default is On. As keeping stats needs to take a lock, this might have some performance impact,
+# but for nearly all intents and purposes this should be negligible and so it is safe to keep this turned on.
+ModTileEnableStats On
+
+# Turns on bulk mode. In bulk mode, mod_tile does not request any dirty tiles to be rerendered. Missing tiles
+# are always requested in the lowest priority. The default is Off.
+ModTileBulkMode Off
+
+# Timeout before giving up for a tile to be rendered
+ModTileRequestTimeout 3
+
+# Timeout before giving up for a tile to be rendered that is otherwise missing
+ModTileMissingRequestTimeout 10
+
+# If tile is out of date, don't re-render it if past this load threshold (user gets old tile)
+ModTileMaxLoadOld 16
+
+# If tile is missing, don't render it if past this load threshold (user gets 404 error)
+ModTileMaxLoadMissing 50
+
+# Sets how old an expired tile has to be to be considered very old and therefore get elevated priority in rendering (in microseconds, the value below is 365 days)
+ModTileVeryOldThreshold 31536000000000
+
+# Unix domain socket where we connect to the rendering daemon
+#ModTileRenderdSocketName /var/run/renderd/renderd.sock
+ModTileRenderdSocketName /var/lib/tirex/modtile.sock
+
+# Alternatively you can use a TCP socket to connect to renderd. The first part
+# is the location of the renderd server and the second is the port to connect to.
+# ModTileRenderdSocketAddr renderd.mydomain.com 7653
+
+##
+## Options controlling the cache proxy expiry headers. All values are in seconds.
+##
+## Caching is important both to reduce the load and bandwidth of the server and
+## to reduce the load time for the user. The site loads fastest if tiles can be
+## taken from the user's browser cache and no round trip through the internet is needed.
+## With minutely or hourly updates, however, there is a trade-off between cacheability
+## and freshness. As one can't predict the future, these are only heuristics that
+## need tuning.
+## If there is a known update schedule such as only using weekly planet dumps to update the db,
+## this can also be taken into account through the constant PLANET_INTERVAL in render_config.h
+## but requires a recompile of mod_tile
+
+## The values in this sample configuration are not the same as the defaults
+## that apply if the config settings are left out. The defaults are more conservative
+## and disable most of the heuristics.
+
+
+##
+## Caching is always a trade-off between being up to date and reducing server load or
+## client side latency and bandwidth requirements. Under some conditions, like poor
+## network conditions, it might be more important to have good caching rather than the latest tiles.
+## Therefore the following config options allow setting a special host header for which the caching
+## behaviour is different to the normal heuristics
+##
+## The CacheExtended parameters override all other caching parameters (including CacheDurationMax)
+## for tiles being requested via the hostname CacheExtendedHostname
+#ModTileCacheExtendedHostname cache.tile.openstreetmap.org
+#ModTileCacheExtendedDuration 2592000
+
+# Upper bound on the length of time a tile will be set cacheable, which takes
+# precedence over other caching settings
+ModTileCacheDurationMax 604800
+
+# Sets the time tiles can be cached for that are known to be outdated and have been
+# sent to renderd to be rerendered. This should be set to a value corresponding
+# roughly to how long it will take renderd to get through its queue. There is an additional
+# fuzz factor on top of this to not have all tiles expire at the same time
+ModTileCacheDurationDirty 900
+
+# Specify the minimum time mod_tile will set the cache expiry to for fresh tiles. There
+# is an additional fuzz factor of between 0 and 3 hours on top of this.
+ModTileCacheDurationMinimum 10800
+
+# Lower zoom levels are less likely to change noticeably, so these could be cached for longer
+# without users noticing much.
+# The heuristic offers three levels of zoom, Low, Medium and High, for which different minimum
+# caching times can be specified.
+
+#Specify the zoom level below which Medium starts and the time in seconds for which they can be cached
+ModTileCacheDurationMediumZoom 13 86400
+
+#Specify the zoom level below which Low starts and the time in seconds for which they can be cached
+ModTileCacheDurationLowZoom 9 518400
+
+# A further heuristic to determine caching times is how long ago a tile last changed.
+# If it hasn't changed for a while, it is less likely to change in the immediate future, so the
+# tiles can be cached for longer.
+# For example, if the factor is 0.20 and the tile hasn't changed in the last 5 days, it can be cached
+# for up to one day without having to re-validate.
+ModTileCacheLastModifiedFactor 0.20
+
+## Tile Throttling
+## Tile scrapers can often download large numbers of tiles and overly strain tileserver resources.
+## mod_tile therefore offers the ability to automatically throttle requests from ip addresses that have
+## requested a lot of tiles.
+## The mechanism uses a token bucket approach to shape traffic. I.e. there is an initial pool of n tiles
+## per ip that can be requested arbitrarily fast. After that this pool gets filled up at a constant rate.
+## The algorithm has two metrics. One based on overall tiles served to an ip address and a second one based on
+## the number of requests to renderd / tirex to render a new tile.
+
+## Overall enable or disable tile throttling
+ModTileEnableTileThrottling Off
+# Specify if you want to use the connecting IP for throttling, or use the X-Forwarded-For header to determine the
+# IP address to be used for tile throttling. This can be useful if you have a reverse proxy / HTTP accelerator
+# in front of your tile server.
+# 0 - don't use X-Forwarded-For and always use the IP that apache sees
+# 1 - use the client IP address, i.e. the first entry in the X-Forwarded-For list. This works through a cascade of proxies.
+# However, as the X-Forwarded-For is written by the client this is open to manipulation and can be used to circumvent the throttling
+# 2 - use the last specified IP in the X-Forwarded-For list. If you know all requests come through a reverse proxy
+# that adds an X-Forwarded-For header, you can trust this IP to be the IP the reverse proxy saw for the request
+ModTileEnableTileThrottlingXForward 0
+## Parameters (poolsize in tiles and topup rate in tiles per second) for throttling tile serving.
+ModTileThrottlingTiles 10000 1
+## Parameters (poolsize in tiles and topup rate in tiles per second) for throttling render requests.
+ModTileThrottlingRenders 128 0.2
+
+###
+###
+# increase the log level for more detailed information
+ LogLevel debug
diff --git a/tools/vectorosm-tilecreator/setup/etc/apache2/mod_tile_integration.conf b/tools/vectorosm-tilecreator/setup/etc/apache2/mod_tile_integration.conf
new file mode 100644
index 0000000000000000000000000000000000000000..12826d3c7df9f9be781734c28f790960b61ca3af
--- /dev/null
+++ b/tools/vectorosm-tilecreator/setup/etc/apache2/mod_tile_integration.conf
@@ -0,0 +1,17 @@
+# Integrate statically generated low-z tiles and dynamically generated high-z tiles
+# provided by Tirex. For this, low-z requests are forwarded 1:1 to the file system,
+# high-z requests end up with Tirex via a rewrite rule.
+
+
+ Require all granted
+
+
+Alias "/earth" "/data2/k/osm/htdocs/earth"
+
+# mod_tile assumes file extensions consist only of lower-case letters, while we want them to be "o5m"
+# we achieve that by rewriting the file extension to one we use internally for this ("ofm")
+# However, we must only do that for levels 11 to 17 which are delivered via Tirex, levels 1 to 9 are static
+# and delivered differently.
+
+RewriteEngine on
+RewriteRule ^(.*/\d\d/\d+/\d+)\.o5m$ /tirex$1.ofm [PT]
diff --git a/tools/vectorosm-tilecreator/setup/etc/tirex/mod_tile.conf b/tools/vectorosm-tilecreator/setup/etc/tirex/mod_tile.conf
new file mode 100644
index 0000000000000000000000000000000000000000..25912bd7b5ace1e7e9af4924bf963c84fa175b85
--- /dev/null
+++ b/tools/vectorosm-tilecreator/setup/etc/tirex/mod_tile.conf
@@ -0,0 +1,17 @@
+[vectorosm]
+URI=/tirex/earth/vectorosm/v1
+TILEDIR=/k/osm/tirex/tiles/
+;HOST=tile.openstreetmap.org
+TILESIZE=256
+;HTCPHOST=proxy.openstreetmap.org
+;** config options used by mod_tile, but not renderd **
+MINZOOM=11
+MAXZOOM=17
+TYPE=ofm application/octet-stream
+;DESCRIPTION=This is a description of the tile layer used in the tile json request
+;ATTRIBUTION=©OpenStreetMap and contributors, ODbL
+;SERVER_ALIAS=http://localhost/
+;CORS=http://www.openstreetmap.org
+;ASPECTX=1
+;ASPECTY=1
+;SCALE=1.0
diff --git a/tools/vectorosm-tilecreator/setup/etc/tirex/renderer/marble.conf b/tools/vectorosm-tilecreator/setup/etc/tirex/renderer/marble.conf
new file mode 100644
index 0000000000000000000000000000000000000000..b51f4e9adfe435e5d64061702761fee2198380a7
--- /dev/null
+++ b/tools/vectorosm-tilecreator/setup/etc/tirex/renderer/marble.conf
@@ -0,0 +1,26 @@
+#-----------------------------------------------------------------------------
+#
+# Configuration for the Marble vector tile generator
+#
+# /etc/tirex/renderer/marble.conf
+#
+#-----------------------------------------------------------------------------
+
+# symbolic name
+name=marble
+
+# path to executable of renderer
+path=/k/osm/generator/bin/marble-vectorosm-tirex-backend
+
+# UDP port where the master can contact this renderer
+# must be individual for each renderer
+port=9331
+
+# number of processes that should be started
+procs=4
+
+# syslog facility
+#syslog_facility=daemon
+
+# activate this to see debug messages from renderer
+#debug=1
diff --git a/tools/vectorosm-tilecreator/setup/etc/tirex/renderer/marble/marble.conf b/tools/vectorosm-tilecreator/setup/etc/tirex/renderer/marble/marble.conf
new file mode 100644
index 0000000000000000000000000000000000000000..7326b0babdf10fd745be2692884f0a3911cdb857
--- /dev/null
+++ b/tools/vectorosm-tilecreator/setup/etc/tirex/renderer/marble/marble.conf
@@ -0,0 +1,22 @@
+#-----------------------------------------------------------------------------
+#
+# Configuration for Marble vector tile generator.
+#
+# /etc/tirex/renderer/marble/marble.conf
+#
+#-----------------------------------------------------------------------------
+
+# symbolic name of this map
+name=vectorosm
+
+# tile directory
+tiledir=/k/osm/tirex/tiles/vectorosm/
+
+# cache directory with the input data for the generator
+cache-directory=/k/osm/cache
+
+# minimum zoom level allowed
+minz=11
+
+# maximum zoom level allowed
+maxz=17
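
The rewrite rule in mod_tile_integration.conf decides which requests reach Tirex. As a quick way to reason about it, the following Python sketch applies the same pattern and substitution with the standard `re` module. The `rewrite` helper is hypothetical and only mimics the rule for illustration; it is not how Apache evaluates it internally, and the request paths are made-up examples.

```python
import re

# Same pattern as the RewriteRule in mod_tile_integration.conf.
HIGH_Z_TILE = re.compile(r"^(.*/\d\d/\d+/\d+)\.o5m$")


def rewrite(path: str):
    """Return the internal Tirex path for a request, or None if the rule does not apply."""
    match = HIGH_Z_TILE.match(path)
    if match is None:
        return None
    # Mirrors the substitution "/tirex$1.ofm".
    return "/tirex" + match.group(1) + ".ofm"


if __name__ == "__main__":
    # Example request paths (assumptions), one high-z and one low-z.
    for request in ("/earth/vectorosm/v1/11/1100/671.o5m",
                    "/earth/vectorosm/v1/9/275/167.o5m"):
        print(request, "->", rewrite(request))
```

The pattern keys on a two-digit zoom component rather than on the range 11 to 17, so a request for, say, level 10 would also be forwarded; presumably the MINZOOM/MAXZOOM settings in the Tirex mod_tile.conf are what reject such requests.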