Commit 5b9471e0 authored by Volker Krause, committed by Torsten Rahn

Add Apache and Tirex config files

Also, start to document the setup.
parent ae1c539f
# Server-side setup notes
Note: so far this is the prototype setup for a proposed dynamic generation of high-z tiles, not what is
actually deployed as of mid 2020.
## Overview
The outside interface for this is the standard [Slippy map](https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames),
under `/earth/vectorosm/v1/`. Zoom levels 1, 3, 5, 7, 9, 11, 13, 15 and 17 are offered. The individual tiles are served
as [o5m](https://wiki.openstreetmap.org/wiki/O5m) encoded files.
Internally this is split into the following parts:
* Statically generated low-resolution tiles for zoom levels 1, 3, 5, 7 and 9, based on the [Natural Earth](https://www.naturalearthdata.com/)
data set. Those tiles exist in o5m format on disk in the exact layout they are served by the webserver.
* Dynamically generated high-resolution tiles for zoom levels 11, 13, 15 and 17. Creation and expiry of those is managed
by Tirex, and they are stored in the [metatile](https://wiki.openstreetmap.org/wiki/Tirex/Internals#Metatile_file_structure)
format (8x8 tiles in a single binary file in a 5 layer hashed folder structure). mod_tile takes care of translating that
to the outside interface.
* Input data for the dynamic generation: This is provided via an [OSMX](https://github.com/protomaps/OSMExpress)
database, which allows for fast spatial queries and efficient incremental updates.
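
The hashed folder layout of the metatile storage can be illustrated with a short Python sketch. It assumes the path scheme used by mod_tile's file-based storage (five path components, each packing 4 bits of x and 4 bits of y); the function name and the example tile are hypothetical and only serve as illustration.

```python
METATILE = 8  # tiles per metatile edge, i.e. 8x8 tiles per file


def metatile_path(tile_dir, map_name, z, x, y):
    """Return the metatile file that would contain tile z/x/y (assumed layout)."""
    # Snap to the top-left tile of the enclosing 8x8 metatile.
    x &= ~(METATILE - 1)
    y &= ~(METATILE - 1)
    # Each of the 5 path components combines 4 bits of x with 4 bits of y,
    # starting with the least significant bits.
    parts = []
    for _ in range(5):
        parts.append(((x & 0x0F) << 4) | (y & 0x0F))
        x >>= 4
        y >>= 4
    # The most significant component comes first in the path.
    parts.reverse()
    hashed = "/".join(str(p) for p in parts)
    return f"{tile_dir.rstrip('/')}/{map_name}/{z}/{hashed}.meta"


if __name__ == "__main__":
    print(metatile_path("/k/osm/tirex/tiles", "vectorosm", 11, 1100, 671))
```

All 64 tiles of one metatile map to the same file, which is why the number of files on disk is far lower than the number of tiles served.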
## Dependencies
The following components are assumed to be on the server:
* Apache2
* Python 3
* For tile generation in general
* osmctools - https://gitlab.com/osm-c-tools/osmctools
* For the static/low-z tile generation (could be done on a different machine if needed):
* ogr2ogr from gdal (?)
* ne_tilegenerator.py
* marble-vectorosm-tilecreator
* For the dynamic/high-z tile generation:
* mod_tile - https://wiki.openstreetmap.org/wiki/Mod_tile
* Tirex - https://wiki.openstreetmap.org/wiki/Tirex
* osmx and osmx-update - https://github.com/protomaps/OSMExpress (static binary of osmx available there, osmx-update is a Python script)
* marble-vectorosm-tirex-backend
## Setup
See configuration files in the etc/ subdir.
### Static low-z tile generation
Run ne_tilegenerator.py from ../natural-earth-vector-tiling.
```
mkdir -p /k/osm/htdocs/earth/vectorosm/v1/
mkdir -p /k/osm/cache/natural_earth
./ne_tilegenerator.py -z 1,3,5,7,9 -f `pwd`/level_info.txt -o /k/osm/htdocs/earth/vectorosm/v1/ -i /k/osm/cache/natural_earth/ -c /k/osm/cache/natural_earth/ -r 30 -ow
```
TODO: this still generates files in its source dir, so probably this is better run inside the cache directory instead?
The source data updates infrequently, so a low-frequency cron job is an option.
### Dynamic high-z tile generation
Prepare the land polygon input data by running:
`marble-vectorosm-process-land-polygons -c /k/osm/cache`
Prepare the OSMX database:
* Download the latest full planet data dump (in PBF format!) from a mirror listed here: https://wiki.openstreetmap.org/wiki/Planet.osm
* Run `osmx expand planet.osm.pbf /k/osm/cache/planet.osmx` to create the OSMX database.
* The downloaded data dump can be discarded afterwards to free some disk space.
Initial pre-generation of level 11 tiles:
```
# North America
tirex-batch -f not-exists map=vectorosm x=310-680 y=660-940 z=11
# South America
tirex-batch -f not-exists map=vectorosm x=560-824 y=1024-1400 z=11
# North Africa, Asia, Europe
tirex-batch -f not-exists map=vectorosm x=920-2047 y=432-1000 z=11
# South Africa
tirex-batch -f not-exists map=vectorosm x=1072-1312 y=1000-1232 z=11
# Australia
tirex-batch -f not-exists map=vectorosm x=1560-2032 y=1000-1320 z=11
```
This enqueues batch jobs for generating all level 11 tiles that don't exist yet. Due to the existence filter this could be re-run,
for example after every server restart, without causing extra generation cost.
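
The x/y ranges passed to `tirex-batch` follow the standard Slippy map tile numbering. A minimal Python sketch of how such ranges relate to geographic coordinates, and of how many metatiles a range covers, could look as follows. The function names are hypothetical, and the metatile count assumes that a tile range is rounded outwards to whole 8x8 metatiles.

```python
import math


def lonlat_to_tile(lon, lat, z):
    """Slippy map tile (x, y) containing the given WGS84 coordinate."""
    n = 2 ** z
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp to the valid tile range at this zoom level.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def metatile_count(x_min, x_max, y_min, y_max, size=8):
    """Number of size x size metatiles touched by an inclusive tile range."""
    cols = x_max // size - x_min // size + 1
    rows = y_max // size - y_min // size + 1
    return cols * rows


if __name__ == "__main__":
    # Tile at the intersection of equator and prime meridian on level 11.
    print(lonlat_to_tile(0.0, 0.0, 11))
    # Metatiles covered by the North America range used above.
    print(metatile_count(310, 680, 660, 940))
```

Multiplying the metatile count by the per-metatile generation times listed under Resource Requirements gives a rough bound on how long the initial pre-generation takes.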
## Incremental Updates
Run the following command as a daily cron job (for server locations outside of central Europe, pick a different mirror):
`osmx-update <path-to>/planet.osmx https://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/replication/day/`
## Resource Requirements
For the static low-z tiles:
* 1.2GB disk space, 265k files, 700 directories, 260k inodes for the generated data
* Generation takes about 60-90min (single core), needs about 2GB of temporary disk space, a few 100MB download volume, and ~6GB RAM peak
For the dynamic high-z tiles (estimates and bounds, exact prediction is not possible here):
* Low- to medium-density metatiles (batches of 64 tiles) generate in 100ms or less.
* High-density metatiles take ~15s - this is addressed by pre-generating the level 11 tiles initially.
* The number of parallel processes used for generation can be adjusted in the Tirex config; each process only uses a single core.
* RAM peak should remain well below 1GB per generation process, exact amount varies with the level of detail of the processed tile.
* Disk space requirement for the generator output varies with access patterns:
* Access stats from mid 2020 show 44k distinct tiles being used in a 2w period.
* Metatiles of high-density areas are up to 1.5M in size, 10x less for lower-density areas.
* Simply multiplying this results in 66GB and 44k files, however that assumes only distinct high-z tiles are requested.
* The full world OSM data in o5m format is around 60GB as well, so that is a sensible upper bound for volume.
* The theoretical upper bound for z17 files is 2^(2*17 - 6) = 268M, however even the
[OSM access statistics](https://wiki.openstreetmap.org/wiki/Tile_disk_usage) only show about 2.5% of z17 tiles actually being loaded.
It can further be assumed that tile access is not random but clustered, which further reduces the number of metatiles needed.
* 10k to 1M files would therefore seem like the best guess for this.
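
The arithmetic behind these disk space bounds can be reproduced with a few lines of Python. This is only a sketch using the figures quoted above; the variable names are hypothetical, and the last value assumes the ideal case in which all accessed z17 tiles cluster perfectly into full metatiles, at the access fraction taken from the OSM statistics rather than from this server.

```python
import math

distinct_tiles = 44_000      # distinct tiles used in a 2 week period (mid 2020)
max_metatile_mb = 1.5        # size of a high-density metatile in MB

# Worst case: every distinct tile lives in its own high-density metatile.
naive_volume_gb = distinct_tiles * max_metatile_mb / 1000

# Theoretical number of z17 metatiles: 4^17 tiles, 64 (= 2^6) tiles per metatile.
z17_metatiles = 2 ** (2 * 17 - 6)

# About 2.5% of z17 tiles loaded, packed into as few metatiles as possible.
accessed_tiles = 0.025 * 4 ** 17
clustered_metatiles = math.ceil(accessed_tiles / 64)

if __name__ == "__main__":
    print(f"naive volume: {naive_volume_gb:.0f} GB")
    print(f"z17 metatile upper bound: {z17_metatiles}")
    print(f"metatiles at 2.5% access, perfectly clustered: {clustered_metatiles}")
```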
For input data updates:
* Initial download of a full OSM dataset is about 60GB (available on several fast mirrors).
* Initial creation of the OSMX database takes 6h, needs 8GB RAM and generates 700GB on disk in a single file.
* Incremental updates: 100MB download and about 20s CPU time per day, and 6GB RAM peak during that.
* Land polygons:
* 600MB download
* 600MB disk space, 16k inodes
* and an additional 1.5GB temporary disk use during generation
* generation takes 2-3 minutes and 4.5GB RAM
# Based on https://github.com/openstreetmap/mod_tile/blob/master/mod_tile.conf
# Specify the default base storage path for where tiles live. A number of different storage backends
# are available, that can be used for storing tiles. Currently these are a file based storage, a memcached
# based storage and a RADOS based storage.
# The file based storage uses a simple file path as its storage path ( /path/to/tiledir )
# The RADOS based storage takes a location to the rados config file and a pool name ( rados://poolname/path/to/ceph.conf )
# The memcached based storage currently has no configuration options and always connects to memcached on localhost ( memcached:// )
#
# The storage path can be overwritten on a style by style basis from the style TileConfigFile
ModTileTileDir /k/osm/tirex/tiles
# You can either manually configure each tile set with the default png extension and mimetype
#AddTileConfig /folder/ TileSetName
# or manually configure each tile set, specifying the file extension
#AddTileMimeConfig /folder/ TileSetName js
# or load all the tile sets defined in the configuration file into this virtual host.
# Some tile set specific configuration parameters can only be specified via the configuration file option
LoadTileConfigFile /etc/tirex/mod_tile.conf
# Specify if mod_tile should keep tile delivery stats, which can be accessed from the URL /mod_tile
# The default is On. As keeping stats needs to take a lock, this might have some performance impact,
# but for nearly all intents and purposes this should be negligible and so it is safe to keep this turned on.
ModTileEnableStats On
# Turns on bulk mode. In bulk mode, mod_tile does not request any dirty tiles to be rerendered. Missing tiles
# are always requested in the lowest priority. The default is Off.
ModTileBulkMode Off
# Timeout before giving up for a tile to be rendered
ModTileRequestTimeout 3
# Timeout before giving up for a tile to be rendered that is otherwise missing
ModTileMissingRequestTimeout 10
# If tile is out of date, don't re-render it if past this load threshold (user gets old tile)
ModTileMaxLoadOld 16
# If tile is missing, don't render it if past this load threshold (user gets 404 error)
ModTileMaxLoadMissing 50
# Sets how old an expired tile has to be to be considered very old and therefore get elevated priority in rendering
ModTileVeryOldThreshold 31536000000000
# Unix domain socket where we connect to the rendering daemon
#ModTileRenderdSocketName /var/run/renderd/renderd.sock
ModTileRenderdSocketName /var/lib/tirex/modtile.sock
# Alternatively you can use a TCP socket to connect to renderd. The first part
# is the location of the renderd server and the second is the port to connect to.
# ModTileRenderdSocketAddr renderd.mydomain.com 7653
##
## Options controlling the cache proxy expiry headers. All values are in seconds.
##
## Caching is both important to reduce the load and bandwidth of the server, as
## well as reduce the load time for the user. The site loads fastest if tiles can be
## taken from the users browser cache and no round trip through the internet is needed.
## With minutely or hourly updates, however there is a trade-off between cacheability
## and freshness. As one can't predict the future, these are only heuristics, that
## need tuning.
## If there is a known update schedule such as only using weekly planet dumps to update the db,
## this can also be taken into account through the constant PLANET_INTERVAL in render_config.h
## but requires a recompile of mod_tile
## The values in this sample configuration are not the same as the defaults
## that apply if the config settings are left out. The defaults are more conservative
## and disable most of the heuristics.
##
## Caching is always a trade-off between being up to date and reducing server load or
## client side latency and bandwidth requirements. Under some conditions, like poor
## network conditions it might be more important to have good caching rather than the latest tiles.
## Therefore the following config options allow setting a special host header for which the caching
## behaviour is different to the normal heuristics
##
## The CacheExtended parameters overwrite all other caching parameters (including CacheDurationMax)
## for tiles being requested via the hostname CacheExtendedHostname
#ModTileCacheExtendedHostname cache.tile.openstreetmap.org
#ModTileCacheExtendedDuration 2592000
# Upper bound on the length of time a tile will be set cacheable, which takes
# precedence over other caching settings
ModTileCacheDurationMax 604800
# Sets the time tiles can be cached for that are known to be outdated and have been
# sent to renderd to be rerendered. This should be set to a value corresponding
# roughly to how long it will take renderd to get through its queue. There is an additional
# fuzz factor on top of this to not have all tiles expire at the same time
ModTileCacheDurationDirty 900
# Specify the minimum time mod_tile will set the cache expiry to for fresh tiles. There
# is an additional fuzz factor of between 0 and 3 hours on top of this.
ModTileCacheDurationMinimum 10800
# Lower zoom levels are less likely to change noticeably, so these could be cached for longer
# without users noticing much.
# The heuristic offers three levels of zoom, Low, Medium and High, for which different minimum
# caching times can be specified.
# Specify the zoom level below which Medium starts and the time in seconds for which they can be cached
ModTileCacheDurationMediumZoom 13 86400
# Specify the zoom level below which Low starts and the time in seconds for which they can be cached
ModTileCacheDurationLowZoom 9 518400
# A further heuristic to determine caching times is the last time a tile has changed.
# If it hasn't changed for a while, it is less likely to change in the immediate future, so the
# tiles can be cached for longer.
# For example, if the factor is 0.20 and the tile hasn't changed in the last 5 days, it can be cached
# for up to one day without having to re-validate.
ModTileCacheLastModifiedFactor 0.20
## Tile Throttling
## Tile scrapers can often download large numbers of tiles, overly straining tile server resources.
## mod_tile therefore offers the ability to automatically throttle requests from ip addresses that have
## requested a lot of tiles.
## The mechanism uses a token bucket approach to shape traffic. I.e. there is an initial pool of n tiles
## per ip that can be requested arbitrarily fast. After that this pool gets filled up at a constant rate
## The algorithm has two metrics. One based on overall tiles served to an ip address and a second one based on
## the number of requests to renderd / tirex to render a new tile.
## Overall enable or disable tile throttling
ModTileEnableTileThrottling Off
# Specify if you want to use the connecting IP for throttling, or use the X-Forwarded-For header to determine the
# IP address to be used for tile throttling. This can be useful if you have a reverse proxy / http accelerator
# in front of your tile server.
# 0 - don't use X-Forwarded-For and always use the IP that apache sees
# 1 - use the client IP address, i.e. the first entry in the X-Forwarded-For list. This works through a cascade of proxies.
# However, as the X-Forwarded-For is written by the client this is open to manipulation and can be used to circumvent the throttling
# 2 - use the last specified IP in the X-Forwarded-For list. If you know all requests come through a reverse proxy
# that adds an X-Forwarded-For header, you can trust this IP to be the IP the reverse proxy saw for the request
ModTileEnableTileThrottlingXForward 0
## Parameters (poolsize in tiles and topup rate in tiles per second) for throttling tile serving.
ModTileThrottlingTiles 10000 1
## Parameters (poolsize in tiles and topup rate in tiles per second) for throttling render requests.
ModTileThrottlingRenders 128 0.2
###
###
# increase the log level for more detailed information
LogLevel debug
# Integrate statically generated low-z tiles and dynamically generated high-z tiles
# provided by Tirex. For this low-z requests are forwarded 1:1 to the file system,
# high-z requests end up with Tirex via a rewrite rule.
<Directory "/data2/k/osm/htdocs/earth">
Require all granted
</Directory>
Alias "/earth" "/data2/k/osm/htdocs/earth"
# mod_tile assumes file extensions consist only of lower-case letters, while we want them to be "o5m"
# we achieve that by rewriting the file extension to one we use internally for this ("ofm")
# However, we must only do that for levels 11 to 17 which are delivered via Tirex, levels 1 to 9 are static
# and delivered differently.
RewriteEngine on
RewriteRule ^(.*/\d\d/\d+/\d+)\.o5m$ /tirex$1.ofm [PT]
[vectorosm]
URI=/tirex/earth/vectorosm/v1
TILEDIR=/k/osm/tirex/tiles/
;HOST=tile.openstreetmap.org
TILESIZE=256
;HTCPHOST=proxy.openstreetmap.org
;** config options used by mod_tile, but not renderd **
MINZOOM=11
MAXZOOM=17
TYPE=ofm application/octet-stream
;DESCRIPTION=This is a description of the tile layer used in the tile json request
;ATTRIBUTION=&copy;<a href=\"http://www.openstreetmap.org/\">OpenStreetMap</a> and <a href=\"http://wiki.openstreetmap.org/wiki/Contributors\">contributors</a>, <a href=\"http://opendatacommons.org/licenses/odbl/\">ODbL</a>
;SERVER_ALIAS=http://localhost/
;CORS=http://www.openstreetmap.org
;ASPECTX=1
;ASPECTY=1
;SCALE=1.0
#-----------------------------------------------------------------------------
#
# Configuration for the Marble vector tile generator
#
# /etc/tirex/renderer/marble.conf
#
#-----------------------------------------------------------------------------
# symbolic name
name=marble
# path to executable of renderer
path=/k/osm/generator/bin/marble-vectorosm-tirex-backend
# UDP port where the master can contact this renderer
# must be individual for each renderer
port=9331
# number of processes that should be started
procs=4
# syslog facility
#syslog_facility=daemon
# activate this to see debug messages from renderer
#debug=1
#-----------------------------------------------------------------------------
#
# Configuration for Marble vector tile generator.
#
# /etc/tirex/renderer/marble/marble.conf
#
#-----------------------------------------------------------------------------
# symbolic name of this map
name=vectorosm
# tile directory
tiledir=/k/osm/tirex/tiles/vectorosm/
# cache directory with the input data for the generator
cache-directory=/k/osm/cache
# minimum zoom level allowed
minz=11
# maximum zoom level allowed
maxz=17