addressextract
parses an OpenStreetmap PBF file and exports all addresses e.g. objects with addr:*
tags on them. It also reassembles admin and postcode boundarys and extends the addressinformation
with those.
A typical address looks like this:
{
"city": "Paderborn",
"geomcity": "Paderborn",
"geomcounty": "Kreis Paderborn",
"geompostcode": "33098",
"geomsuburb": "Paderborn",
"housenumber": "2a",
"housename": "foobar",
"id": "25906061",
"lat": "51.717383",
"lon": "8.752651",
"postcode": "33098",
"source": "node",
"street": "Marienplatz"
}
All the geom*
tags originate from the boundaries, be it postcode or admin boundary.
id
and source
tell the origin osm object be it node or way and its id.
Additionally you could run the extractor with "-e" so it will add an error field describing problems with this address - It could look like this:
"errors": [
"No housenumber",
"No postcode",
"No addr:street or addr:place"
]
addressextract -i mylocal.pbf >addresses.json
To filter or extract addresses from the resulting json there is either the tool addrfilter
:
addrfilter -p 33330 -i owl.json -c
To export all address with postcode 33330 and dump it as CSV. An alternative is to use jq
:
jq -r '.addresses[] | [ .id, .source, .postcode, .city, .street, .housenumber ] | @csv' owl.json
apt-get install build-essential cmake-data cmake libboost-all-dev \
libspatialindex-dev libgdal-dev libbz2-dev libexpat1-dev
git clone https://github.com/flohoff/addressextract.git
cd addressextract
git clone https://github.com/nlohmann/json
git clone https://github.com/mapbox/protozero
git clone https://github.com/osmcode/libosmium
cmake .
make
Using Debian/Buster to setup a solr:
sudo apt-get install solr-jetty jq
sudo cp -rav searchasyoutype/solr36conf/* /etc/solr/
sudo service jetty9 restart
addressextract -i <mylittlepbf | jq .addresses >addresses.json
searchasyoutype/pushtosolr http://localhost:8080/solr addresses.json
Now you can query the solr for results:
curl http://localhost:8080/solr/address/?q=postcode:3333%20street:Heidestra%C3%9Fe
Typical query time is less than 50ms and the result looks like this:
{
"responseHeader": {
"status": 0,
"QTime": 13,
"params": {
"q": "postcode:33330 street:Heidestraße"
}
},
"response": {
"numFound": 26262,
"start": 0,
"docs": [
{
"city": "Gütersloh",
"geomcity": "Gütersloh",
"geompostcode": "33330",
"housenumber": "1c",
"id": "7380458133",
"postcode": "33330",
"street": "Heidestraße"
},
[ ... ]
- hausnummern mit spaces oder "," z.b. "4a,b" oder "5a,5b" 33602,Bielefeld,LOOM, Bahnhofstraße,28,52.025702,8.531427,node,5202495577,33602,Bielefeld,,