Substitute OSM keys which are concepts with a proper URI #50
That's a great suggestion, thanks! Can you be more specific concerning "This has the advantage that these tags automatically come with all translations and pictures, or pictograms"? How does one obtain the translations, pictures, or pictograms of tags? |
Simply because the Wikidata counterparts of all keys are better maintained. E.g. https://www.wikidata.org/wiki/Q207694 for (Tag:tourism=gallery). (And the number of languages a tag's counterpart has might be a good heuristic for choosing which one to point to.) |
Just to make sure I understand this part correctly: We store each property as
This makes
What change exactly do you propose? Maybe an example could help me understand this better. I'm not against optionally adding Wikidata counterparts to the data. I'm currently against replacing the OSM representation, as the initial goal of osm2rdf is to provide as much of the OSM data as possible without the need for additional knowledge bases. Substituting the values would make the dataset unusable without the corresponding Wikidata information. |
Yes, sorry, an example speaks a thousand words. Instead of (or in addition to) what is emitted today:
This goes in the same direction as #49: representing the OSM data model in LD vs. adapting the model to the new medium, with the goal of being highly queryable in the LD world. (It is definitely also possible to provide both, but you can surely guess which camp I am in overall.) |
@lehmann-4178656ch If I understand correctly, the idea is to make use of semantic information available in Wikidata, as some concepts there are mapped to OSM keys or OSM tags. You could then make use of those mappings and "merge" them into the OSM2RDF-converted dataset. For example, for each OSM entity with tag
In a research project, I quite recently did that for parts of the domain of interest. For example, I extracted a subset of OSM planet for "government" buildings, then used OSM2RDF to convert it to RDF. In a next step, I wanted to align it with Wikidata concepts/entities directly. This can be done either
a) from Wikidata via SPARQL:
SELECT ?item ?itemLabel ?tag
WHERE
{
  ?item wdt:P1282 ?tag.
  filter(contains(?tag, "government"))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
b) from the OSM Wiki which nowadays has so-called "data items" (basically the same MediaWiki backend as Wikidata) via some API:
Long story short, simply adding a triple
I don't think the whole thing would be useful to put into OSM2RDF; it's more something on top of it, apart from the simple baseline of just fetching the ~4000 OSM tag - WD concept mappings and adding an RDF triple to each converted OSM entity with one of those tags, which is nothing more than a SPARQL Update statement to run in a post-processing step. Sorry for the long post. |
Not sure if I would really go and try to find the more intricate semantics of the predicates which could be used. Simply moving away from a literal (e.g. 'country') to a URI (regardless of whether it is a Wikidata URI or some internal URI) allows meaning to be attached to it. As in the example above, if there is a Wikidata entry for '''osm: key:place''', the tag suddenly has multilingual labels, which can be used to search for it and also help when showing the "properties" of an entry. But there are definitely many open questions here. |
Ok, but then we can just run this SPARQL Update statement, no?
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX osmkey: <https://www.openstreetmap.org/wiki/Key:>
DELETE {
?entity ?osmkey ?val .
}
INSERT {
?entity ?osmkey ?item .
}
WHERE {
{
SELECT ?item ?val ?osmkey {
SERVICE <https://query.wikidata.org/sparql> {
?item wdt:P1282 ?tag.
}
BIND(REPLACE(?tag, "^Tag:(.*)=.*$", "$1") AS ?key)
BIND(REPLACE(?tag, "^Tag:.*=(.*)$", "$1") AS ?val)
BIND(URI(CONCAT(STR(osmkey:), ?key)) AS ?osmkey)
}
}
?entity ?osmkey ?val .
}
This works for me at least. It might indeed be expensive for OSM planet if the triple store puts it into a single transaction. |
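For readers who prefer to run the substitution outside the triple store, here is a minimal Python sketch of the same idea as the SPARQL Update above, assuming the mapping rows (Wikidata item, P1282 value) have already been fetched. The function names and the in-memory triple representation are hypothetical and not part of osm2rdf. Unlike the `REPLACE` patterns above, it splits at the first `=` and skips `Key:` entries; when several items share a tag (as with Q1007870 and Q207694) it keeps the first one seen.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

OSM_KEY_NS = "https://www.openstreetmap.org/wiki/Key:"

# Matches P1282 values of the form "Tag:<key>=<value>".
_TAG_RE = re.compile(r"^Tag:([^=]+)=(.+)$")

Triple = Tuple[str, str, str]


def split_tag(p1282_value: str) -> Optional[Tuple[str, str]]:
    """Return (key, value) for "Tag:key=value", or None for anything else."""
    match = _TAG_RE.match(p1282_value)
    return (match.group(1), match.group(2)) if match else None


def build_mapping(rows: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], str]:
    """Map (predicate URI, literal value) to a Wikidata item URI."""
    mapping: Dict[Tuple[str, str], str] = {}
    for item, tag in rows:
        parsed = split_tag(tag)
        if parsed is None:
            continue  # e.g. "Key:place" has no value part to substitute
        key, value = parsed
        # Keep the first item when several items claim the same tag.
        mapping.setdefault((OSM_KEY_NS + key, value), item)
    return mapping


def substitute(triples: Iterable[Triple], mapping: Dict[Tuple[str, str], str]) -> List[Triple]:
    """Replace literal objects by item URIs where a mapping exists."""
    return [(s, p, mapping.get((p, o), o)) for s, p, o in triples]
```

Keeping the untouched literal when no mapping exists means the output remains usable without Wikidata, which is the concern raised earlier in the thread.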
I fully agree with @LorenzBuehmann on this issue, especially this part:
For me, it boils down to the fact that the OSM/Wikidata concept mappings are nowhere present in the OSM data itself. Since we specifically provide a converter from OSM to RDF data, any feature that would require additional input (e.g. the external OSM/Wikidata mappings) would be out of scope of this project. |
Hm, I should probably not have mentioned Wikidata here. The more important part is to go from strings to URIs for concepts, to be able to extend and link it. |
I like the idea of using more URIs for objects.
This also makes a lot of sense to me. I'd be very interested in exploring what such an external mapping would look like and how it could be maintained. Some previous work includes the mapping used by the LinkedGeoData project. Although most OSM keys seem to be mapped to custom defined URIs (e.g. |
Dear all - also referring to #49 I'd like to point you to the fact that the OSM wiki already has its own "OSM wikidata" instance, which contains OSM wikidata items! I'd suggest using that. Almost every tag description page on the OSM wiki has an OSM wikidata item; sometimes even for keys, like "addr:" for address. Just look for "Data Items data object" at the bottom of the toolbar of an OSM wiki page, e.g. here for a tree: https://wiki.openstreetmap.org/wiki/Tag:natural=tree .
And be aware that OSM has an "open world assumption" that allows many concepts to be associated with a single OSM object. So a given "building" can have multiple tags representing multiple different views.
Also, be careful about thinking that an OSM (wikidata) concept like a "tree" can be mapped 1:1 to a wikidata concept. For example, consider a "monument" (tag historic=monument), where the OSM wikidata item text (https://wiki.openstreetmap.org/wiki/Item:Q4839 ) says: "A memorial object, which is especially large (one can go inside, walk on or through it) or very tall (...), built to remember, show respect to a person or group of people or to commemorate an event.". Whereas wikidata.org says in item https://www.wikidata.org/wiki/Q4989906 ... it's an "imposing structure created to commemorate a person or event, or used for that purpose". These definitions are not identical and will rarely be.
So on the one hand, the fact that OSM concepts have their own OSM wikidata item is a direct solution to having a proper URI. On the other hand, this shows that you have a classic semantic integration problem here, where you have inter-schema relationships between OSM wikidata items and wikidata.org items where two concepts are either "equal, disjoint, intersect, or include". This could be solved in fact with an external schema mapping service. |
That is correct, and also what I'm using currently. Unfortunately, the data isn't available as a dump and the Sophox SPARQL endpoint only contains some of those data items; see the GitHub issue. If anybody here is willing to use that endpoint, please keep that in mind. |
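One way to picture what an entry in such an external schema mapping could look like is the following Python sketch. The class and field names are hypothetical, and the relation assigned to the monument pair is chosen only for illustration; nobody in this thread has classified it.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """The four inter-schema relationships named in the comment above."""
    EQUAL = "equal"
    DISJOINT = "disjoint"
    INTERSECT = "intersect"
    INCLUDE = "include"


@dataclass(frozen=True)
class ConceptMapping:
    osm_item: str        # data item in the OSM wiki
    wikidata_item: str   # item on wikidata.org
    relation: Relation


def safe_to_substitute(mapping: ConceptMapping) -> bool:
    """Only an exact match justifies replacing one concept with the other."""
    return mapping.relation is Relation.EQUAL


# Illustrative only: the relation here is an assumption, not a curated fact.
monument = ConceptMapping(
    osm_item="https://wiki.openstreetmap.org/wiki/Item:Q4839",
    wikidata_item="https://www.wikidata.org/wiki/Q4989906",
    relation=Relation.INTERSECT,
)
```

Recording the relation explicitly lets a consumer decide per mapping whether to substitute, merely link, or ignore it.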
Data items from OSM Wikibase are available as a TTL dump at https://wiki.openstreetmap.org/dump/ (wikibase-rdf.ttl.gz). Sophox/sophox#31 appears to still be an issue, at least in that particular case, but I don’t know if the root cause is an incomplete dump or something else downstream. |
@1ec5 Thanks for this update. I am not sure I understand the dataset though. For example, what is the significance of a prefix like
and then what is the purpose of a triple like
I find it particularly confusing that prefix names from Wikidata are reused here (in the Wikidata dump, What do the others think? |
The OSM Wiki has data items about more than just keys, tags, and relation types. In principle, it could have an item about any page in the wiki’s main namespace. Many of these pages describe OSM concepts, software packages, or geographic regions that have local mapping communities. These pages are vastly outnumbered by tagging pages, which have titles beginning with pseudonamespaces such as “Key:”, “Tag:”, “Relation:”, and “Role:”. The corresponding data items are instances of subclasses of OpenStreetMap concepts or OpenHistoricalMap concepts. (OHM shares the OSM Wiki. The tagging pages are all subpages of “OpenHistoricalMap”, but the data items are differentiated only by their classes.)
This resolves a QID to a data item in OSM Wikibase.
Yes, unfortunately the Wikibase developers declined to allow installations to customize any of the alphabetic prefixes like Q and P. They suggest to rely on |
Rust osm2rdf is extremely fast, but sadly it does not (yet) support streaming updates. The issue was that there is currently no simple way to figure out which files (first daily, then hourly, then minutely) to download and process. Once someone writes some simple code that, given the latest timestamp of an edit, produces a sequence of filenames, I can easily adjust that code to actually produce the SPARQL INSERT statements for the updates. In the meantime, there is https://github.com/Sophox/sophox/tree/main/osm2rdf - the original Python code that does the same thing, but it takes a day to convert an OSM dump into TTLs. |
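As a partial sketch of the filename problem described above: assuming the replication layout in which a sequence number is zero-padded to nine digits and split into three directory levels (e.g. `004/123/456.osc.gz`), the file paths for a range of sequence numbers can be generated as below. The function names are hypothetical, and the harder step of turning a timestamp into a starting sequence number (by reading the replication state files) is not shown.

```python
from typing import Iterator


def replication_path(sequence: int, suffix: str = ".osc.gz") -> str:
    """Turn a sequence number into a relative path such as 004/123/456.osc.gz."""
    if not 0 <= sequence <= 999_999_999:
        raise ValueError("sequence number must fit in nine digits")
    digits = f"{sequence:09d}"
    return f"{digits[0:3]}/{digits[3:6]}/{digits[6:9]}{suffix}"


def paths_between(first: int, last: int) -> Iterator[str]:
    """Yield the paths for all sequence numbers from first to last, inclusive."""
    for sequence in range(first, last + 1):
        yield replication_path(sequence)
```

The same helper would apply to the daily, hourly, and minutely directories, since under this assumption only the base URL differs.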
Let me know if there are any specific questions to help with the data model |
It would be very useful for indexing, and for attaching further meaning, to have a URI instead of the literal OSM value for keys, e.g.
<https://www.openstreetmap.org/wiki/Key:place> "country" .
Either proper concepts in the osm2rdf namespace are created on the fly or, potentially more usefully, the values of this key are substituted directly with the fitting Wikidata concept. Wikidata lists the OSM keys as a property. The following query extracts the mapping: https://w.wiki/6MJT
(They are not always distinct, e.g. https://www.wikidata.org/wiki/Q1007870 and https://www.wikidata.org/wiki/Q207694)
This has the advantage that these tags automatically come with all translations and pictures, or pictograms.
The downside is that it is not clear how to keep this up to date, since Wikidata and OSM are both ever-changing targets. It might be resolved dynamically at conversion time.