Cross-Platform Data Integration with JSON-LD
Step-by-step: Build a Local Knowledge Graph from Wikirate JSON-LD and Enrich it with Wikidata
With the introduction of JSON-LD support in the Wikirate API, Wikirate data can now be treated as linked data. This unlocks the potential of combining Wikirate’s structured data with external knowledge graphs such as Wikidata by using standard semantic web tools.
In this guide, we walk through a practical example showing how to consume Wikirate data as JSON-LD, load it into a local RDF graph, and enrich it with additional company information from Wikidata such as industry classifications, founding dates, and descriptions. Using Python and RDFLib, we demonstrate how JSON-LD makes cross-platform data integration possible.
The result is a local knowledge graph that connects Wikirate and Wikidata through shared identifiers.
What this example demonstrates
- Fetch Wikirate API data as JSON-LD (linked data).
- Parse it into an RDF graph using RDFLib.
- Merge instance data with the published Wikirate ontology.
- Run local SPARQL queries.
- Use Wikidata to enrich companies (industry, founding date, description) and store the results locally.
Prerequisites
Install dependencies:
pip install requests rdflib
If you run this in production, store your API key in an environment variable (not inline in code).
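One way to follow this advice, as a minimal sketch: read the key from the environment and fail early if it is missing. The variable name WIKIRATE_API_KEY and the helper get_api_key are assumptions made for illustration; the Wikirate API does not prescribe them.

```python
import os


def get_api_key(env_var: str = "WIKIRATE_API_KEY") -> str:
    """Return the API key stored in the environment, or raise if it is unset.

    The default variable name is an assumption for this example.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

The request headers in Step 2 could then use "X-API-Key": get_api_key() instead of an inline string.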
Step 1: Define configuration for enrichment fields
We define a small EnrichmentField configuration. It specifies which Wikidata property to fetch and where to store the value in our local RDF graph.
from dataclasses import dataclass
from typing import Optional

from rdflib import URIRef
from rdflib.namespace import XSD


@dataclass(frozen=True)
class EnrichmentField:
    # Variable name in SPARQL results, e.g. "industries"
    var: str
    # Wikidata direct property id like "P571" or a prefixed property like "schema:description"
    wd_prop: Optional[str] = None
    # If True, treat the value as an entity and fetch its English label
    as_label: bool = False
    # Predicate used when writing results into the local graph
    target_predicate: Optional[URIRef] = None
    # Optional literal handling
    lang: Optional[str] = None
    datatype: Optional[URIRef] = None  # e.g. XSD.dateTime
Tip: Keeping enrichment "field-driven" makes it easy to add new Wikidata properties later without rewriting logic.
Step 2: Fetch JSON-LD from the Wikirate API and parse it into RDF
The core move is: request JSON-LD and let RDFLib parse it directly into a graph.
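import requests
from typing import Optional

from rdflib import Graph


def load_jsonld_from_api(url: str, graph: Optional[Graph] = None) -> Graph:
    # Reuse the caller's graph if one is given, otherwise start a fresh one
    if graph is None:
        graph = Graph()
    headers = {
        "Accept": "application/ld+json",
        "X-API-Key": "YOUR_API_KEY",  # placeholder; read it from the environment in production
    }
    # A timeout keeps the call from hanging indefinitely on a slow connection
    response = requests.get(url, headers=headers, timeout=60)
    response.raise_for_status()
    # RDFLib turns the JSON-LD document directly into triples
    graph.parse(data=response.text, format="json-ld")
    return graph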
This is where JSON-LD pays off: no custom parsing, just load the document and you have RDF triples.
Step 3: Load the Wikirate ontology and merge it with data
The ontology provides the semantic backbone (classes and properties). We add its triples to a working graph, then add API data.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, OWL
g_onto = Graph()
g_onto.parse("https://wikirate.org/ontology/wikirate.ttl", format="turtle")
print(f"Triples in ontology: {len(g_onto)}")
W = Namespace("https://wikirate.org/ontology/")
SCHEMA = Namespace("https://schema.org/")
# Optional sanity-check: list classes
for s in g_onto.subjects(RDF.type, OWL.Class):
    print("Class:", s)
# Merge ontology + data
g = Graph()
g += g_onto
g = load_jsonld_from_api("https://wikirate.org/Companies?format=jsonld", g)
print("Total triples (ontology + data):", len(g))
Step 4: Inspect Companies in Python
Once loaded, the data behaves like a local graph. You can iterate through instances of wikirate:Company.
from rdflib.namespace import RDF
for company in g.subjects(RDF.type, W.Company):
    name = g.value(company, SCHEMA.name)
    country = g.value(company, SCHEMA.addressCountry)
    print(company, "→", name, "|", country)
This is useful for quick exploration and debugging before writing SPARQL queries.
Step 5: Extract companies that have a Wikidata ID
We will enrich only the companies that already have wikirate:wikidataId.
First, pull those QIDs from the local graph with SPARQL.
QUERY = """
PREFIX wikirate: <https://wikirate.org/ontology/>
PREFIX schema: <https://schema.org/>

SELECT ?company ?name ?wikidata_id
WHERE {
  ?company a wikirate:Company ;
           schema:name ?name ;
           wikirate:wikidataId ?wikidata_id .
}
"""

qids = []
for row in g.query(QUERY):
    print(row.company, "→", row.name, "→", row.wikidata_id)
    qids.append(row.wikidata_id.toPython())
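Because these values are later interpolated into a SPARQL VALUES clause, it can be worth filtering them first. The sketch below is one possible approach; the helper name clean_qids and the exact pattern are assumptions for this example. It keeps only strings that look like QIDs and drops duplicates.

```python
import re

# A QID is the letter Q followed by a positive integer without leading zeros
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def clean_qids(raw_values):
    """Return well-formed, de-duplicated QIDs in first-seen order."""
    seen = set()
    result = []
    for value in raw_values:
        qid = str(value).strip()
        # fullmatch rejects anything with extra characters around the QID
        if QID_PATTERN.fullmatch(qid) and qid not in seen:
            seen.add(qid)
            result.append(qid)
    return result
```

Calling qids = clean_qids(qids) before building the Wikidata query would keep malformed identifiers out of the generated SPARQL.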
Step 6: Build a Wikidata SPARQL query for enrichment
We generate one Wikidata query that asks for multiple fields for multiple QIDs at once. This avoids per-company calls and keeps the enrichment efficient.
6.1 Define which fields you want from Wikidata
fields = [
    EnrichmentField(
        var="desc",
        wd_prop="schema:description",
        target_predicate=SCHEMA.description,
        lang="en",
    ),
    EnrichmentField(
        var="inception",
        wd_prop="P571",
        target_predicate=SCHEMA.foundingDate,
        datatype=XSD.dateTime,
    ),
    EnrichmentField(
        var="industries",
        wd_prop="P452",
        as_label=True,
        target_predicate=SCHEMA.industry,
    ),
]
6.2 Generate the SPARQL query string
def build_wikidata_query(qids: list[str], fields: list[EnrichmentField]) -> str:
    select_vars = ["?item", "?qid"]
    optional_blocks: list[str] = []
    label_entities: list[tuple[str, str]] = []

    for f in fields:
        if f.as_label:
            # Entity-valued property: fetch the entity, then aggregate its labels
            entity_var = f"?{f.var}Entity"
            label_var = f"?{f.var}Label"
            select_vars.append(
                f'(GROUP_CONCAT(DISTINCT {label_var}; separator="; ") AS ?{f.var})'
            )
            optional_blocks.append(f"""
  OPTIONAL {{
    ?item wdt:{f.wd_prop} {entity_var} .
  }}
""")
            label_entities.append((entity_var, label_var))
        else:
            # Literal-valued property, optionally restricted to one language
            select_vars.append(f"?{f.var}")
            filter_line = f'FILTER(LANG(?{f.var}) = "{f.lang}")' if f.lang else ""
            prop = f"wdt:{f.wd_prop}" if f.wd_prop.startswith("P") else f.wd_prop
            optional_blocks.append(f"""
  OPTIONAL {{
    ?item {prop} ?{f.var} .
    {filter_line}
  }}
""")

    # English labels for every entity-valued field, in one OPTIONAL block
    label_block = ""
    if label_entities:
        label_lines = "\n".join(f"    {ev} rdfs:label {lv} ." for ev, lv in label_entities)
        label_filters = "\n".join(f'    FILTER(LANG({lv}) = "en")' for _, lv in label_entities)
        label_block = f"  OPTIONAL {{\n{label_lines}\n{label_filters}\n  }}"

    # Non-aggregated variables must appear in GROUP BY
    group_by = "GROUP BY ?item ?qid " + " ".join(
        f"?{f.var}" for f in fields if not f.as_label
    )
    values = " ".join(f"wd:{qid}" for qid in qids)
    select_clause = " ".join(select_vars)
    optionals = "".join(optional_blocks)

    return f"""
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>

SELECT {select_clause}
WHERE {{
  VALUES ?item {{ {values} }}
  BIND(REPLACE(STR(?item), "^.*/", "") AS ?qid)
{optionals}
{label_block}
}}
{group_by}
"""
This "field-driven" builder keeps the example extensible.
Step 7: Execute Wikidata query and shape results
def exec_wikidata_query(query: str) -> dict:
    endpoint = "https://query.wikidata.org/sparql"
    headers = {
        "Accept": "application/sparql-results+json",
        # Replace YOUR_EMAIL with a real contact address so the client is identifiable
        "User-Agent": "wikirate-jsonld-example/1.0 (contact: YOUR_EMAIL)",
    }
    r = requests.get(endpoint, params={"query": query}, headers=headers, timeout=60)
    r.raise_for_status()
    return r.json()


def get_enrichment(data: dict, fields: list[EnrichmentField]) -> dict[str, dict[str, str]]:
    # Reshape SPARQL JSON bindings into {qid: {field_var: value}}
    field_names = [f.var for f in fields]
    out: dict[str, dict[str, str]] = {}
    for b in data["results"]["bindings"]:
        qid = b["qid"]["value"]
        out.setdefault(qid, {})
        for k in field_names:
            if k in b:
                out[qid][k] = b[k]["value"]
    return out
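For a long QID list, the query can be run in batches rather than in one very large request. The helper below is a minimal sketch of that idea; the name batched and the default size of 100 are assumptions for this example, chosen to sit inside the 50–200 range suggested in the production notes.

```python
from typing import Iterator, Sequence


def batched(items: Sequence[str], size: int = 100) -> Iterator[list[str]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Possible use with the functions from Steps 6 and 7 (not executed here):
#
# enrichment = {}
# for batch in batched(qids, 100):
#     data = exec_wikidata_query(build_wikidata_query(batch, fields))
#     enrichment.update(get_enrichment(data, fields))
```

Since each batch covers a distinct set of QIDs, merging the per-batch dictionaries with update() does not overwrite earlier results as long as the input list has no duplicates.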
Step 8: Write enrichment back into the local graph
We add a schema:sameAs link to the Wikidata entity and store the enrichment as new triples.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF


def apply_enrichment(g: Graph,
                     enrichment: dict[str, dict[str, str]],
                     W: Namespace,
                     SCHEMA: Namespace,
                     fields: list[EnrichmentField]) -> Graph:
    WD_ENTITY = "http://www.wikidata.org/entity/"
    for company in g.subjects(RDF.type, W.Company):
        qid_lit = g.value(company, W.wikidataId)
        if not qid_lit:
            continue
        qid = str(qid_lit)
        facts = enrichment.get(qid)
        if not facts:
            continue
        # Build the Wikidata entity IRI for this company
        wd_iri = URIRef(WD_ENTITY + qid)
        # Add schema:sameAs ONLY if it is not already present
        if (company, SCHEMA.sameAs, wd_iri) not in g:
            g.add((company, SCHEMA.sameAs, wd_iri))
        for f in fields:
            if not f.target_predicate or f.var not in facts:
                continue
            val = facts[f.var]
            if not val:
                continue
            # Language tag takes precedence over datatype; otherwise a plain literal
            if f.lang:
                lit = Literal(val, lang=f.lang)
            elif f.datatype:
                lit = Literal(val, datatype=f.datatype)
            else:
                lit = Literal(val)
            g.add((company, f.target_predicate, lit))
    return g


# Run the pipeline from Steps 5-8: build the query, call Wikidata, reshape, write back
query = build_wikidata_query(qids, fields)
data = exec_wikidata_query(query)
enrichment = get_enrichment(data, fields)
g = apply_enrichment(g, enrichment, W, SCHEMA, fields)
Step 9: Query the enriched graph
QUERY = """
PREFIX wikirate: <https://wikirate.org/ontology/>
PREFIX schema: <https://schema.org/>

SELECT ?company ?name ?country ?wikidata ?desc ?industry ?foundingDate
WHERE {
  ?company a wikirate:Company ;
           schema:name ?name ;
           wikirate:wikidataId ?wikidata .
  OPTIONAL { ?company schema:addressCountry ?country }
  OPTIONAL { ?company schema:description ?desc }
  OPTIONAL { ?company schema:industry ?industry }
  OPTIONAL { ?company schema:foundingDate ?foundingDate }
}
LIMIT 20
"""

for row in g.query(QUERY):
    print(row.company, "→", row.name, "→", row.country, "→", row.wikidata, "→", row.desc, "→", row.industry, "→", row.foundingDate)
Optional: Save the enriched graph
# Turtle
g.serialize("wikirate_enriched.ttl", format="turtle")
# JSON-LD (compaction/framing depends on context settings)
g.serialize("wikirate_enriched.jsonld", format="json-ld")
Production notes
- Pagination: /Companies?format=jsonld returns one page; iterate hydra:next to fetch more.
- Batch size: Enrich in batches (e.g., 50–200 QIDs per Wikidata query) to stay friendly to public endpoints.
- Semantics: Using schema:industry etc. is pragmatic; you can also store Wikidata IRIs directly if you prefer.