Knowledge Graph Embedding

Graph Embeddings

Ontology alignment involves finding correspondences between entities in different ontologies. OntoAligner addresses this challenge by leveraging Knowledge Graph Embedding (KGE) models. The core idea of KGE is to represent the entities (such as classes, properties, and individuals) and relations of an ontology as low-dimensional vectors in a continuous vector space. These embeddings are learned so that the semantic relationships of the original ontology are preserved geometrically in the embedding space.

Hint

Why KGE for Alignment?

Semantic Preservation: KGE models aim to capture the meaning and relationships of entities in their vector representations.
Scalability: Working with numerical vectors can be more efficient for large-scale comparison than symbolic matching.
Similarity Measurement: Once entities are embedded, their semantic similarity can be easily measured (e.g., using cosine similarity).
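
As a minimal sketch of the similarity measurement mentioned above, the snippet below computes cosine similarity between toy embedding vectors in plain Python. The vectors and variable names are made up for illustration; they are not produced by OntoAligner.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_u * norm_v)

# Toy 3-dimensional embeddings (illustrative values only).
mouse_tonsil = [0.9, 0.1, 0.3]
human_tonsil = [0.8, 0.2, 0.4]
human_femur = [-0.5, 0.9, 0.0]

sim_close = cosine_similarity(mouse_tonsil, human_tonsil)
sim_far = cosine_similarity(mouse_tonsil, human_femur)
print(sim_close > sim_far)
```

Vectors that point in similar directions score close to 1, while unrelated or opposed vectors score near 0 or below.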

OntoAligner’s KGE-based alignment process involves several key components that work in sequence. These components are described in the following figure within GraphEmbeddingsAligner.

Note

Reference: Giglou, Hamed Babaei, Jennifer D’Souza, Sören Auer, and Mahsa Sanaei. “OntoAligner Meets Knowledge Graph Embedding Aligners.” arXiv preprint arXiv:2509.26417 (2025).

Usage

This module guides you through a step-by-step process for performing ontology alignment using KGE models and the OntoAligner library. By the end, you’ll understand how to preprocess data, encode ontologies, generate alignments, evaluate results, and save the outputs in XML and JSON formats.

➡️ 1: Parser

The first step is to prepare the ontology data for the KGE model. The Parser transforms raw ontology information into a structured format suitable for KGE models.

from ontoaligner.ontology import GraphTripleOMDataset

task = GraphTripleOMDataset(ontology_name="Mouse-Human")
print("task:", task)
# >>> task: Track: GraphTriple, Source-Target sets: Mouse-Human

dataset = task.collect(
    source_ontology_path="assets/mouse-human/source.xml",
    target_ontology_path="assets/mouse-human/target.xml",
    reference_matching_path="assets/mouse-human/reference.xml"
)
print("dataset key-values:", dataset.keys())
# >>> dataset key-values: dict_keys(['dataset-info', 'source', 'target', 'reference'])

print("Sample source ontology:", dataset['source'][0])

This prints a sample from the source ontology with the following metadata:

[
    {
        'subject': ('http://mouse.owl#MA_0000143', 'tonsil'),
        'predicate': ('http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'type'),
        'object': ('http://www.w3.org/2002/07/owl#Class', 'Class'),
        'subject_is_class': True,
        'object_is_class': False
    },
    ...
]

➡️ 2: Encoder

Once the source and target ontologies are parsed, the GraphTripleEncoder creates a triplet representation of each. The representation has the format [(Subject Label, Predicate Label, Object Label), ... ], which is the standard input for KGE models.

from ontoaligner.encoder import GraphTripleEncoder

encoder = GraphTripleEncoder()
encoded_dataset = encoder(**dataset)
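
To make the triplet format concrete, here is a small sketch of how parsed records like the one shown in step 1 could be reduced to (subject label, predicate label, object label) tuples. The helper `to_label_triples` is hypothetical and only illustrates the data shape; it is not the GraphTripleEncoder implementation.

```python
def to_label_triples(records):
    """Reduce parsed records to (subject label, predicate label, object label) tuples.

    Hypothetical helper: assumes each record stores (IRI, label) pairs
    under the 'subject', 'predicate' and 'object' keys.
    """
    triples = []
    for record in records:
        _, subject_label = record["subject"]
        _, predicate_label = record["predicate"]
        _, object_label = record["object"]
        triples.append((subject_label, predicate_label, object_label))
    return triples

# One record in the shape printed by the parser step.
parsed = [
    {
        "subject": ("http://mouse.owl#MA_0000143", "tonsil"),
        "predicate": ("http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "type"),
        "object": ("http://www.w3.org/2002/07/owl#Class", "Class"),
        "subject_is_class": True,
        "object_is_class": False,
    },
]

triples = to_label_triples(parsed)
print(triples)
# [('tonsil', 'type', 'Class')]
```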

➡️ 3: Aligner

After the triplets are generated, they are fed into the KGE model, the core engine that learns low-dimensional embeddings for all entities and relations present in the triplets. Here we use ConvEAligner, a specific KGE-based aligner (built on ConvE) within the OntoAligner library. It encapsulates the entire process, from data ingestion and embedding learning to alignment prediction.

from ontoaligner.aligner import ConvEAligner

kge_params = {
    'device': 'cpu',                  # str: Device to use for training ('cpu' or 'cuda')
    'embedding_dim': 300,             # int: Dimensionality of learned embeddings
    'num_epochs': 50,                 # int: Number of training epochs
    'train_batch_size': 128,          # int: Number of positive triplets per training batch
    'eval_batch_size': 64,            # int: Number of triplets per evaluation batch
    'num_negs_per_pos': 5,            # int: Number of negative samples per positive triplet
    'random_seed': 42,                # int: Seed for reproducibility
}

aligner = ConvEAligner(**kge_params)

matchings = aligner.generate(input_data=encoded_dataset)

Note

The .generate method first trains the KGE model and then performs the matching.
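
The num_negs_per_pos parameter controls how many negative (false) triplets are generated for each positive triplet during training. The sketch below shows one simple way to do this, by replacing the tail of each triplet with a random different entity. It is a simplified illustration under that assumption: the function `corrupt_tails` and the toy labels are hypothetical, and the sampler actually used during training may behave differently (for example, by also corrupting heads).

```python
import random

def corrupt_tails(triples, num_negs_per_pos, seed=42):
    """Create negatives by swapping each tail for a different random entity."""
    rng = random.Random(seed)  # local generator, so results are reproducible
    entities = sorted({t[0] for t in triples} | {t[2] for t in triples})
    negatives = []
    for head, relation, tail in triples:
        candidates = [e for e in entities if e != tail]
        for _ in range(num_negs_per_pos):
            negatives.append((head, relation, rng.choice(candidates)))
    return negatives

# Toy positive triplets (labels are illustrative).
positives = [
    ("tonsil", "type", "Class"),
    ("tonsil", "subClassOf", "lymphoid tissue"),
    ("spleen", "subClassOf", "lymphoid organ"),
]

negatives = corrupt_tails(positives, num_negs_per_pos=5)
print(len(negatives))
# 15
```

Note that this sketch does not check whether a corrupted triplet happens to be true elsewhere in the graph.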

➡️ 4: Post-Process

This step post-processes the predicted matchings: it can filter them by a similarity-score threshold and apply cardinality-based processing. Evaluating the matchings against the reference dataset before and after post-processing then shows how much this step changes their quality.

from ontoaligner.postprocess import graph_postprocessor

processed_matchings = graph_postprocessor(predicts=matchings, threshold=0.5)
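
To illustrate what threshold filtering and cardinality-based processing mean, the following sketch keeps only pairs whose score reaches the threshold and then retains the best-scoring target for each source. The function `filter_matchings` is hypothetical and is not the implementation of graph_postprocessor; it assumes matchings are dictionaries with 'source', 'target', and 'score' keys, and the second source IRI and all scores are made up.

```python
def filter_matchings(predicts, threshold=0.5):
    """Drop low-scoring pairs, then keep at most one target per source."""
    best = {}
    for match in predicts:
        if match["score"] < threshold:
            continue  # threshold filtering
        current = best.get(match["source"])
        if current is None or match["score"] > current["score"]:
            best[match["source"]] = match  # cardinality: best target wins
    return list(best.values())

toy_predicts = [
    {"source": "http://mouse.owl#MA_0000143",
     "target": "http://human.owl#HBA_0000214", "score": 0.87},
    {"source": "http://mouse.owl#MA_0000143",
     "target": "http://human.owl#HBA_0000762", "score": 0.82},
    {"source": "http://mouse.owl#MA_0000001",  # made-up IRI for illustration
     "target": "http://human.owl#HBA_0000891", "score": 0.31},
]

kept = filter_matchings(toy_predicts, threshold=0.5)
print(len(kept))
# 1
```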

➡️ 5: Evaluate and Export

The following code compares the generated alignments with the reference matchings, both before and after post-processing. It then saves the matchings in XML and JSON formats for further analysis or use; pick whichever export format suits your workflow.

from ontoaligner.utils import metrics

evaluation = metrics.evaluation_report(predicts=matchings, references=dataset['reference'])
print("Matching Evaluation Report:\n", evaluation)

evaluation = metrics.evaluation_report(predicts=processed_matchings, references=dataset['reference'])
print("Matching Evaluation Report -- after post-processing:\n", evaluation)
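
For intuition about what such an evaluation measures, here is a sketch of the standard precision, recall, and F1 computation over sets of (source, target) pairs. The function name and the placeholder identifiers are assumptions for illustration; the exact fields reported by metrics.evaluation_report are not shown here.

```python
def precision_recall_f1(predicted_pairs, reference_pairs):
    """Standard set-based precision, recall and F1 for alignment pairs."""
    predicted = set(predicted_pairs)
    reference = set(reference_pairs)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Placeholder identifiers: two of three predictions are correct,
# and two of four reference pairs are found.
predicted = [("s1", "t1"), ("s2", "t2"), ("s3", "t9")]
reference = [("s1", "t1"), ("s2", "t2"), ("s4", "t4"), ("s5", "t5")]

precision, recall, f1 = precision_recall_f1(predicted, reference)
print(round(precision, 3), round(recall, 3), round(f1, 3))
# 0.667 0.5 0.571
```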

📄 <> Export matchings to XML

import json

from ontoaligner.utils import xmlify

xml_str = xmlify.xml_alignment_generator(matchings=processed_matchings)
with open("matchings.xml", "w", encoding="utf-8") as xml_file:
    xml_file.write(xml_str)

# 🧾 {} Export matchings to JSON

with open("matchings.json", "w", encoding="utf-8") as json_file:
    json.dump(processed_matchings, json_file, indent=4, ensure_ascii=False)

KGE Aligners

The ontoaligner.aligner.graph module provides a suite of graph embedding-based aligners built on top of popular KGE models. These aligners leverage link prediction objectives and low-dimensional vector spaces to learn semantic representations of entities, facilitating accurate ontology alignment even across heterogeneous structures. Each aligner wraps a specific KGE model implemented through the PyKEEN framework, allowing plug-and-play integration and consistent similarity scoring across models. Some models include custom similarity functions to better capture semantic distance in complex embedding spaces (e.g., complex numbers or quaternions).

The following table lists the available KGE aligners:

Aligner Name	Description	Link
`ConvEAligner`	Based on ConvE, which uses 2D convolutions over reshaped entity and relation embeddings to model complex interactions.	Source
`TransDAligner`	Based on TransD, which constructs relation-specific projection matrices dynamically from both entity and relation vectors.	Source
`TransEAligner`	Based on TransE, a translation-based model that learns embeddings where \(h + r \approx t\).	Source
`TransFAligner`	Based on TransF, which enables flexible translations for complex relations without increasing model complexity.	Source
`TransHAligner`	Based on TransH, which projects entities onto relation-specific hyperplanes before translation.	Source
`TransRAligner`	Based on TransR, which embeds entities and relations in separate spaces using relation-specific projections.	Source
`DistMultAligner`	Based on DistMult, a bilinear model that uses diagonal matrices for efficient relational modeling.	Source
`ComplExAligner`	Based on ComplEx, which uses complex-valued embeddings to model symmetric and antisymmetric relations; includes a custom similarity function using real parts of complex dot products.	Source
`HolEAligner`	Based on HolE, which combines compositional and holographic representations using circular correlation.	Source
`RotatEAligner`	Based on RotatE, which models relations as rotations in complex space and supports rich relational patterns; includes a similarity override.	Source
`SimplEAligner`	Based on SimplE, which learns two dependent embeddings for each entity (one for the head role, one for the tail role) and is a fully expressive factorization model.	Source
`CrossEAligner`	Based on CrossE, which learns both general and triple-specific embeddings to capture bidirectional interactions.	Source
`BoxEAligner`	Based on BoxE, which models relations as boxes in vector space to support hierarchies and logical rules.	Source
`CompGCNAligner`	Based on CompGCN, a graph convolutional network designed for multi-relational graphs using composition operations.	Source
`MuREAligner`	Based on MuRE, the Euclidean counterpart of the hyperbolic MuRP model, which applies relation-specific transformations to entity embeddings.	Source
`QuatEAligner`	Based on QuatE, which uses quaternion embeddings and custom similarity logic to model expressive 4D rotations and relational structure.	Source
`SEAligner`	Based on SE (Structured Embedding), which projects head and tail entities with relation-specific matrices and scores a triple by the distance between the two projections.	Source
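
As a worked illustration of the translation idea behind TransE (\(h + r \approx t\)), the sketch below scores a triple by the negative Euclidean distance between \(h + r\) and \(t\), so that plausible triples score closer to zero. The vectors are toy values, the function `transe_score` is written for this example, and the choice of the L2 norm is an assumption (TransE can also be used with the L1 norm).

```python
import math

def transe_score(head, relation, tail):
    """Negative L2 distance between (head + relation) and tail."""
    squared = sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    return -math.sqrt(squared)

# Toy 2-dimensional embeddings (illustrative values only).
head = [0.2, 0.5]
relation = [0.3, -0.1]
good_tail = [0.5, 0.4]   # roughly equals head + relation
bad_tail = [-0.7, 0.9]   # far from head + relation

good = transe_score(head, relation, good_tail)
bad = transe_score(head, relation, bad_tail)
print(good > bad)
# True
```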

To use a KGE-based aligner:

from ontoaligner.aligner import TransEAligner

aligner = TransEAligner()

matchings = aligner.generate(input_data=...)

If the desired model is not available in OntoAligner, you can subclass GraphEmbeddingAligner and set the model name:

from ontoaligner.aligner.graph import GraphEmbeddingAligner

class CustomKGEAligner(GraphEmbeddingAligner):
    model = "RESCAL"

aligner = CustomKGEAligner()
matchings = aligner.generate(input_data=...)

Alternatively, you can use the base GraphEmbeddingAligner directly and pass the name of the model you want:

from ontoaligner.aligner.graph import GraphEmbeddingAligner

aligner = GraphEmbeddingAligner(model="RESCAL")
matchings = aligner.generate(input_data=...)

Here, RESCAL plays the role of the custom KGE model.

Note

For the list of possible models, see PyKEEN > Models.

KGE Retriever

In addition to one-to-one alignments, OntoAligner's KGE aligners also support retriever-based alignment. When retriever mode is enabled (retriever=True), the aligner returns the top-k candidate target entities for each source entity, along with their similarity scores, much like the retriever aligners. This mode is useful if you want to build downstream candidate filtering pipelines, apply human-in-the-loop validation, or integrate with reranking modules (e.g., LLMs or supervised classifiers).

Here is an example of how to use a KGE aligner as a retriever model:

from ontoaligner.aligner import TransEAligner

# Enable retriever mode and request top-3 candidates per source entity
aligner = TransEAligner(retriever=True, top_k=3)

matchings = aligner.generate(input_data=encoded_dataset)

Mode	Description
KGE Default mode	Enabled with `retriever=False` (the default). Produces one-to-one alignments: each source entity is matched to the single most similar target entity.
KGE Retriever mode	Enabled with `retriever=True`. Produces one-to-many alignments: each source entity is matched to its top-k most similar target entities.

➡️ KGE Retriever Mode Example output

[
   {
     "source": "http://mouse.owl#MA_0000143",
     "target-cands": [
         "http://human.owl#HBA_0000214",
         "http://human.owl#HBA_0000762",
         "http://human.owl#HBA_0000891"
     ],
     "score-cands": [0.87, 0.82, 0.77]
   },
   ...
]

➡️ KGE Default Mode Example output

{
    'source': 'http://mouse.owl#MA_0000143',
    'target': 'http://human.owl#HBA_0000214',
    'score': 0.87
}
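
The relationship between the two output shapes can be sketched as follows: rank all targets by similarity, keep the top-k as candidates, and take the first candidate to obtain a one-to-one match. This is an illustrative sketch under stated assumptions, not OntoAligner's internal code; the functions `retrieve` and `to_one_to_one` are hypothetical, the embeddings are toy values, and cosine similarity is assumed as the scoring function.

```python
import math

def cosine(u, v):
    """Cosine similarity; assumes non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def retrieve(source_embeddings, target_embeddings, top_k=3):
    """Return the top_k most similar targets for every source entity."""
    results = []
    for source, source_vec in source_embeddings.items():
        scored = sorted(
            ((cosine(source_vec, vec), target) for target, vec in target_embeddings.items()),
            reverse=True,
        )[:top_k]
        results.append({
            "source": source,
            "target-cands": [target for _, target in scored],
            "score-cands": [score for score, _ in scored],
        })
    return results

def to_one_to_one(retrieved):
    """Collapse retriever output to the single best target per source."""
    return [
        {"source": r["source"], "target": r["target-cands"][0], "score": r["score-cands"][0]}
        for r in retrieved if r["target-cands"]
    ]

# Toy 2-dimensional embeddings (illustrative values only).
sources = {"http://mouse.owl#MA_0000143": [0.9, 0.1]}
targets = {
    "http://human.owl#HBA_0000214": [0.8, 0.2],
    "http://human.owl#HBA_0000762": [0.1, 0.9],
    "http://human.owl#HBA_0000891": [-0.9, 0.0],
}

retrieved = retrieve(sources, targets, top_k=2)
best = to_one_to_one(retrieved)
print(best[0]["target"])
# http://human.owl#HBA_0000214
```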

Note

Consider reading the following section next:

Package Reference > Aligners