Knowledge Graph Embedding

Graph Embeddings

Ontology alignment involves finding correspondences between entities in different ontologies. OntoAligner addresses this challenge by leveraging Knowledge Graph Embedding (KGE) models. The core idea of KGE is to represent entities (like classes, properties, individuals) and relations within an ontology as low-dimensional vectors in a continuous vector space. These numerical representations (embeddings) are learned to preserve semantic relationships from the original ontology geometrically in the embedding space.

Hint

Why KGE for Alignment?

  1. Semantic Preservation: KGE models aim to capture the meaning and relationships of entities in their vector representations.

  2. Scalability: Working with numerical vectors can be more efficient for large-scale comparison than symbolic matching.

  3. Similarity Measurement: Once entities are embedded, their semantic similarity can be easily measured (e.g., using cosine similarity).
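As a minimal sketch of the similarity measurement mentioned above, the snippet below computes cosine similarity between toy embedding vectors in plain Python. The vectors and variable names are made up for illustration; they are not produced by OntoAligner.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_u * norm_v)


# Toy 3-dimensional embeddings (hypothetical values, for illustration only)
source_tonsil = [0.9, 0.1, 0.3]
target_tonsil = [0.8, 0.2, 0.35]
target_femur = [-0.2, 0.9, -0.4]

print("tonsil vs. tonsil:", cosine_similarity(source_tonsil, target_tonsil))
print("tonsil vs. femur: ", cosine_similarity(source_tonsil, target_femur))
```

Entities whose vectors point in similar directions score close to 1, which is what makes ranking candidate matches cheap once embeddings exist.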

OntoAligner’s KGE-based alignment process involves several key components that work in sequence. These components are described in the following figure within GraphEmbeddingsAligner.

Note

Reference: Giglou, Hamed Babaei, Jennifer D’Souza, Sören Auer, and Mahsa Sanaei. “OntoAligner Meets Knowledge Graph Embedding Aligners.” arXiv preprint arXiv:2509.26417 (2025).

Usage

This section walks you step by step through ontology alignment using KGE models and the OntoAligner library. By the end, you’ll understand how to preprocess data, encode ontologies, generate alignments, evaluate results, and save the outputs in XML and JSON formats.

The first step is to prepare the ontology data for the KGE model. The Parser transforms raw ontology information into a structured format suitable for KGE models.

from ontoaligner.ontology import GraphTripleOMDataset

task  = GraphTripleOMDataset(ontology_name = "Mouse-Human")
print("task:", task)
# >>> task: Track: GraphTriple, Source-Target sets: Mouse-Human

dataset = task.collect(
    source_ontology_path="assets/mouse-human/source.xml",
    target_ontology_path="assets/mouse-human/target.xml",
    reference_matching_path="assets/mouse-human/reference.xml"
)
print("dataset key-values:", dataset.keys())
# >>> dataset key-values: dict_keys(['dataset-info', 'source', 'target', 'reference'])

print("Sample source ontology:", dataset['source'][0])

The parsed source ontology is a list of triple records with the following structure:

[
    {
        'subject': ('http://mouse.owl#MA_0000143', 'tonsil'),
        'predicate': ('http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'type'),
        'object': ('http://www.w3.org/2002/07/owl#Class', 'Class'),
        'subject_is_class': True,
        'object_is_class': False
    },
    ...
]

Once the source and target ontologies are parsed, the GraphTripleEncoder creates a triplet representation of each. The representation has the format [(Subject Label, Predicate Label, Object Label), ... ], which is the standard input for KGE models.

from ontoaligner.encoder import GraphTripleEncoder

encoder = GraphTripleEncoder()
encoded_dataset = encoder(**dataset)
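To make the triplet format concrete, here is a small sketch that maps parsed records of the shape shown earlier to (Subject Label, Predicate Label, Object Label) tuples. The helper to_label_triples is hypothetical and is not the GraphTripleEncoder implementation; it only illustrates the target format.

```python
def to_label_triples(records):
    """Map parsed records to (subject label, predicate label, object label) tuples."""
    triples = []
    for record in records:
        # Each field is an (IRI, label) pair; only the label is kept here.
        _, subject_label = record["subject"]
        _, predicate_label = record["predicate"]
        _, object_label = record["object"]
        triples.append((subject_label, predicate_label, object_label))
    return triples


# One record copied from the sample source ontology above
parsed_sample = [
    {
        "subject": ("http://mouse.owl#MA_0000143", "tonsil"),
        "predicate": ("http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "type"),
        "object": ("http://www.w3.org/2002/07/owl#Class", "Class"),
        "subject_is_class": True,
        "object_is_class": False,
    },
]

print(to_label_triples(parsed_sample))
# >>> [('tonsil', 'type', 'Class')]
```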

After the triplets are generated, they are fed into the KGE model, the core engine that learns low-dimensional embeddings for all entities and relations present in the triplets. Here we use ConvEAligner, OntoAligner’s KGE-based aligner built on ConvE. It encapsulates the entire process, from data ingestion and embedding learning to alignment prediction.

from ontoaligner.aligner import ConvEAligner

kge_params = {
    'device': 'cpu',                  # str: Device to use for training ('cpu' or 'cuda')
    'embedding_dim': 300,             # int: Dimensionality of learned embeddings
    'num_epochs': 50,                 # int: Number of training epochs
    'train_batch_size': 128,          # int: Number of positive triplets per training batch
    'eval_batch_size': 64,            # int: Number of triplets per evaluation batch
    'num_negs_per_pos': 5,            # int: Number of negative samples per positive triplet
    'random_seed': 42,                # int: Seed for reproducibility
}

aligner = ConvEAligner(**kge_params)

matchings = aligner.generate(input_data=encoded_dataset)

Note

The .generate method first trains the KGE model and then performs the matching.

This step post-processes the predicted matchings, optionally filtering them by similarity score and applying cardinality-based processing. The matchings are then evaluated against the reference alignments, so quality can be compared before and after post-processing.

from ontoaligner.postprocess import graph_postprocessor

processed_matchings = graph_postprocessor(predicts=matchings, threshold=0.5)
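The sketch below illustrates what score thresholding followed by one-to-one cardinality filtering can look like. It is an assumption-laden illustration, not the graph_postprocessor implementation: the function filter_matchings, the greedy strategy, and the toy records (using the 'source', 'target', 'score' keys that appear in the example output later on this page) are all hypothetical.

```python
def filter_matchings(predicts, threshold=0.5):
    """Drop low-scoring pairs, then greedily keep at most one match per entity."""
    # Step 1: similarity-score filtering
    kept = [m for m in predicts if m["score"] >= threshold]
    # Step 2: greedy one-to-one selection, best scores first
    kept.sort(key=lambda m: m["score"], reverse=True)
    used_sources, used_targets, result = set(), set(), []
    for match in kept:
        if match["source"] in used_sources or match["target"] in used_targets:
            continue  # this source or target is already matched
        used_sources.add(match["source"])
        used_targets.add(match["target"])
        result.append(match)
    return result


# Hypothetical predictions, for illustration only
toy_predicts = [
    {"source": "s1", "target": "t1", "score": 0.87},
    {"source": "s2", "target": "t1", "score": 0.65},
    {"source": "s2", "target": "t2", "score": 0.60},
    {"source": "s3", "target": "t3", "score": 0.30},
]

for match in filter_matchings(toy_predicts, threshold=0.5):
    print(match)
```

Raising the threshold generally trades recall for precision, which is why the tutorial evaluates the matchings both before and after post-processing.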

The following code compares the generated alignments with the reference matchings, then saves the matchings in both XML and JSON formats for further analysis or use. You can use either format, or both.

from ontoaligner.utils import metrics

evaluation = metrics.evaluation_report(predicts=matchings, references=dataset['reference'])
print("Matching Evaluation Report:\n", evaluation)

evaluation = metrics.evaluation_report(predicts=processed_matchings, references=dataset['reference'])
print("Matching Evaluation Report -- after post-processing:\n", evaluation)

import json

from ontoaligner.utils import xmlify

xml_str = xmlify.xml_alignment_generator(matchings=processed_matchings)
with open("matchings.xml", "w", encoding="utf-8") as xml_file:
    xml_file.write(xml_str)
with open("matchings.json", "w", encoding="utf-8") as json_file:
    json.dump(processed_matchings, json_file, indent=4, ensure_ascii=False)
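For intuition about the evaluation step, the following sketch computes precision, recall, and F1 over sets of (source, target) pairs. It is not the metrics.evaluation_report implementation; the function name precision_recall_f1 and the assumption that reference records carry 'source' and 'target' keys are hypothetical.

```python
def precision_recall_f1(predicts, references):
    """Compare predicted and reference alignments as sets of (source, target) pairs."""
    predicted = {(m["source"], m["target"]) for m in predicts}
    expected = {(m["source"], m["target"]) for m in references}
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical alignments, for illustration only
toy_predicted = [
    {"source": "a", "target": "x"},
    {"source": "b", "target": "y"},
    {"source": "c", "target": "z"},
    {"source": "d", "target": "w"},
]
toy_reference = [
    {"source": "a", "target": "x"},
    {"source": "b", "target": "y"},
]

print(precision_recall_f1(toy_predicted, toy_reference))
```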

KGE Aligners

The ontoaligner.aligner.graph module provides a suite of graph embedding-based aligners built on top of popular KGE models. These aligners leverage link prediction objectives and low-dimensional vector spaces to learn semantic representations of entities, facilitating accurate ontology alignment even across heterogeneous structures. Each aligner wraps a specific KGE model implemented through the PyKEEN framework, allowing plug-and-play integration and consistent similarity scoring across models. Some models include custom similarity functions to better capture semantic distance in complex embedding spaces (e.g., complex numbers or quaternions).

The following table lists the available KGE aligners:

| Aligner Name | Description |
|---|---|
| ConvEAligner | Based on ConvE, which uses 2D convolutions over reshaped entity and relation embeddings to model complex interactions. |
| TransDAligner | Based on TransD, which constructs relation-specific projection matrices dynamically from both entity and relation vectors. |
| TransEAligner | Based on TransE, a translation-based model that learns embeddings where \(h + r \approx t\). |
| TransFAligner | Based on TransF, which enables flexible translations for complex relations without increasing model complexity. |
| TransHAligner | Based on TransH, which projects entities onto relation-specific hyperplanes before translation. |
| TransRAligner | Based on TransR, which embeds entities and relations in separate spaces using relation-specific projections. |
| DistMultAligner | Based on DistMult, a bilinear model that uses diagonal matrices for efficient relational modeling. |
| ComplExAligner | Based on ComplEx, which uses complex-valued embeddings to model symmetric and antisymmetric relations; includes a custom similarity function using real parts of complex dot products. |
| HolEAligner | Based on HolE, which combines compositional and holographic representations using circular correlation. |
| RotatEAligner | Based on RotatE, which models relations as rotations in complex space and supports rich relational patterns; includes a similarity override. |
| SimplEAligner | Based on SimplE, which learns dependent embeddings for each entity and supports fully expressive factorization. |
| CrossEAligner | Based on CrossE, which learns both general and triple-specific embeddings to capture bidirectional interactions. |
| BoxEAligner | Based on BoxE, which models relations as boxes in vector space to support hierarchies and logical rules. |
| CompGCNAligner | Based on CompGCN, a graph convolutional network designed for multi-relational graphs using composition operations. |
| MuREAligner | Based on MuRE, the Euclidean counterpart of the hyperbolic MuRP model, which applies relation-specific transformations to entity embeddings. |
| QuatEAligner | Based on QuatE, which uses quaternion embeddings and custom similarity logic to model expressive 4D rotations and relational structure. |
| SEAligner | Based on SE (Structured Embedding), which projects head and tail entities with relation-specific matrices and compares the projected vectors. |
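To illustrate the translation idea \(h + r \approx t\) behind TransE, the sketch below scores toy triples by the negative Euclidean distance between head + relation and tail. The embeddings and the relation name part_of are made-up values, and the code is a plain-Python illustration rather than the PyKEEN or OntoAligner implementation.

```python
import math


def transe_score(head, relation, tail):
    """Negative L2 distance between (head + relation) and tail; higher is more plausible."""
    return -math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


# Hypothetical 2-dimensional embeddings, for illustration only
tonsil = [0.2, 0.5]
part_of = [0.3, -0.1]
pharynx = [0.5, 0.4]   # close to tonsil + part_of
femur = [-0.6, 0.9]    # far from tonsil + part_of

print("plausible triple:  ", transe_score(tonsil, part_of, pharynx))
print("implausible triple:", transe_score(tonsil, part_of, femur))
```

A triple the model considers true has a score near zero, while implausible triples receive increasingly negative scores.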

To use a KGE-based aligner:

from ontoaligner.aligner import TransEAligner

aligner = TransEAligner()

matchings = aligner.generate(input_data=...)

If the desired model is not available in OntoAligner, you can define a custom aligner:

from ontoaligner.aligner.graph import GraphEmbeddingAligner

class CustomKGEAligner(GraphEmbeddingAligner):
    model = "RESCAL"

aligner = CustomKGEAligner()
matchings = aligner.generate(input_data=...)

Alternatively, you can use the base GraphEmbeddingAligner directly and pass the name of the model you want to use:

from ontoaligner.aligner.graph import GraphEmbeddingAligner

aligner = GraphEmbeddingAligner(model="RESCAL")
matchings = aligner.generate(input_data=...)

Here, RESCAL is the custom KGE model.

Note

For the list of possible models, see PyKEEN > Models.

KGE Retriever

In addition to one-to-one alignments, OntoAligner also supports retriever-based alignment. When retriever mode is enabled (retriever=True), the aligner returns the top-k candidate target entities for each source entity, along with their similarity scores (similar to the retriever aligner). This mode is useful if you want to build downstream candidate filtering pipelines, apply human-in-the-loop validation, or integrate with reranking modules (e.g., LLMs or supervised classifiers).

Here is an example of how to use a KGE aligner as a retriever model:

from ontoaligner.aligner import TransEAligner

# Enable retriever mode and request top-3 candidates per source entity
aligner = TransEAligner(retriever=True, top_k=3)

matchings = aligner.generate(input_data=encoded_dataset)
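Conceptually, top-k retrieval ranks every target entity by its similarity to a source entity and keeps the k best. The sketch below shows this with cosine similarity over toy vectors. It is not how OntoAligner implements retriever mode internally; the function top_k_candidates and the vectors are hypothetical, and only the 'target-cands' and 'score-cands' keys are taken from the example output on this page.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two non-zero, equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def top_k_candidates(source_vec, target_vecs, k=3):
    """Return the k most similar targets and their scores, best first."""
    scored = [(cosine(source_vec, vec), name) for name, vec in target_vecs.items()]
    best = heapq.nlargest(k, scored)
    return {
        "target-cands": [name for _, name in best],
        "score-cands": [score for score, _ in best],
    }


# Hypothetical target embeddings, for illustration only
toy_targets = {"t1": [1.0, 0.0], "t2": [0.0, 1.0], "t3": [1.0, 1.0]}

print(top_k_candidates([1.0, 0.1], toy_targets, k=2))
```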

The two modes differ as follows.

KGE Default mode

With retriever=False (the default), the aligner produces one-to-one alignments, where each source entity is matched to the single most similar target entity. Example output:

{
    'source': 'http://mouse.owl#MA_0000143',
    'target': 'http://human.owl#HBA_0000214',
    'score': 0.87
}

KGE Retriever mode

With retriever=True, the aligner produces one-to-many alignments, where each source entity is matched to multiple target entities. Example output:

[
   {
     "source": "http://mouse.owl#MA_0000143",
     "target-cands": [
         "http://human.owl#HBA_0000214",
         "http://human.owl#HBA_0000762",
         "http://human.owl#HBA_0000891"
     ],
     "score-cands": [0.87, 0.82, 0.77]
   },
   ...
]
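Retriever-mode records can be reduced to one-to-one records by keeping the highest-scoring candidate for each source, for example after a reranking step. The sketch below assumes the record shapes shown in the example outputs on this page; the helper best_candidate is hypothetical and not part of OntoAligner.

```python
def best_candidate(retrieved):
    """Keep only the top-scoring candidate for each source entity."""
    results = []
    for item in retrieved:
        pairs = list(zip(item["target-cands"], item["score-cands"]))
        if not pairs:
            continue  # no candidates were retrieved for this source
        target, score = max(pairs, key=lambda pair: pair[1])
        results.append({"source": item["source"], "target": target, "score": score})
    return results


# Record copied from the retriever-mode example output
retrieved_sample = [
    {
        "source": "http://mouse.owl#MA_0000143",
        "target-cands": [
            "http://human.owl#HBA_0000214",
            "http://human.owl#HBA_0000762",
            "http://human.owl#HBA_0000891",
        ],
        "score-cands": [0.87, 0.82, 0.77],
    },
]

print(best_candidate(retrieved_sample))
# >>> [{'source': 'http://mouse.owl#MA_0000143', 'target': 'http://human.owl#HBA_0000214', 'score': 0.87}]
```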
