Parsers
Hint
Different aligner models may require specific types of parsers and encoders. The Aligners section clearly outlines these dependencies, detailing which parser and encoder components are used by each aligner type.
The parser module in OntoAligner provides two kinds of ontology parsers for ontology alignment tasks: oaei parsers for the OAEI track ontologies and generic parsers for other ontologies. This tutorial explains the module's structure and key components, and shows how to use them in your ontology alignment workflows.
Usage
To run an alignment task, you first need to create an OMDataset. The OMDataset class manages an ontology matching dataset: it holds the source and target ontologies and parses the reference alignments. It uses ontology parsers for the ontologies and BaseAlignmentsParser for the reference alignments, and lets users define custom datasets by specifying track names, ontology names, and parsing methods.
class OMDataset(ABC):
    track: str = ""
    ontology_name: str = ""
    source_ontology: Any = None
    target_ontology: Any = None
    alignments: Any = BaseAlignmentsParser()

    def collect(self, source_ontology_path: str, target_ontology_path: str, reference_matching_path: str = "") -> Dict:
        ...
To specify the source and target ontologies, we provide two components: generic and oaei. Both modules load ontologies and format them into the following structure:
[{
'name': 'PhaseEquilibrium',
'iri': 'http://matonto.org/ontologies/matonto#PhaseEquilibrium',
'label': 'Phase Equilibrium',
'childrens': [],
'parents': [{'iri': 'http://ontology.dumontierlab.com/MeasuredProperty',
'name': 'MeasuredProperty',
'label': 'measured property'}],
'synonyms': [],
'comment': ['The conditions at which two phases can be at equilibrium']
}, ... ]
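To make this structure concrete, the following minimal sketch indexes a parsed ontology by IRI and joins the text fields of one concept. It assumes only the list-of-dictionaries format shown above; the helpers `index_by_iri` and `concept_text` are hypothetical and not part of OntoAligner.

```python
from typing import Any, Dict, List

# One entry in the documented format (copied from the example above).
parsed_ontology: List[Dict[str, Any]] = [
    {
        "name": "PhaseEquilibrium",
        "iri": "http://matonto.org/ontologies/matonto#PhaseEquilibrium",
        "label": "Phase Equilibrium",
        "childrens": [],
        "parents": [
            {
                "iri": "http://ontology.dumontierlab.com/MeasuredProperty",
                "name": "MeasuredProperty",
                "label": "measured property",
            }
        ],
        "synonyms": [],
        "comment": ["The conditions at which two phases can be at equilibrium"],
    }
]


def index_by_iri(concepts: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: map each concept IRI to its full entry."""
    return {concept["iri"]: concept for concept in concepts}


def concept_text(concept: Dict[str, Any]) -> str:
    """Hypothetical helper: join label, parent labels, and synonyms into one string."""
    parent_labels = [parent["label"] for parent in concept["parents"]]
    parts = [concept["label"], *parent_labels, *concept["synonyms"]]
    return " | ".join(parts)


index = index_by_iri(parsed_ontology)
entry = index["http://matonto.org/ontologies/matonto#PhaseEquilibrium"]
print(concept_text(entry))
```

A lookup table like this can be handy when you need to resolve the IRIs that appear in parents, childrens, or reference alignments back to full concept entries.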
Finally, the OMDataset combines the parsed source and target ontologies, together with the reference alignments, into an ontology alignment task in the following format:
{
"dataset-info": {
"track": "track name",
"ontology-name": "source ontology name-target ontology name"
},
"source": [
{
"name": "iri name",
"iri": "iri",
"label": "label",
"childrens": [{"iri": "", "name":"", "label":""}, ... ],
"parents": [{"iri": "", "name":"", "label":""}, ... ],
"synonyms": ["synonym1", ...],
"comment": ["comment1",... ]
}
...
],
"target": [
{
"name": "iri name",
"iri": "iri",
"label": "label",
"childrens": [{"iri": "", "name":"", "label":""}, ... ],
"parents": [{"iri": "", "name":"", "label":""}, ... ],
"synonyms": ["synonym1", ...],
"comment": ["comment1",... ]
}
...
],
"reference": [
{
"source": "source iri",
"target": "target iri",
"relation": "="
},
...
]
}
Hint
If you don’t specify reference_matching_path when collecting the OMDataset, the reference is assumed to be an empty list [].
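As a rough illustration of working with the task format above, the sketch below extracts the reference alignments as IRI pairs and checks that they point at concepts present in the source and target lists. The dataset dictionary is a hand-written stand-in with made-up IRIs, and `reference_pairs` and `dangling_references` are hypothetical helpers, not OntoAligner functions.

```python
from typing import Any, Dict, List, Set, Tuple

# Hand-written stand-in for the dictionary that collect() returns,
# following the documented format; the IRIs are made up for illustration.
dataset: Dict[str, Any] = {
    "dataset-info": {"track": "demo", "ontology-name": "A-B"},
    "source": [{"name": "Cat", "iri": "http://a.example/Cat", "label": "cat",
                "childrens": [], "parents": [], "synonyms": [], "comment": []}],
    "target": [{"name": "Feline", "iri": "http://b.example/Feline", "label": "feline",
                "childrens": [], "parents": [], "synonyms": [], "comment": []}],
    "reference": [
        {"source": "http://a.example/Cat", "target": "http://b.example/Feline", "relation": "="}
    ],
}


def reference_pairs(data: Dict[str, Any]) -> Set[Tuple[str, str]]:
    """Hypothetical helper: return reference alignments as (source IRI, target IRI) pairs."""
    return {(ref["source"], ref["target"]) for ref in data.get("reference", [])}


def dangling_references(data: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Hypothetical helper: list reference pairs whose IRIs are missing from source or target."""
    source_iris = {concept["iri"] for concept in data["source"]}
    target_iris = {concept["iri"] for concept in data["target"]}
    return sorted(
        (src, tgt) for src, tgt in reference_pairs(data)
        if src not in source_iris or tgt not in target_iris
    )


pairs = reference_pairs(dataset)
print(len(pairs), "reference pair(s);", len(dangling_references(dataset)), "dangling")

# With an empty reference list, both helpers simply return empty results.
unlabelled = {**dataset, "reference": []}
print(reference_pairs(unlabelled))
```

Returning the pairs as a set makes it straightforward to compare them later against predicted alignments, and the empty-reference case needs no special handling.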
Generic Parser
A generic class for parsing OWL/RDF-based ontologies. It defines methods to extract names, labels, IRIs, children, parents, synonyms, and comments for ontology classes, and produces a parsed ontology that can later be used for ontology alignment. To parse an ontology with this module, use the following code:
from ontoaligner.ontology import GenericOntology
ontology = GenericOntology()
parsed_ontology = ontology.parse("conference.owl")
As another example, suppose you want to perform ontology alignment for the GEO and GeoNames ontologies. In this case, you can use the GenericOMDataset as follows:
from ontoaligner.ontology import GenericOMDataset
task = GenericOMDataset(
    track="Geographical",  # optional
    ontology_name="GEO-GeoNames"  # optional
)
dataset = task.collect(source_ontology_path="geo.owl", target_ontology_path="geonames.owl")
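If you want to reuse a collected dataset without parsing the ontologies again, one option is to store it as JSON. The sketch below assumes the dictionary returned by collect contains only JSON-serializable values (dictionaries, lists, and strings), as in the format documented above; the dataset here is a stand-in, and `save_dataset` and `load_dataset` are hypothetical helpers rather than OntoAligner functions.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict

# Stand-in for the result of task.collect(...); assumed to contain only
# JSON-serializable values, as in the documented format.
dataset: Dict[str, Any] = {
    "dataset-info": {"track": "Geographical", "ontology-name": "GEO-GeoNames"},
    "source": [],
    "target": [],
    "reference": [],
}


def save_dataset(data: Dict[str, Any], path: Path) -> None:
    """Hypothetical helper: write a collected dataset to a JSON file."""
    with path.open("w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2, ensure_ascii=False)


def load_dataset(path: Path) -> Dict[str, Any]:
    """Hypothetical helper: read a dataset back from a JSON file."""
    with path.open("r", encoding="utf-8") as handle:
        return json.load(handle)


# Round-trip through a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    output_path = Path(tmp_dir) / "geo-geonames.json"
    save_dataset(dataset, output_path)
    restored = load_dataset(output_path)

print(restored["dataset-info"]["ontology-name"])
```

Passing `ensure_ascii=False` keeps non-ASCII labels and synonyms readable in the saved file.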
OAEI Parsers
OntoAligner supports datasets for some (not all) of the OAEI tasks, taken from the empirical study LLMs4OM: Matching Ontologies with Large Language Models.
OntoAligner contains several Python modules that support the following tracks:
Anatomy: Ontology alignments in the anatomical domain.
Biodiv: Ontology alignments in the biodiversity domain.
BioML: Ontology alignments in the biomedical domain, specifically designed for machine learning approaches with train/test sets.
CommonKG: Ontology alignments in the common knowledge graph domain.
Food: Ontology alignments in the food domain.
MSE: Ontology alignments in the materials science and engineering domain.
Phenotype: Ontology alignments in the phenotype domain.
The following example demonstrates how to load the MaterialInformation-MatOnto task from the oaei track list:
from ontoaligner.ontology.oaei import MaterialInformationMatOntoOMDataset
task = MaterialInformationMatOntoOMDataset()
dataset = task.collect(
source_ontology_path="../assets/MI-MatOnto/mi_ontology.xml",
target_ontology_path="../assets/MI-MatOnto/matonto_ontology.xml",
reference_matching_path="../assets/MI-MatOnto/matchings.xml"
)
For a simpler import, use:
from ontoaligner.ontology import MaterialInformationMatOntoOMDataset
Note
Consider reading the following section next for more details on available OAEI parsers.