Archived Repository - Do not use this repository anymore!

SANSA got easier to use! All its code has been consolidated into a single repository at



SANSA-ML is the Machine Learning (ML) library in the SANSA stack. Algorithms in this repository perform various machine learning tasks directly on RDF/OWL input data. While most machine learning algorithms process simple features, the algorithms in SANSA-ML exploit the graph structure and semantics of the background knowledge specified using the RDF and OWL standards. In many cases, this yields results that are either more accurate or easier for humans to understand. In contrast to most other algorithms that support background knowledge, they scale horizontally using Apache Spark and Apache Flink.

The ML layer currently supports the following algorithms:

  • RDF graph clustering (Power Iteration, Border Flow, Link based clustering, Modularity based clustering, Silvia Link Clustering)
  • Rule mining in RDF graphs based on AMIE+
  • Semantic similarity measures (Jaccard similarity, Rodríguez and Egenhofer similarity, Tversky Ratio Model, Batet Similarity)
  • Knowledge graph embedding approaches:
    • TransE (beta status)
    • DistMult (beta status)
  • Terminological Decision Trees for the classification of concepts (beta status)
  • Anomaly detection (beta status)
  • RDF graph kernel based on "A Fast and Simple Graph Kernel for RDF"
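
To give a feel for the semantic similarity measures listed above, here is a minimal sketch of Jaccard similarity and the Tversky ratio model, computed over the neighbour sets of two resources in a tiny RDF-like graph. It is plain Python for illustration only and is not the SANSA-ML API: the `neighbours`, `jaccard` and `tversky` helpers, the `ex:` resources and the choice of direct neighbours as features are all assumptions made for this example.

```python
from typing import Iterable, Set, Tuple

# A triple is (subject, predicate, object); plain strings stand in for IRIs.
Triple = Tuple[str, str, str]


def neighbours(triples: Iterable[Triple], resource: str) -> Set[str]:
    """Return the resources directly connected to `resource` in either direction."""
    found: Set[str] = set()
    for s, _p, o in triples:
        if s == resource:
            found.add(o)
        if o == resource:
            found.add(s)
    return found


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tversky(a: Set[str], b: Set[str], alpha: float = 1.0, beta: float = 1.0) -> float:
    """Tversky ratio model: |A ∩ B| / (|A ∩ B| + alpha*|A - B| + beta*|B - A|)."""
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    return common / denominator if denominator else 0.0


# Hypothetical example data, not taken from any real dataset.
triples = [
    ("ex:Alice", "ex:knows", "ex:Carol"),
    ("ex:Alice", "ex:worksAt", "ex:Acme"),
    ("ex:Bob", "ex:knows", "ex:Carol"),
    ("ex:Bob", "ex:worksAt", "ex:Initech"),
]

alice = neighbours(triples, "ex:Alice")  # {"ex:Carol", "ex:Acme"}
bob = neighbours(triples, "ex:Bob")      # {"ex:Carol", "ex:Initech"}

print(jaccard(alice, bob))
print(tversky(alice, bob, alpha=0.5, beta=0.5))
```

With `alpha = beta = 1` the Tversky ratio reduces to Jaccard similarity; other weights make the measure asymmetric, which can be useful when one resource is treated as the reference for the comparison.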

Please see for examples on how to use the above machine learning approaches.

Several further algorithms are in development. Please create a pull request and/or contact Jens Lehmann if you are interested in contributing algorithms to SANSA-ML.

How to Contribute

We always welcome new contributors to the project! Please see our contribution guide for more details on how to get started contributing to SANSA.