modakanalytics / neo4j.almaren   0.1.2-3.2

Apache License 2.0

Neo4j Connector for Almaren Framework

Scala versions: 2.12 2.11

Neo4j Connector


To add the Neo4j connector dependency to your sbt build:

libraryDependencies += "com.github.music-of-the-ainur" %% "neo4j-almaren" % "0.1.4-3.4"
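A minimal build.sbt sketch that pulls in both Almaren and the Neo4j connector for the same Spark line. It assumes Spark 3.4.x on Scala 2.12; the Scala patch version (2.12.17) and the spark-sql version (3.4.1) are assumptions, not taken from this README, while the almaren-framework version matches the spark-shell command below.

```scala
// build.sbt (sketch): Spark 3.4.x on Scala 2.12 is an assumption.
scalaVersion := "2.12.17"

libraryDependencies ++= Seq(
  // Spark is normally supplied by the cluster at runtime, hence "provided".
  // The exact spark-sql version is an assumption; use your cluster's 3.4.x.
  "org.apache.spark" %% "spark-sql" % "3.4.1" % "provided",
  // Almaren core and the Neo4j connector, both built for Spark 3.4.
  "com.github.music-of-the-ainur" %% "almaren-framework" % "0.9.10-3.4",
  "com.github.music-of-the-ainur" %% "neo4j-almaren" % "0.1.4-3.4"
)
```

Marking Spark as "provided" keeps it out of the assembly jar when submitting to a cluster; for purely local runs with `sbt run` you may need to drop that qualifier.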

The Neo4j connector is built on top of the Neo4j Connector for Apache Spark (https://github.com/neo4j-contrib/neo4j-spark-connector); see that project for more details.

To run in spark-shell with Scala 2.12:

spark-shell --master "local[*]" --packages "com.github.music-of-the-ainur:almaren-framework_2.12:0.9.10-3.4,com.github.music-of-the-ainur:neo4j-almaren_2.12:0.1.4-3.4"

For Scala 2.13:

spark-shell --master "local[*]" --packages "com.github.music-of-the-ainur:almaren-framework_2.13:0.9.10-3.4,com.github.music-of-the-ainur:neo4j-almaren_2.13:0.1.4-3.4"

Connector Usage

Maven / Ivy Package Usage

The connector is also available from the Maven Central repository. It can be loaded with the --packages option or the spark.jars.packages configuration property, using the artifact that matches your Spark and Scala versions:

Spark version   Scala version   Connector artifact
-------------   -------------   ------------------
3.4.x           2.13            com.github.music-of-the-ainur:neo4j-almaren_2.13:0.1.4-3.4
3.4.x           2.12            com.github.music-of-the-ainur:neo4j-almaren_2.12:0.1.4-3.4
3.3.x           2.13            com.github.music-of-the-ainur:neo4j-almaren_2.13:0.1.4-3.3
3.3.x           2.12            com.github.music-of-the-ainur:neo4j-almaren_2.12:0.1.4-3.3
3.2.x           2.12            com.github.music-of-the-ainur:neo4j-almaren_2.12:0.1.4-3.2
3.1.x           2.12            com.github.music-of-the-ainur:neo4j-almaren_2.12:0.1.4-3.1
2.4.x           2.12            com.github.music-of-the-ainur:neo4j-almaren_2.12:0.1.4-2.4
2.4.x           2.11            com.github.music-of-the-ainur:neo4j-almaren_2.11:0.1.4-2.4
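The table above follows a fixed naming pattern: the artifact suffix is the Scala binary version and the version suffix is the Spark minor version. The sketch below encodes that pattern and the listed combinations; `Neo4jAlmarenArtifact` and `artifactFor` are hypothetical helpers for illustration, not part of the connector.

```scala
// Hypothetical helper: builds the neo4j-almaren Maven coordinate for a given
// Spark minor version and Scala binary version, using the combinations
// listed in the compatibility table.
object Neo4jAlmarenArtifact {
  private val ConnectorVersion = "0.1.4"

  // (Spark minor version, Scala binary version) pairs listed in the table.
  private val supported: Set[(String, String)] = Set(
    "3.4" -> "2.13", "3.4" -> "2.12",
    "3.3" -> "2.13", "3.3" -> "2.12",
    "3.2" -> "2.12", "3.1" -> "2.12",
    "2.4" -> "2.12", "2.4" -> "2.11"
  )

  /** Right(coordinate) for a listed combination, Left(reason) otherwise. */
  def artifactFor(sparkVersion: String, scalaVersion: String): Either[String, String] =
    if (supported.contains(sparkVersion -> scalaVersion))
      Right(s"com.github.music-of-the-ainur:neo4j-almaren_$scalaVersion:$ConnectorVersion-$sparkVersion")
    else
      Left(s"No neo4j-almaren build listed for Spark $sparkVersion / Scala $scalaVersion")
}
```

The returned string is what goes after --packages, or into `SparkSession.builder().config("spark.jars.packages", ...)` when building the session programmatically.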

Source and Target

Source

Parameters

Parameter   Description
---------   -----------
url         URL of the Neo4j instance to connect to (e.g. bolt://localhost:7687)

Option                              Description
------                              -----------
authentication.basic.username       Username for the basic authentication type
authentication.basic.password       Password for the basic authentication type
authentication.custom.credentials   Credentials used to authenticate the principal with the custom authentication type
labels                              Label identifying the nodes to read, e.g. Person
Nodes                               Nodes usually represent entities. The simplest possible graph is a single node.
Relationship                        A relationship connects two nodes. Relationships organize nodes into structures such as lists, trees, maps, or compound entities, which can in turn be combined into more complex, richly interconnected structures.
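A small sketch of assembling the options Map for sourceNeo4j. It assumes the Map is forwarded unchanged to the Neo4j Connector for Apache Spark, whose `labels` option accepts a colon-separated list of node labels; `Neo4jSourceOptions` and `sourceOptions` are hypothetical names used only for illustration.

```scala
// Hypothetical helper: builds the connector options for reading nodes.
// Assumption: multiple labels are joined with ':' as the Neo4j Spark
// connector expects for its `labels` option.
object Neo4jSourceOptions {
  def sourceOptions(labels: Seq[String], extra: Map[String, String] = Map.empty): Map[String, String] = {
    require(labels.nonEmpty, "at least one label is required")
    // Extra connector options are merged first so `labels` always wins.
    extra + ("labels" -> labels.mkString(":"))
  }
}
```

With a single label this yields `Map("labels" -> "Person")`, the same Map passed to sourceNeo4j in the example.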

For more driver options, see the Neo4j Connector for Apache Spark documentation.

Example

// org.scalatest is only needed inside test suites, so it is not imported here.
import com.github.music.of.the.ainur.almaren.Almaren
import com.github.music.of.the.ainur.almaren.builder.Core.Implicit
import com.github.music.of.the.ainur.almaren.neo4j.Neo4j.Neo4jImplicit

val almaren = Almaren("neo4j-almaren")

// Read all nodes labelled Person from Neo4j into a DataFrame.
val df = almaren.builder
  .sourceNeo4j(
    "bolt://localhost:7687",  // url of the Neo4j instance
    Some("neo4j"),            // basic authentication username
    Some("neo4j1234"),        // basic authentication password
    Map("labels" -> "Person") // connector options
  ).batch

Target

Parameters

Parameter   Description
---------   -----------
url         URL of the Neo4j instance to connect to (e.g. bolt://localhost:7687)

Option                              Description
------                              -----------
authentication.basic.username       Username for the basic authentication type
authentication.basic.password       Password for the basic authentication type
authentication.custom.credentials   Credentials used to authenticate the principal with the custom authentication type
SaveMode                            Expected behavior when saving the DataFrame to Neo4j (e.g. SaveMode.ErrorIfExists, SaveMode.Overwrite)
node.keys                           Comma-separated list of properties used as node keys when using SaveMode.Overwrite
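The node.keys rule in the table above can be checked before submitting a job. The sketch below is a hypothetical helper (`Neo4jTargetOptions`, `targetOptions`); the `overwrite` flag stands in for passing SaveMode.Overwrite to targetNeo4j, and joining multiple labels with ':' is an assumption about the underlying Neo4j Spark connector.

```scala
// Hypothetical helper: builds the options Map for targetNeo4j and enforces
// that node.keys is present when writing with SaveMode.Overwrite.
object Neo4jTargetOptions {
  def targetOptions(labels: Seq[String], nodeKeys: Seq[String], overwrite: Boolean): Map[String, String] = {
    require(labels.nonEmpty, "at least one label is required")
    // Overwrite needs key properties to identify which existing nodes to update.
    require(!overwrite || nodeKeys.nonEmpty, "node.keys is required with SaveMode.Overwrite")
    val base = Map("labels" -> labels.mkString(":"))
    // node.keys is a comma-separated list of key properties, e.g. "name,email".
    if (nodeKeys.isEmpty) base else base + ("node.keys" -> nodeKeys.mkString(","))
  }
}
```

Failing fast in the driver gives a clearer error than letting the write fail later inside the connector.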

For more driver options, see the Neo4j Connector for Apache Spark documentation.

Example

// org.scalatest is only needed inside test suites, so it is not imported here.
import org.apache.spark.sql.SaveMode
import com.github.music.of.the.ainur.almaren.Almaren
import com.github.music.of.the.ainur.almaren.builder.Core.Implicit
import com.github.music.of.the.ainur.almaren.neo4j.Neo4j.Neo4jImplicit

val almaren = Almaren("neo4j-almaren")

// Read person_info (a table or view registered in the Spark session)
// and write its rows to Neo4j as nodes labelled Person.
val df = almaren.builder
  .sourceSql("select * from person_info")
  .targetNeo4j(
    "bolt://localhost:7687",    // url of the Neo4j instance
    Some("neo4j"),              // basic authentication username
    Some("neo4j1234"),          // basic authentication password
    Map("labels" -> "Person"),  // connector options
    SaveMode.ErrorIfExists      // save mode passed to the connector
  ).batch