Big Data Types

A library to transform Scala product types and schemas from different systems into other schemas. Any implemented type automatically gets methods to convert it into the rest of the types, and vice versa. E.g., a Spark schema can be transformed into a BigQuery table.


A type-safe library to transform Case Classes into Database schemas and to convert implemented types into other types

Documentation

Check the Documentation website to learn more about how to use this library

Available conversions:

From / To    | Scala Types | BigQuery | Spark | Cassandra | Circe (JSON)
------------ | ----------- | -------- | ----- | --------- | ------------
Scala        | -           | ✓        | ✓     | ✓         | ✓
BigQuery     | ✓           | -        | ✓     | ✓         | ✓
Spark        | ✓           | ✓        | -     | ✓         | ✓
Cassandra    | ✓           | ✓        | ✓     | -         | ✓
Circe (JSON) | ✓           | ✓        | ✓     | ✓         | -

Versions for Scala 2.12, 2.13 and 3.x are available in Maven Central

Quick Start

The library has different modules that can be imported separately; a combined example follows this list

  • BigQuery
libraryDependencies += "io.github.data-tools" %% "big-data-types-bigquery" % "{version}"
  • Spark
libraryDependencies += "io.github.data-tools" %% "big-data-types-spark" % "{version}"
  • Cassandra
libraryDependencies += "io.github.data-tools" %% "big-data-types-cassandra" % "{version}"
  • Circe (JSON)
libraryDependencies += "io.github.data-tools" %% "big-data-types-circe" % "{version}"
  • Core
    • Provides support for the abstract SqlTypes. It is already included in the other modules, so it is only needed when none of them is used
libraryDependencies += "io.github.data-tools" %% "big-data-types-core" % "{version}"
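
For example, a build.sbt that pulls in both the BigQuery and the Spark modules:

libraryDependencies ++= Seq(
  "io.github.data-tools" %% "big-data-types-bigquery" % "{version}",
  "io.github.data-tools" %% "big-data-types-spark" % "{version}"
)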

In order to transform one type into another, both of the corresponding modules have to be imported, as in the sketch below.
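
As a sketch, a Spark-to-BigQuery conversion would start with imports from both modules (the package names here are an assumption; check the documentation for the exact imports):

// Hypothetical imports; verify the exact packages in the documentation
import org.datatools.bigdatatypes.spark._    // Spark module
import org.datatools.bigdatatypes.bigquery._ // BigQuery module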

How it works

The library internally uses a generic ADT (SqlType) that can store any schema representation, and from there it can be converted into any other. Transformations are done through two different type classes.
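
As a minimal sketch of the idea, with hypothetical names (the library's real ADT and type classes are richer than this): one type class extracts the generic representation from a source type, and another builds a target schema out of it.

// A generic ADT that any schema can be mapped into (simplified)
sealed trait SqlType
case object SqlInt extends SqlType
case object SqlString extends SqlType
final case class SqlStruct(fields: List[(String, SqlType)]) extends SqlType

// Type class 1 (hypothetical name): from a source type to the generic ADT
trait ToSqlType[A] { def getType: SqlType }

// Type class 2 (hypothetical name): from the generic ADT to a target schema
trait FromSqlType[T] { def fromType(sqlType: SqlType): T }

// Any source reaches any target by going through the ADT,
// so supporting a new system only requires these two instances
def convert[A, T](implicit from: ToSqlType[A], to: FromSqlType[T]): T =
  to.fromType(from.getType)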

Quick examples

Case Classes to other types

//Given a case class, e.g.:
case class MyCaseClass(id: Int, name: String)

//Spark
val s: StructType = SparkSchemas.schema[MyCaseClass]
//BigQuery
val bq: List[Field] = SqlTypeToBigQuery[MyCaseClass].bigQueryFields // just the schema
BigQueryTable.createTable[MyCaseClass]("myDataset", "myTable") // creates a table in a real BigQuery environment
//Cassandra
val c: CreateTable = CassandraTables.table[MyCaseClass]

There are also extension methods that make the transformation between types easier when working with instances:

//from Case Class instance
val foo: MyCaseClass = ???
foo.asBigQuery // List[Field]
foo.asSparkSchema // StructType
foo.asCassandra("TableName", "primaryKey") // CreateTable

Conversion between types works in the same way:

// From Spark to others
val foo: StructType = myDataFrame.schema
foo.asBigQuery // List[Field]
foo.asCassandra("TableName", "primaryKey") // CreateTable

//From BigQuery to others
val foo: Schema = ???
foo.asSparkFields // List[StructField]
foo.asSparkSchema // StructType
foo.asCassandra("TableName", "primaryKey") // CreateTable

//From Cassandra to others
val foo: CreateTable = ???
foo.asSparkFields // List[StructField]
foo.asSparkSchema // StructType
foo.asBigQuery // List[Field]
foo.asBigQuery.schema // Schema
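
The Circe (JSON) module follows the same pattern. A hedged sketch, assuming the analogous extension methods are also available on a circe Json instance (not verified against the Circe module; check its documentation):

//From Circe (JSON) to others - method availability assumed by analogy
import io.circe.Json
val foo: Json = ???
foo.asSparkSchema // StructType
foo.asBigQuery // List[Field]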