- sansa-stack/archived-sansa-inference
  A general Inference API based on two of the most popular Big Data processing engines: Apache Spark and Apache Flink
  Scala versions: 2.11
- agile-lab-dev/wasp
  WASP is a framework to build complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.
  Scala versions: 2.12, 2.11
- sansa-stack/archived-sansa-owl
  SANSA Stack OWL (Web Ontology Language) API
  Scala versions: 2.11
- isarn/isarn-sketches-spark
  Routines and data structures for using isarn-sketches idiomatically in Apache Spark
  Scala versions: 2.12, 2.11, 2.10
- whylabs/whylogs-java
  Profile and monitor your ML data pipeline end-to-end
  Scala versions: 2.12, 2.11
- locationtech/rasterframes
  Geospatial Raster support for Spark DataFrames
  Scala versions: 2.12, 2.11
- absaoss/pramen
  Resilient data pipeline framework running on Apache Spark
  Scala versions: 2.13, 2.12, 2.11
- s22s/pre-lt-raster-frames
  Spark DataFrames for earth observation data
  Scala versions: 2.11
- romans-weapon/spear-framework
  Rapid ETL/ELT-connectors/pipeline development leveraged on top of Apache Spark
  Scala versions: 2.12, 2.11
- arcizon/spark-filetransfer
  API for reading and writing data via various file transfer protocols from Apache Spark.
  Scala versions: 2.12, 2.11
- qubole/streaminglens
  Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines
  Scala versions: 2.11
- florentf9/sparkml-som
  Spark ML implementation of SOM algorithm (Kohonen self-organizing map)
  Scala versions: 2.11
- piotr-kalanski/data-quality-monitoring
  Data Quality Monitoring Tool
  Scala versions: 2.11