timvw / adobe-analytics-datafeed-datasource   0.1.0

License: Apache License 2.0

Apache Spark data source for Adobe Analytics Data Feed

Scala versions: 2.12


Datasource for Adobe Analytics Data Feed

Adobe Analytics data feeds are a way to export raw, hit-level data from Adobe Analytics.

This project implements an Apache Spark data source on top of the uniVocity TSV parser. Unlike many online examples, it does not treat the hit_data files as CSV: because CSV and TSV escape special characters differently, a CSV parser does not handle escaped values correctly.
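To make the difference concrete, here is a small, self-contained sketch in plain Scala. It is not this project's code and not uniVocity's API. It assumes that a special character inside a value, such as a tab, is escaped by a preceding backslash (check Adobe's data feed documentation for the exact rules). A naive split on tabs breaks such a value in two, whereas an escape-aware scan keeps it intact:

```scala
// Illustrative sketch only: not this project's implementation or uniVocity's API.
// Assumption: a backslash makes the next character (tab, newline, backslash) literal.
object TsvSketch {

  /** Splits one record into fields on unescaped tabs, unescaping as it goes. */
  def splitEscaped(line: String): Vector[String] = {
    val fields  = Vector.newBuilder[String]
    val current = new StringBuilder
    var i = 0
    while (i < line.length) {
      val c = line.charAt(i)
      if (c == '\\' && i + 1 < line.length) {
        current.append(line.charAt(i + 1)) // escaped char is kept as a literal
        i += 2
      } else if (c == '\t') {
        fields += current.toString // an unescaped tab ends the field
        current.clear()
        i += 1
      } else {
        current.append(c)
        i += 1
      }
    }
    fields += current.toString
    fields.result()
  }

  def main(args: Array[String]): Unit = {
    // The second value contains a tab, escaped as backslash + tab.
    val record = "id1\tfoo\\\tbar\tbaz"
    println(record.split('\t').length)   // 4: the naive split breaks the value
    println(splitEscaped(record).length) // 3: the escaped tab stays inside its value
  }
}
```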

Features

All available options are defined in DatafeedOptions.scala.

Usage

Make sure the package is on the classpath, e.g. by using the --packages option (replace $version with the release you want, such as 0.1.0):

spark-shell --packages "be.icteam:adobe-analytics-datafeed-datasource_2.12:$version"
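If you build a Spark application with sbt instead of using spark-shell, the same coordinates should translate to a dependency like the one below. This is a sketch that assumes Scala 2.12 (the only Scala version listed) and release 0.1.0:

```scala
// build.sbt (sketch): the datasource as a project dependency.
// Assumes scalaVersion is 2.12.x, so %% resolves to the _2.12 artifact,
// and that Spark itself is provided by your cluster or another dependency.
libraryDependencies += "be.icteam" %% "adobe-analytics-datafeed-datasource" % "0.1.0"
```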

Then read the feed as follows:

val df = spark.read
  .format("be.icteam.adobe.analytics.datafeed")
  .load("./src/test/resources/randyzwitch")

Here is what it looks like:

df.show(3, false)

+------------------------------------------------------+----------------------------------+------------------+------------+---------------+------------------------+----------+------------------------+-----------+-----------+----------------+--------------------+-------------------+---------------+----------+-----+-----+-----+-----+-----+-----+-----+-----+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+-----------+------------------+-------------------------------------------------------------------+--------------------------+------------------+---------+-----------+-------+----------+--------+--------------+-----------------+-----------------+----------------------+---------+-------------------+------------------+-------------+------------+------------+-------------+----------------------+----------+----------+----------+----------+----------+----------+----------+----------+----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+---
--------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+----------+----------+----------+----------+----------+-------------+---------------+--------------------+--------------------+--------------------+---------------------------------------------------------------------+--------------------+--------------+---------------------------------------------------------------------+----------------------+--------------------------------------------------------------------+----------+----------+-------------+-----------------+-----------------------------------------------------------+------------+----------+----------+-----------+-----------+------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+---------------+----------------------------------------------------------------------+------------------+----------+-------------------+-------------------+---------+----------+-------------+-------+-------------------------------------------------------------------------------------------------------------------------+--------------+---------+--------------+-------------------------------------+-------------------+--------------------+-------------------------------------------------------------------+--------------------+
|post_event_list                                       |post_product_list                 |browser           |browser_type|connection_type|country                 |javascript|language                |os         |resolution |ref_type        |accept_language     |date_time          |domain         |evar1     |evar2|evar3|evar4|evar5|evar6|evar7|evar8|evar9|evar10|evar11|evar12|evar13|evar14|evar15|evar16|evar17|evar18|evar19|evar20|evar21|evar22|evar23|evar24|evar25|evar26|evar27|evar28|evar29|evar30|evar31|evar32|evar33|evar34|evar35|evar36|evar37|evar38|evar39|evar40|evar41|evar42|evar43|evar44|evar45|evar46|evar47|evar48|evar49|evar50|evar51|evar52|evar53|evar54|evar55|evar56|evar57|evar58|evar59|evar60|evar61|evar62|evar63|evar64|evar65|evar66|evar67|evar68|evar69|evar70|evar71|evar72|evar73|evar74|evar75|exclude_hit|first_hit_pagename|first_hit_page_url                                                 |first_hit_referrer        |first_hit_time_gmt|geo_city |geo_country|geo_dma|geo_region|geo_zip |ip            
|last_hit_time_gmt|last_purchase_num|last_purchase_time_gmt|new_visit|post_browser_height|post_browser_width|post_campaign|post_channel|post_cookies|post_currency|post_cust_hit_time_gmt|post_evar1|post_evar2|post_evar3|post_evar4|post_evar5|post_evar6|post_evar7|post_evar8|post_evar9|post_evar10|post_evar11|post_evar12|post_evar13|post_evar14|post_evar15|post_evar16|post_evar17|post_evar18|post_evar19|post_evar20|post_evar21|post_evar22|post_evar23|post_evar24|post_evar25|post_evar26|post_evar27|post_evar28|post_evar29|post_evar30|post_evar31|post_evar32|post_evar33|post_evar34|post_evar35|post_evar36|post_evar37|post_evar38|post_evar39|post_evar40|post_evar41|post_evar42|post_evar43|post_evar44|post_evar45|post_evar46|post_evar47|post_evar48|post_evar49|post_evar50|post_evar51|post_evar52|post_evar53|post_evar54|post_evar55|post_evar56|post_evar57|post_evar58|post_evar59|post_evar60|post_evar61|post_evar62|post_evar63|post_evar64|post_evar65|post_evar66|post_evar67|post_evar68|post_evar69|post_evar70|post_evar71|post_evar72|post_evar73|post_evar74|post_evar75|post_hier1|post_hier2|post_hier3|post_hier4|post_hier5|post_keywords|post_page_event|post_page_event_var1|post_page_event_var2|post_page_event_var3|post_pagename                                                        |post_pagename_no_url|post_page_type|post_page_url                                                        |post_persistent_cookie|post_prop1                                                          |post_prop2|post_prop3|post_prop4   |post_prop5       |post_prop6                                                 |post_prop7  |post_prop8|post_prop9|post_prop10|post_prop11|post_prop12       
|post_prop13|post_prop14|post_prop15|post_prop16|post_prop17|post_prop18|post_prop19|post_prop20|post_prop21|post_prop22|post_prop23|post_prop24|post_prop25|post_prop26|post_prop27|post_prop28|post_prop29|post_prop30|post_prop31|post_prop32|post_prop33|post_prop34|post_prop35|post_prop36|post_prop37|post_prop38|post_prop39|post_prop40|post_prop41|post_prop42|post_prop43|post_prop44|post_prop45|post_prop46|post_prop47|post_prop48|post_prop49|post_prop50|post_prop51|post_prop52|post_prop53|post_prop54|post_prop55|post_prop56|post_prop57|post_prop58|post_prop59|post_prop60|post_prop61|post_prop62|post_prop63|post_prop64|post_prop65|post_prop66|post_prop67|post_prop68|post_prop69|post_prop70|post_prop71|post_prop72|post_prop73|post_prop74|post_prop75|post_purchaseid|post_referrer                                                         |post_search_engine|post_state|post_visid_high    |post_visid_low     |post_zip |prev_page |ref_domain   |service|user_agent                                                                                                               |visit_keywords|visit_num|visit_page_num|visit_referrer                       |visit_search_engine|visit_start_pagename|visit_start_page_url                                               |visit_start_time_gmt|
+------------------------------------------------------+----------------------------------+------------------+------------+---------------+------------------------+----------+------------------------+-----------+-----------+----------------+--------------------+-------------------+---------------+----------+-----+-----+-----+-----+-----+-----+-----+-----+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+-----------+------------------+-------------------------------------------------------------------+--------------------------+------------------+---------+-----------+-------+----------+--------+--------------+-----------------+-----------------+----------------------+---------+-------------------+------------------+-------------+------------+------------+-------------+----------------------+----------+----------+----------+----------+----------+----------+----------+----------+----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+---
--------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+----------+----------+----------+----------+----------+-------------+---------------+--------------------+--------------------+--------------------+---------------------------------------------------------------------+--------------------+--------------+---------------------------------------------------------------------+----------------------+--------------------------------------------------------------------+----------+----------+-------------+-----------------+-----------------------------------------------------------+------------+----------+----------+-----------+-----------+------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+---------------+----------------------------------------------------------------------+------------------+----------+-------------------+-------------------+---------+----------+-------------+-------+-------------------------------------------------------------------------------------------------------------------------+--------------+---------+--------------+-------------------------------------+-------------------+--------------------+-------------------------------------------------------------------+--------------------+
|[{Instance of eVar1, null}, {Instance of eVar2, null}]|[{null, , null, null, null, null}]|Safari 7.1        |Apple       |LAN/Wifi       |Commercial (mostly U.S.)|1.6       |English (United States) |OS X 10.9.5|1400 x 864 |Search Engines  |en-us               |2015-07-13 00:26:18|netvigator.com |logged-out|guest|null |null |null |null |null |null |null |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |0          |null              |http://randyzwitch.com/broken-macbook-pro-hinge-fixed-free/        |https://www.google.com.hk/|1436761578        |hong kong|hkg        |0      |no region |0       |219.77.75.182 |0                |0                |0                     |1        |687                |1347              |null         |null        |Y           |USD          |1436761578            |logged-out|guest     |null      |null      |null      |null      |null      |null      |null      |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       
|null       |null       |null       |null       |null       |null       |null       |null       |null       |null      |null      |null      |null      |null      |::empty::    |0              |null                |null                |null                |http://randyzwitch.com/broken-macbook-pro-hinge-fixed-free           |null                |null          |http://randyzwitch.com/broken-macbook-pro-hinge-fixed-free           |Y                     |Broken MacBook Pro Hinge? Apple will fix for free! | randyzwitch.com|1173      |post      |single-post  |technology       |apple,customer-service,genius-bar,macbook-pro              |Randy Zwitch|1         |2012      |06         |25         |June 25, 2012     |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null           |https://www.google.com.hk/                                            |557               |null      |2791471528899189638|791228704714081521 |::hash::0|0         |google.com.hk|ss     |Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.1.17 (KHTML, like Gecko) Version/7.1 Safari/537.85.10     |::empty::     |1        |1             |https://www.google.com.hk/           |557                |null                |http://randyzwitch.com/broken-macbook-pro-hinge-fixed-free/        |1436761578        
  |
|[{Instance of eVar1, null}, {Instance of eVar2, null}]|[{null, , null, null, null, null}]|Google Chrome 43.0|Google      |LAN/Wifi       |Japan                   |1.6       |English (United States) |Windows 8.1|1280 x 800 |Search Engines  |en-US,en;q=0.8      |2015-07-13 00:56:09|aist.go.jp     |logged-out|guest|null |null |null |null |null |null |null |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |0          |null              |http://randyzwitch.com/rsitecatalyst-website-pathing-sankey-charts/|https://www.google.com/   |1436426719        |tsukuba  |jpn        |0      |08        |305-0005|150.29.149.177|1436754129       |0                |0                     |1        |777                |1293              |null         |null        |Y           |USD          |1436763369            |logged-out|guest     |null      |null      |null      |null      |null      |null      |null      |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       
|null       |null       |null       |null       |null       |null       |null       |null       |null       |null      |null      |null      |null      |null      |::empty::    |0              |null                |null                |null                |http://randyzwitch.com/rsitecatalyst-website-pathing-sankey-charts   |null                |null          |http://randyzwitch.com/rsitecatalyst-website-pathing-sankey-charts   |Y                     |Visualizing Website Pathing With Sankey Charts                      |3047      |post      |single-post  |digital-analytics|adobe-analytics,data-visualization,omniture,r,rsitecatalyst|Randy Zwitch|1         |2014      |09         |10         |September 10, 2014|7          |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null           |https://www.google.com/                                               |57                |null      |3037297388874966800|6917530475045353754|::hash::0|0         |google.com   |ss     |Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.132 Safari/537.36            |::empty::     |4        |1             |https://www.google.com/              |57                 |null                |http://randyzwitch.com/rsitecatalyst-website-pathing-sankey-charts/|1436763369        
  |
|[{Instance of eVar1, null}, {Instance of eVar2, null}]|[{null, , null, null, null, null}]|Google Chrome 43.0|Google      |LAN/Wifi       |Network (mostly U.S.)   |1.6       |English (United States) |OS X 10.10 |1280 x 800 |Search Engines  |en-US,en;q=0.8      |2015-07-13 00:48:36|comcast.net    |logged-out|guest|null |null |null |null |null |null |null |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |null  |0          |null              |http://randyzwitch.com/hive-five-hard-won-lessons/                 |https://www.google.com/   |1435962984        |san jose |usa        |807    |ca        |95126   |50.136.222.167|1436200856       |0                |0                     |1        |777                |1197              |null         |null        |Y           |USD          |1436762916            |logged-out|guest     |null      |null      |null      |null      |null      |null      |null      |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       
|null       |null       |null       |null       |null       |null       |null       |null       |null       |null      |null      |null      |null      |null      |::empty::    |0              |null                |null                |null                |http://randyzwitch.com/hive-five-hard-won-lessons                    |null                |null          |http://randyzwitch.com/hive-five-hard-won-lessons                    |Y                     |Five Hard-Won Lessons Using Hive | randyzwitch.com                  |2680      |post      |single-post  |data-science     |big-data,hadoop,hive,python,r                              |Randy Zwitch|1         |2014      |06         |12         |June 12, 2014     |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null       |null           |https://www.google.com/                                               |57                |null      |3083707027358817578|6917535643501355093|::hash::0|0         |google.com   |ss     |Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.132 Safari/537.36|::empty::     |4        |1             |https://www.google.com/              |57                 |null                |http://randyzwitch.com/hive-five-hard-won-lessons/                 |1436762916        
  |

Store the data in Delta or Iceberg format and be done with the madness of re-parsing the raw feed. For example, with Delta:

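val df = spark.read
  .format("be.icteam.adobe.analytics.datafeed")
  // `checkpoint` is a placeholder for the timestamp of your previous load
  .option(ClickstreamOptions.MODIFIED_AFTER, checkpoint)
  .load("s3://bucket/landing/feed")

// append, so that later incremental runs do not fail because the table already exists
df.write.format("delta").mode("append").save("s3://bucket/conformed/feed")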
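How the checkpoint is tracked is up to you. The following plain-Scala sketch assumes that MODIFIED_AFTER selects files whose modification time is strictly later than the given instant (similar in spirit to Spark's built-in modifiedAfter option for file sources) and shows one way to pick up new files and advance the checkpoint. The helper names, file names and timestamps are made up for illustration:

```scala
import java.time.Instant

// Hypothetical sketch of checkpoint bookkeeping; not part of the datasource.
// Assumption: MODIFIED_AFTER keeps files modified strictly after the checkpoint.
object CheckpointSketch {
  final case class FeedFile(path: String, modified: Instant)

  /** Files an incremental run would pick up under the assumption above. */
  def newFiles(files: Seq[FeedFile], checkpoint: Instant): Seq[FeedFile] =
    files.filter(_.modified.isAfter(checkpoint))

  /** Latest modification time among the new files, or the old checkpoint if there are none. */
  def nextCheckpoint(files: Seq[FeedFile], checkpoint: Instant): Instant =
    newFiles(files, checkpoint).map(_.modified).foldLeft(checkpoint) { (latest, t) =>
      if (t.isAfter(latest)) t else latest
    }

  def main(args: Array[String]): Unit = {
    val files = Seq( // made-up listing of a landing folder
      FeedFile("feed-a.tsv.gz", Instant.parse("2015-07-13T01:00:00Z")),
      FeedFile("feed-b.tsv.gz", Instant.parse("2015-07-14T01:00:00Z"))
    )
    val checkpoint = Instant.parse("2015-07-13T12:00:00Z")
    println(newFiles(files, checkpoint).map(_.path)) // List(feed-b.tsv.gz)
    println(nextCheckpoint(files, checkpoint))       // 2015-07-14T01:00:00Z
  }
}
```

Persisting the new checkpoint only after the Delta write has succeeded keeps a failed run from silently skipping files on the next attempt.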

Development

To publish your own build to your local Maven repository (~/.m2):

sbt publishM2
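To use that locally published build from another sbt project, the consuming project also has to resolve from the local Maven repository. A minimal sketch; the version string is a placeholder for whatever `sbt version` reports in this project:

```scala
// build.sbt of a consuming project (sketch).
// Resolver.mavenLocal points sbt at ~/.m2/repository, where publishM2 writes.
resolvers += Resolver.mavenLocal

// Placeholder: replace with the version reported by `sbt version` in this project.
libraryDependencies += "be.icteam" %% "adobe-analytics-datafeed-datasource" % "<local version>"
```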

Releases

This project uses sbt-ci-release to build releases and publish them to Sonatype and Maven Central from GitHub Actions.

Create and push a tag of the form vX.Y.Z; the ci.yml workflow then builds and publishes the release:

git tag v0.1.0
git push --tags
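Since the release is driven by the tag name, it can help to check a tag before pushing it. The following is a small, hypothetical Scala helper (not part of this project or of sbt-ci-release) that accepts only tags of the form vX.Y.Z:

```scala
// Hypothetical helper, not part of this project or of sbt-ci-release:
// accepts only release tags of the form vX.Y.Z.
object TagCheck {
  private val ReleaseTag = """v(\d+)\.(\d+)\.(\d+)""".r

  /** The (major, minor, patch) triple for a well-formed tag, or None. */
  def versionOf(tag: String): Option[(Int, Int, Int)] = tag match {
    case ReleaseTag(major, minor, patch) => Some((major.toInt, minor.toInt, patch.toInt))
    case _                               => None
  }

  def main(args: Array[String]): Unit = {
    println(versionOf("v0.1.0")) // Some((0,1,0))
    println(versionOf("0.1.0"))  // None: the leading "v" is missing
  }
}
```

Scala's `Regex` extractor matches the whole string, so trailing text such as `v0.1.0-rc` is rejected as well.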