fpopic / bigquery-schema-select   0.4

Apache License 2.0 GitHub

(Script) Generates SQL query that selects all fields (recursively for nested fields) from the provided BigQuery schema file.

Scala versions: 2.13

bigquery-schema-select

Scala CI

Generates SQL query that selects all fields (recursively for nested fields) from the provided BigQuery schema file.

Installation

Download latest version bigquery-schema-select_2.13-X.Y.jar from maven releases UI or using CLI:

# replace X.Y with the latest version
wget -O ~/bigquery-schema-select_2.13-X.Y.jar https://repo1.maven.org/maven2/com/github/fpopic/bigquery-schema-select_2.13/X.Y/bigquery-schema-select_2.13-X.Y.jar

Usage

Using existing table:

bq show --schema --format=prettyjson my_project:my_dataset.my_table | java -jar ~/bigquery-schema-select_2.13-X.Y.jar

Using JSON schema file:

cat my_schema.json | java -jar ~/bigquery-schema-select_2.13-X.Y.jar
[
  {
    "name": "A",
    "type": "TIMESTAMP"
  },
  {
    "name": "B",
    "type": "TIMESTAMP"
  },
  {
    "name": "C",
    "type": "RECORD",
    "fields": [
      {
        "name": "D",
        "type": "RECORD",
        "fields": [
          {
            "name": "E",
            "type": "TIMESTAMP"
          },
          {
            "name": "F",
            "type": "RECORD",
            "mode": "REPEATED",
            "fields": [
              {
                "name": "G",
                "type": "STRING"
              }
            ]
          }
        ]
      },
      {
        "name": "H",
        "type": "TIMESTAMP"
      }
    ]
  },
  {
    "name": "I",
    "type": "RECORD",
    "fields": [
      {
        "name": "J",
        "type": "TIMESTAMP"
      },
      {
        "name": "K",
        "type": "TIMESTAMP"
      }
    ]
  },
  {
    "name": "L",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
      {
        "name": "M",
        "type": "TIMESTAMP"
      },
      {
        "name": "N",
        "type": "TIMESTAMP"
      },
      {
        "name": "O",
        "type": "RECORD",
        "fields": [
          {
            "name": "P",
            "type": "TIMESTAMP"
          }
        ]
      }
    ]
  },
  {
    "name": "Q",
    "type": "TIMESTAMP",
    "mode": "REPEATED"
  }
]

Would generate:

SELECT
  A,
  B,
  STRUCT(
    STRUCT(
      C.D.E,
      ARRAY(
        SELECT AS STRUCT
          F.G
        FROM
          UNNEST(C.D.F) AS F
        WITH
          OFFSET
        ORDER BY
          OFFSET
      ) AS F
    ) AS D,
    C.H
  ) AS C,
  STRUCT(
    I.J,
    I.K
  ) AS I,
  ARRAY(
    SELECT AS STRUCT
      L.M,
      L.N,
      STRUCT(
        L.O.P
      ) AS O
    FROM
      UNNEST(L) AS L
    WITH 
      OFFSET
    ORDER BY
      OFFSET
  ) AS L,
  Q

In case you would like to use snake_case for field names use flag --use_snake_case:

cat my_schema.json | java -jar ~/bigquery-schema-select_2.13-X.Y.jar --use_snake_case