Introduction to TableParser

A functional parser of tables, implemented in Scala. Typically, the input is in the form of a "CSV" (comma-separated values) file. However, it is perfectly possible to parse other formats.

TableParser aims to make it as simple as possible to ingest a fully typed tabular dataset. The principal mechanism for this is the use of case classes to specify the types of fields in the dataset. All conversions from strings to standard types are performed automatically. For non-standard types, it suffices simply to provide an implicit converter of the form String=>T.

It is possible to parse sequences of String (one per row), which is the typical situation for a CSV file, or sequences of sequences of String (where the table corresponds to a matrix of cells).

This library makes extensive use of type classes and other implicit mechanisms. Indeed, it is implemented very similarly to JSON readers. There is a row-parser configuration mechanism that allows the programmer to vary the regular expressions for recognizing strings and delimiters and also to vary the quote character.

In addition to parsing, TableParser provides a mechanism for rendering a table in hierarchical form (for example, for XML or HTML). An output structure that is itself tabular or sequence-oriented can easily be generated from the rows of the table together with, for instance, a JSON writer.

Package Structure

As of version 1.1.4, the code has been split into three packages: core, cats, and spark. Most of the remainder of this README file refers to the core package. Support for cats-effect IO and for encryption has been moved into the cats package. The spark package is for use with Apache Spark (beginning with 1.2.0).

Quick Intro

The simplest way to get an introduction to TableParser is to consult the movie.sc and airbnb.sc worksheets (the latter is in the cats package). These give detailed descriptions of each stage of the process.

Another way to see how it works is to look at the Pairings application, which takes a CSV file, parses it, transforms the data, and outputs a JSON file. This way of parsing is a little different from what is shown in the worksheets, but both are effective. The minimum code necessary to read and parse the CSV file as a table of Players, using as many defaults as possible, is:

case class Player(first: String, last: String)

object Player extends TableParserHelper[Player]() {
  def cellParser: CellParser[Player] = cellParser2(apply)
}

val pty: Try[Table[Player]] = Table.parseFile("players.csv")

The TableParserHelper used here is an abstract subclass of CellParsers and is customized for the row type (in this case, Player). In particular, it defines an implicit TableParser[Table[X]], where X is the row type (Player in this example). This assumes that the source input file ("players.csv") contains a header row which includes column names corresponding to the parameters of the case class Player (in this case "first" and "last"). If your CSV file does not have a header row, then you make a minor change to the declaration of object Player: pass sourceHasHeaderRow = false to the TableParserHelper constructor (see TableParserHelper below).

The input file looks something like this (only the First and Last columns are required; any others are ignored):

Id,First,Last,
1,Adam,Sullivan,
2,Amy,Avagadro,
3,Anna,Peterson,

etc...

Note that columns not needed for the Player case class are simply ignored. Also, note that the case of the column names is not important.
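The following self-contained sketch illustrates this kind of header matching without using TableParser itself. HeaderMatchSketch and parsePlayers are hypothetical names: the sketch locates the required columns by name, ignoring case, and never reads any other column. Unlike TableParser, it does not handle quoted cells.

```scala
// Hypothetical sketch, not the TableParser API: bind header names to
// case-class parameters, ignoring case and any unused columns.
object HeaderMatchSketch {
  final case class Player(first: String, last: String)

  // Split on commas; limit -1 keeps trailing empty cells (no quote handling).
  private def cells(line: String): Vector[String] = line.split(",", -1).toVector

  def parsePlayers(lines: Seq[String]): Seq[Player] = {
    // Column names are compared in lower case, so "First" matches "first".
    val header = cells(lines.head).map(_.trim.toLowerCase)
    val iFirst = header.indexOf("first")
    val iLast = header.indexOf("last")
    require(iFirst >= 0 && iLast >= 0, "header must contain first and last")
    // Columns other than first and last (e.g. Id) are never read.
    lines.tail.map { line =>
      val ws = cells(line)
      Player(ws(iFirst).trim, ws(iLast).trim)
    }
  }
}
```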

For another simple use case of TableParser, please see my blog at: https://scalaprof.blogspot.com/2019/04/new-projects.html

User Guide

Current version: 1.2.0.

See release notes below for history.

Parsing

The Table trait expresses the result of parsing from a serialized representation of a table. Each row is represented by a parametric type Row. Typically, this Row type is a case class with one parameter corresponding to one column in the table file. However, some table files will have too many columns to be practical for this correspondence. In such a situation, you have two choices: (1) parsing each row as a list of String (also known as a "raw" row); (2) parsing each row as a hierarchical arrangement of case classes (or tuples). Typically, especially if the dataset is new to you, you will start with (1) and run an analysis on the columns to help you design the classes for option (2).

For the first option, you will do something like the following (see the AnalysisSpec unit tests):

Table.parseResourceRaw(resourceName) match {
  case Success(t@HeadedTable(_, _)) => println(Analysis(t))
  case _ =>
}

This analysis will give you a list of columns, each showing its name, whether it is optional (i.e., contains nulls), and (if it's a numerical column), its range, mean, and standard deviation.
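As a rough illustration of what such an analysis involves, here is a hypothetical sketch (not the library's Analysis class) that summarizes raw rows column by column. It assumes that a column is optional if any cell is empty, that a column is numerical only if every non-empty cell parses as a Double, and it uses the population standard deviation; the library may make different choices. It requires Scala 2.13 or later (toDoubleOption).

```scala
// Hypothetical sketch of a column-by-column analysis of raw rows
// (not the library's Analysis class). Requires Scala 2.13+ for toDoubleOption.
object ColumnAnalysisSketch {
  final case class ColumnStats(
      name: String,
      optional: Boolean,               // true if any cell is empty
      range: Option[(Double, Double)], // (min, max) for numerical columns
      mean: Option[Double],
      stdDev: Option[Double]           // population standard deviation
  )

  def analyze(header: Seq[String], rows: Seq[Seq[String]]): Seq[ColumnStats] =
    header.zipWithIndex.map { case (name, i) =>
      val cells = rows.map(_(i).trim)
      val nonEmpty = cells.filter(_.nonEmpty)
      val optional = nonEmpty.size < cells.size
      val numbers = nonEmpty.flatMap(_.toDoubleOption)
      // A column is treated as numerical only if every non-empty cell is a number.
      if (nonEmpty.nonEmpty && numbers.size == nonEmpty.size) {
        val n = numbers.size
        val mean = numbers.sum / n
        val sd = math.sqrt(numbers.map(x => (x - mean) * (x - mean)).sum / n)
        ColumnStats(name, optional, Some((numbers.min, numbers.max)), Some(mean), Some(sd))
      } else ColumnStats(name, optional, None, None, None)
    }
}
```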

Incidentally, this raw parser has three signatures: one for resources, one for files, and one for a sequence of Strings. Also, by default, raw row parsing allows quoted strings to span multiple lines.

If you are not parsing as raw rows, you will need to design a class hierarchy to model the columns of the table. TableParser will take care of any depth of case classes/tuples. Currently, there is a limit of 13 parameters per case class/tuple, so, with a depth of h classes/tuples, you could theoretically handle 13^h attributes altogether (for example, 13^2 = 169 for a depth of 2).

The names of the parameters of a case class do not necessarily have to be the same as the column from which the value derives. The ColumnHelper class is available to manage the mapping between parameters and columns.

The result of parsing a table file (CSV, etc.) will be a Table[Row], wrapped in Try. There are object methods to parse most forms of text: File, Resource, InputStream, URL, Seq[String], etc. (see Table below).

The parser responsible for parsing the contents of a cell is called CellParser[T], where T is the type of the value in the cell in question. CellParser is covariant in T so that, for instance, if you have alternative parsers which generate different subclasses of a trait, each of them can be used where a parser for the trait itself is expected.
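A minimal model of why covariance helps is sketched below. The CellParser trait here is a hypothetical one-method stand-in, not the library's CellParser; the point is only that, with +T, parsers for Circle and Square can be stored together as parsers for their common trait Shape.

```scala
// Hypothetical one-method stand-in for a cell parser, covariant in T.
object CovarianceSketch {
  trait CellParser[+T] {
    def parse(w: String): T
  }

  sealed trait Shape
  final case class Circle(radius: Double) extends Shape
  final case class Square(side: Double) extends Shape

  val circleParser: CellParser[Circle] = w => Circle(w.toDouble)
  val squareParser: CellParser[Square] = w => Square(w.toDouble)

  // This compiles only because CellParser is covariant:
  // CellParser[Circle] and CellParser[Square] are both CellParser[Shape].
  val alternatives: Map[String, CellParser[Shape]] =
    Map("circle" -> circleParser, "square" -> squareParser)

  def parseShape(kind: String, w: String): Shape = alternatives(kind).parse(w)
}
```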

In order for TableParser to know how to construct a case class (or tuple) from a set of values, an implicit instance of CellParser[T] must be in scope. This is achieved by invoking a method of the following form (see CellParsers below), where f is a function that takes N parameters of types P1, P2, ... Pn, respectively, and T is the type to be constructed:

cellParserN[T,P1,P2,...Pn](f)

Typically, the function f is the apply method of the case class T, although you may have to explicitly refer to a particular function/method with a specific signature. When you have created a companion object for the case class, you will simply use the method name (typically apply), as in Name.apply. If you have created additional apply methods, you will need to define a function of a specific type and pass that in. Or, more simply, do as for ratingParser in the Movie example (see Movie.scala).
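The following sketch shows the "function of a specific type" technique. The extra apply method on Rating and the construct2 helper are hypothetical stand-ins (construct2 plays the role of a cellParser2-style method); giving the function an explicit type selects the two-parameter apply among the overloads. It requires Scala 2.13 or later (toIntOption).

```scala
// Hypothetical sketch: choosing one apply among overloads by giving the function an explicit type.
object OverloadSketch {
  final case class Rating(code: String, age: Option[Int])

  object Rating {
    // An additional (hypothetical) apply, e.g. "PG-13" -> Rating("PG", Some(13)).
    def apply(w: String): Rating = w.split("-", 2) match {
      case Array(code, age) => Rating(code, age.toIntOption)
      case _                => Rating(w, None)
    }
  }

  // Stand-in for a cellParser2-style method that needs a two-parameter constructor.
  def construct2[P1, P2, T](f: (P1, P2) => T)(p1: P1, p2: P2): T = f(p1, p2)

  // Plain Rating.apply is overloaded; the explicit function type selects the two-parameter version.
  val ratingConstructor: (String, Option[Int]) => Rating = Rating.apply(_, _)
}
```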

Note that P1, P2, ... Pn each have a context bound on CellParser (that's to say, there is implicit evidence of type CellParser[P]). This is the mechanism that saves the programmer from having to specify explicit conversions. T is bound to be a subtype of Product and has two context bounds: ClassTag and ColumnHelper.

See the section on CellParsers below.

Table

The Table class, which implements Iterable[Row], also has several methods for manipulation:

query methods

def content: Content[Row]
def maybeHeader: Option[Header]
def toCSV(implicit renderer: CsvRenderer[Row], generator: CsvProductGenerator[Row], csvAttributes: CsvAttributes): Iterable[String]
def maybeColumnNames: Option[Seq[String]]
def column(name: String): Iterator[Option[String]]

transformation methods

def flatMap[U](f: Row => Iterable[U]): Table[U]
def unit[S](rows: Iterable[S], maybeHeader: Option[Header]): Table[S]
def ++[U >: Row](table: Table[U]): Table[U]
def processRows[S](f: Iterable[Row] => Iterable[S]): Table[S]
def processRows[R, S](f: (Iterable[Row], Iterable[R]) => Iterable[S])(other: Table[R]): Table[S]
def sort[S >: Row : Ordering]: Table[S]
def select(range: Range): Table[Row]
def select(n: Int): Table[Row]
lazy val shuffle: Table[Row]

It is to be expected that join methods will be added later (based upon the second signature of processRows).

The following object methods are available for parsing text:

def parse[T: TableParser](ws: Seq[String]): Try[T]
def parse[T: TableParser](ws: Iterator[String]): Try[T]
def parse[T: TableParser](x: => Source): Try[T]
def parse[T: TableParser](u: URI)(implicit codec: Codec): Try[T]
def parse[T: TableParser](u: URI, enc: String): Try[T]
def parseInputStream[T: TableParser](i: InputStream)(implicit codec: Codec): Try[T]
def parseInputStream[T: TableParser](i: InputStream, enc: String): Try[T]
def parseFile[T: TableParser](f: File)(implicit codec: Codec): Try[T]
def parseFile[T: TableParser](f: File, enc: String): Try[T]
def parseFile[T: TableParser](pathname: String)(implicit codec: Codec): Try[T]
def parseFile[T: TableParser](pathname: String, enc: String): Try[T]
def parseResource[T: TableParser](s: String, clazz: Class[_] = getClass)(implicit codec: Codec): Try[T]
def parseResource[T: TableParser](u: URL, enc: String): Try[T]
def parseResource[T: TableParser](u: URL)(implicit codec: Codec): Try[T]
def parseSequence[T: TableParser](wss: Seq[Seq[String]]): Try[T]

Please note that, in the case of a parameter being an AutoCloseable object such as InputStream or Source, it is the caller's responsibility to close it after parsing. However, if the parameter is a File, a filename, or a URL/URI, then any Source object that is instantiated within the parse method will be closed. This applies also to the parseInputStream methods: the internally defined Source will be closed (but not the stream).
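When you pass an InputStream yourself, one idiomatic way to honor this contract is scala.util.Using (Scala 2.13 or later), which closes the resource whether or not parsing succeeds. In the sketch below, parseLines is a hypothetical stand-in for a parse method that reads from the stream but leaves closing it to the caller.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import scala.io.Source
import scala.util.{Try, Using}

object ResourceSketch {
  // Hypothetical stand-in for a parse method: it reads from the stream
  // but does not close it.
  def parseLines(in: InputStream): Try[List[String]] =
    Try(Source.fromInputStream(in, "UTF-8").getLines().toList)

  // The caller owns the stream, so the caller closes it: Using closes the
  // resource whether parsing succeeds or throws.
  def readAll(bytes: Array[Byte]): Try[List[String]] =
    Using(new ByteArrayInputStream(bytes)) { in =>
      parseLines(in).get // a parse failure becomes a Failure of the outer Try
    }
}
```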

Additionally, there is an implicit class called ImplicitParser (defined in the TableParser companion object) which allows for expressions such as:

parser parse source

This is the recommended way to parse because it is the simplest. It also allows chaining of "lens" methods to configure the parser, for example:

val parser = RawTableParser().setPredicate(TableParser.sampler(2)).setMultiline(true)
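The idea behind such lens methods is that each setter returns a modified copy, leaving the original parser untouched. The sketch below models this with a hypothetical ParserConfig case class and copy; it is not RawTableParser's actual implementation.

```scala
import scala.util.Try

// Hypothetical model of lens-style setters: each returns a modified copy.
object LensSketch {
  final case class ParserConfig(
      predicate: Try[Any] => Boolean = _ => true, // include every row by default
      multiline: Boolean = false,
      forgiving: Boolean = false
  ) {
    def setPredicate(p: Try[Any] => Boolean): ParserConfig = copy(predicate = p)
    def setMultiline(b: Boolean): ParserConfig = copy(multiline = b)
    def setForgiving(b: Boolean): ParserConfig = copy(forgiving = b)
  }
}
```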

TableParser

TableParser is also the name of a trait that takes a parametric type called "Table" in its definition. This is NOT the same as the Table type (described above). TableParser is defined thus:

trait TableParser[Table] {
  type Row
  type Input
  protected val maybeHeader: Option[Header] = None
  val headerRowsToRead: Int = 1
  def forgiving: Boolean = false
  def multiline: Boolean = false
  val predicate: Try[Row] => Boolean = includeAll
  def rowParser: RowParser[Row, Input]
  def builder(rows: Iterator[Row]): Table
  def parse(xs: Iterator[Input], n: Int = headerRowsToRead): Try[Table]

}

The type Row defines the specific row type of the resulting Table (for example, Movie, in the example below). The type Input defines the input type, typically String, but there are also alternatives such as Seq[String]. maybeHeader and headerRowsToRead determine whether the header is supplied explicitly or read from the first line(s) of the file (or sequence of strings) to be parsed. forgiving, which defaults to false, can be set to true if you expect that some rows will not parse, but where this will not invalidate your dataset as a whole. multiline is used to allow (or, when false, disallow) quoted strings to span multiple lines.

In forgiving mode, any exceptions thrown in the parsing of a row are collected and then logged. rowParser is the specific parser for the Row type (see below). builder is used by the parse method. parse is the main method of TableParser: it takes an Iterator[Input] (together with the number of header rows to read) and yields a Try[Table].

The predicate is used to filter rows (which are the results of parsing). By default, all rows are included. TableParser also provides a method (sampler) to create a random sampling function. Note, however, that a significant part of the time for building a table from a large file is just reading and parsing the file. Sampling will not reduce this portion of the time.
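The interplay of forgiving mode and the predicate can be sketched as follows. This is a hypothetical, simplified model, not the library's code: rows that fail to parse either abort the whole parse (strict mode) or are reported and dropped (forgiving mode). The sampler here keeps each row with probability 1/n, which is an assumption about what sampling means.

```scala
import scala.util.{Failure, Random, Success, Try}

// Hypothetical, simplified model of forgiving parsing and row sampling.
object ForgivingSketch {
  val includeAll: Try[Any] => Boolean = _ => true

  // Keeps each row with probability 1/n (an assumed meaning of "sampling").
  def sampler(n: Int, random: Random = new Random()): Try[Any] => Boolean =
    _ => random.nextInt(n) == 0

  def parseRows[R](lines: Seq[String], parseRow: String => Try[R],
                   forgiving: Boolean, predicate: Try[R] => Boolean): Try[Seq[R]] = {
    // Every line is still read and parsed; the predicate only filters the results.
    val attempts = lines.map(parseRow).filter(predicate)
    val failures = attempts.collect { case Failure(e) => e }
    if (failures.nonEmpty && !forgiving) Failure(failures.head)
    else {
      // Forgiving mode: report the failures (here on stderr) and keep the rest.
      failures.foreach(e => System.err.println(s"skipped row: ${e.getMessage}"))
      Success(attempts.collect { case Success(r) => r })
    }
  }
}
```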

Associated with TableParser is an abstract class called TableParserHelper, whose purpose is to make your coding job easier. TableParserHelper is designed to be extended (i.e., subclassed) by the companion object of the case class that you wish to parse from a row of your input. Doing it this way makes it easier for the implicit TableParser instance to be found. You can also set up your application along the lines of the examples below, such as the Movie example.

The constructor for TableParserHelper takes two parameters, both of which can be defaulted:

sourceHasHeaderRow: Boolean = true
forgiving: Boolean = false

RowParser

RowParser is a trait that defines how a line of text is to be parsed as a Row. Row is a parametric type that, in subtypes of RowParser, is context-bound to CellParser. A second parametric type Input is defined: this will take on values of String or Seq[String], according to the form of input. Typically, the StandardRowParser is used, which takes as its constructor parameter a LineParser.

The methods of RowParser are:

def parse(header: Header)(w: String): Try[Row]

def parseIndexed(header: Header)(indexedRow: (Input, Int)): Try[Row]

def parseHeader(w: String): Try[Header]

The parseIndexed method is useful when we care about the sequential aspect of the input. This is particularly important if strings are allowed to spread over newlines (as in the Airbnb dataset).

LineParser

The LineParser takes five parameters: two regexes, a String and two Chars. These define, respectively, the delimiter regex, the string regex, list enclosures, the list separator, and the quote character. Rather than invoke the constructor directly, it is easier to invoke the companion object's apply method, which takes a single implicit parameter: a RowConfig. Two consecutive quote characters, within a quoted string, will be parsed as a single quote character. The LineParser constructor will perform some basic checks that its parameters are consistent.
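To illustrate the quoting rule, here is a hypothetical single-line splitter; it is not the library's regex-based LineParser. It treats the delimiter as a single character and, as described for LineParser, reads two consecutive quote characters inside a quoted string as one literal quote. It does not support multiline strings or list enclosures.

```scala
// Hypothetical single-line splitter; not the library's regex-based LineParser.
object LineSplitSketch {
  def split(line: String, delimiter: Char = ',', quote: Char = '"'): List[String] = {
    val cells = scala.collection.mutable.ListBuffer.empty[String]
    val current = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < line.length) {
      val c = line.charAt(i)
      if (inQuotes) {
        if (c == quote) {
          // Two consecutive quotes inside a quoted string stand for one quote.
          if (i + 1 < line.length && line.charAt(i + 1) == quote) { current += quote; i += 1 }
          else inQuotes = false // closing quote
        } else current += c // delimiters inside quotes are kept as text
      } else if (c == quote) inQuotes = true
      else if (c == delimiter) { cells += current.toString; current.clear() }
      else current += c
      i += 1
    }
    cells += current.toString
    cells.toList
  }
}
```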

StringsParser

StringsParser is a trait which defines an alternative mechanism for converting a line of text to a Row. As with the RowParser, Row is a parametric type which is context-bound to CellParser. StringsParser is useful when the individual columns have already been split into elements of a sequence. Typically, the StandardStringsParser is used.

The methods of StringsParser are:

def parse(ws: Seq[String])(header: Header): Try[Row]

def parseHeader(ws: Seq[String]): Try[Header]

CellParsers

There are a number of methods which return an instance of CellParser for various situations:

def rawRowCellParser: CellParser[RawRow]
def cellParserRepetition[P: CellParser : ColumnHelper](start: Int = 1): CellParser[Seq[P]]
def cellParserSeq[P: CellParser]: CellParser[Seq[P]]
def cellParserOption[P: CellParser]: CellParser[Option[P]]
lazy val cellParserOptionNonEmptyString: CellParser[Option[String]]
def cellParser[P: CellParser, T: ClassTag](construct: P => T): CellParser[T]
def cellParser1[P1: CellParser, T <: Product : ClassTag : ColumnHelper](construct: P1 => T, fields: Seq[String] = Nil): CellParser[T]
etc. through cellParser13...
def cellParser2Conditional[K: CellParser, P, T <: Product : ClassTag : ColumnHelper](construct: (K, P) => T, parsers: Map[K, CellParser[P]], fields: Seq[String] = Nil): CellParser[T]
def columnHelper[T](maybePrefix: Option[String], aliases: (String, String)*): ColumnHelper[T]
etc. including other ways to instantiate a ColumnHelper[T].

The methods of form cellParserN are the parsers that are used to parse into case classes. Ensure that you have the correct number for N: the number of fields/parameters in the case class you are instantiating. If you don't, the compiler, or your IDE, will warn you. In some situations, the reflection code is unable to get the field names in order (for example, when there are public lazy values). In such a case, add the second parameter to explicitly define the order of the field names. Normally, of course, you can leave this parameter unset.

There is one additional method to handle the situation where you want to vary the parser for a set of cells according to the value in another (key) column: cellParser2Conditional. In this case, you must supply a Map which specifies which parser is to be used for each possible value of the key column. If the value in that column is not one of the keys of the map, an exception will be thrown. For an example of this, please see the example in CellParsersSpec ("conditionally parse").
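In spirit, cellParser2Conditional selects the parser for a cell from a Map keyed by another column's value. The following hypothetical sketch (not the library's implementation) shows that idea with a key column saying whether a value is written in decimal or hexadecimal; an unknown key yields a failure, mirroring the exception described above.

```scala
import scala.util.Try

// Hypothetical sketch: the parser for one cell is chosen by the value of a key cell.
object ConditionalSketch {
  final case class Entry(radix: String, value: Int)

  val parsers: Map[String, String => Int] = Map(
    "dec" -> ((w: String) => w.toInt),
    "hex" -> ((w: String) => Integer.parseInt(w, 16))
  )

  // parsers(radix) throws NoSuchElementException for an unknown key,
  // which Try turns into a Failure.
  def parse(radix: String, w: String): Try[Entry] =
    Try(Entry(radix, parsers(radix)(w)))
}
```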

Implicits

Keep in mind when using implicit values that the best practice is to define an implicit involving a type T, for example, CellParser[T], in the companion object of T. This will tend to eliminate any ambiguously defined implicits, and it also tends to avoid any problems with initialization. If you still run into initialization problems, try defining the troublemaker as lazy. Defining implicits in the companion object also relieves you from having to make up names for the implicit values (which the compiler more or less ignores, anyway). Just ensure that the name is valid, doesn't invoke a recursion, and is not in conflict with another name. If you look in the example of Principal (below), you will see that this is also the place to define optional parsers, sequential parsers, etc.

Caveats

A case class which represents a row (or part of a row) of the table you want to create from parsing, or which you want to render, must abide by certain rules:

There should not be any fields defined in the body of the case class. So, no val, var or lazy val. Instead, any behavior you want to add to the class, beyond the parameters (fields) of the class, must be defined using def.
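One plausible reason for this rule is that reflection-based field discovery sees every field of a class, not only its constructor parameters. The sketch below uses plain Java reflection (not the library's own code) to show that a val in the body adds an extra field while a def does not, even though productArity is unchanged.

```scala
// Plain Java reflection, used here only to show what a field-based lookup would see.
object CaveatSketch {
  final case class Good(x: Int, y: Int) {
    def sum: Int = x + y // a def adds behavior but no field
  }

  final case class Bad(x: Int, y: Int) {
    val sum: Int = x + y // a val adds a field that is not a constructor parameter
  }

  def fieldNames(c: Class[_]): Set[String] = c.getDeclaredFields.map(_.getName).toSet
}
```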

Example: Movie

In this example, we parse the IMDB Movie dataset from Kaggle. The basic structure of the application code will look something like this:

    import MovieParser._
    val x: Try[Table[Movie]] = Table.parseResource("movie_metadata.csv")

In this example, the row type is Movie, a case class with eleven parameters. The data can be found in a local resource (relative to this class) called movie_metadata.csv. All the (implicit) details that characterize this particular table input are provided in the MovieParser object.

The Movie class looks like this:

case class Movie(title: String, format: Format, production: Production, reviews: Reviews, director: Principal, actor1: Principal, actor2: Principal, actor3: Option[Principal], genres: AttributeSet, plotKeywords: AttributeSet, imdb: String)

Note that we make actor3 optional because some movies don't specify an "actor3".

In order to parse a Movie, we will need to declare some implicit values in the companion object. The following is the required code:

object Movie extends CellParsers with CsvGenerators with CsvRenderers {
    val missing: Movie = apply("", Format.none, Production.none, Reviews.none, Principal.nemo, Principal.nemo, Principal.nemo, None, AttributeSet.none, AttributeSet.none, "")
    val header = "color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,actor_1_name,movie_title,num_voted_users,cast_total_facebook_likes,actor_3_name,facenumber_in_poster,plot_keywords,movie_imdb_link,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes"
    implicit val helper: ColumnHelper[Movie] = columnHelper(camelToSnakeCaseColumnNameMapper,
        "title" -> "movie_title",
        "imdb" -> "movie_imdb_link")
    implicit val parser: CellParser[Movie] = cellParser11(apply)
    implicit val renderer: CsvRenderer[Movie] = renderer11(apply)
    implicit val generator: CsvGenerator[Movie] = generator11(apply)

    // Additional code shown below for parsing tables or processing rows.
}

We define a missing object because that is sometimes convenient to use with Spark. Similarly, header is the standard header string, which can be used when reading a CSV file without a header. The (implicit) helper is used to map the names of columns appropriately. The (implicit) parser is the required CellParser for Movie. The (implicit) renderer is used to render a Movie as a CSV file. The (implicit) generator is used for outputting a Movie in other format(s).

Each of the case classes referenced in the declaration of Movie will also need a similar companion object. For example, Principal, whose case class is defined thus:

case class Principal(name: Name, facebookLikes: Int)

requires a companion object that looks like this:

object Principal extends CellParsers with CsvGenerators with CsvRenderers {
    val nemo: Principal = Principal(Name.nemo, 0)
    implicit val helper: ColumnHelper[Principal] = columnHelper(camelToSnakeCaseColumnNameMapper, Some("$x_$c"))
    implicit val parser: CellParser[Principal] = cellParser2(apply)
    implicit val parserOpt: CellParser[Option[Principal]] = cellParserOption
    implicit val renderer: CsvRenderer[Principal] = renderer2(apply)
    implicit val rendererOpt: CsvRenderer[Option[Principal]] = optionRenderer()
    implicit val generator: CsvGenerator[Principal] = generator2(apply)
    implicit val generatorOpt: CsvGenerator[Option[Principal]] = optionGenerator
}

Like Movie, it has a default value (nemo), as well as a helper to get the column names correct. We also need to define parsers, renderers, etc. for optional Principals: unlike with primitive types such as Int and Double, we do have to add additional implicit definitions to accomplish this. Note that optionRenderer takes an optional parameter that defines a String to be used when the object is missing.

The other case classes look like this:

case class Format(color: String, language: String, aspectRatio: Double, duration: Int)
case class Production(country: String, budget: Option[Int], gross: Int, titleYear: Int)
case class Reviews(imdbScore: Double, facebookLikes: Int, contentRating: Rating, numUsersReview: Int, numUsersVoted: Int, numCriticReviews: Int, totalFacebookLikes: Int)
case class Name(first: String, middle: Option[String], last: String, suffix: Option[String])
case class Rating(code: String, age: Option[Int])

Consult the actual code in Movie.scala for the details of what is required in the corresponding companion objects.

The Movie object has additional code like this:

object Movie ... {
    // Required code as shown above

    // First, we need a StringParser[Movie]...
    implicit object MovieConfig extends DefaultRowConfig {
        override val listEnclosure: String = ""
    }
    implicit val stringParser: StringParser[Movie] = StandardRowParser.create[Movie]

    // Next, if we want to be able to parse rows into a Table[Movie], we will need a TableParser[Movie]...
    trait MovieTableParser extends HeadedCSVTableParser[Movie] {
        override val forgiving: Boolean = true
    }
    implicit object MovieTableParser extends MovieTableParser
}

We use the forgiving mode for MovieTableParser because we expect that there will be many rows which cannot be parsed. In this code, helper, and the other columnHelpers, specify parameter-column mappings. Note that helper for Principal has an extra parameter at the start of the parameter list:

Some("$x_$c")

which is an (optional) formatter for the purpose of prefixing a string to column names. That's because there are several "Principal" parameters in a Movie, and each one has its own set of attributes. In this format parameter, "$x" is substituted by the prefix (the optional value passed into the lookup method) while $c represents the translated column name.
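The column-name translation and prefix substitution can be sketched as below. The helper names are hypothetical, and the camelCase-to-snake_case rule is an assumption modeled on camelToSnakeCaseColumnNameMapper; the substitution follows the description above, so that facebookLikes with prefix actor_1 becomes actor_1_facebook_likes, a column that appears in the Movie header.

```scala
// Hypothetical helpers modeling column-name translation and prefix formatting.
object ColumnNameSketch {
  // camelCase -> snake_case, e.g. facebookLikes -> facebook_likes (assumed rule).
  def camelToSnake(s: String): String =
    s.replaceAll("([a-z0-9])([A-Z])", "$1_$2").toLowerCase

  // In the format, $x is replaced by the prefix and $c by the translated name.
  def columnName(format: Option[String], prefix: Option[String], parameter: String): String = {
    val c = camelToSnake(parameter)
    (format, prefix) match {
      case (Some(f), Some(x)) => f.replace("$x", x).replace("$c", c)
      case _                  => c // no prefix: just the translated name
    }
  }
}
```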

A couple of parameters of Movie are actually attribute sets (AttributeSet). These are basically lists of String within one column value. Such lists are parsed from the original strings and then returned as strings of the form {element,element,...}. The parsing from the original string obeys the RowConfig parameters listSep and listEnclosure.
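Here is a hypothetical sketch of parsing a list-valued cell into a set and rendering it back in braces. It assumes a '|' list separator (as in the genres column of the IMDB dataset) and an optional two-character enclosure such as "{}"; the sorting in render is only there to make the output deterministic and may not match the library.

```scala
import java.util.regex.Pattern

// Hypothetical sketch of an attribute-set cell: parse a list, render it in braces.
object AttributeSetSketch {
  // enclosure, if non-empty, is a two-character string such as "{}".
  def parse(cell: String, listSep: Char = '|', enclosure: String = ""): Set[String] = {
    val body =
      if (enclosure.length == 2 && cell.length >= 2 &&
          cell.head == enclosure.head && cell.last == enclosure.last)
        cell.substring(1, cell.length - 1)
      else cell
    body.split(Pattern.quote(listSep.toString)).map(_.trim).filter(_.nonEmpty).toSet
  }

  // Sorted only to make the output deterministic.
  def render(xs: Set[String]): String = xs.toSeq.sorted.mkString("{", ",", "}")
}
```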

In this case, the row config object is defined as MovieConfig. Through DefaultRowConfig, it also has the parameters string, a regular expression for a string in a table cell; delimiter, a regular expression which defines the separator between table columns; and quote, the quote character, which can be used to enclose cell values that contain the separator.

A parameter can be optional, for example, in the Movie example, the Production class is defined thus:

case class Production(country: String, budget: Option[Int], gross: Int, titleYear: Int)

In this example, some movies do not have a budget provided. All you have to do is declare it optional in the case class, and TableParser will yield Some(x) if the value is valid, else None.

Note that there is a default, implicit RowConfig object defined in the object RowConfig.

If, instead of building a Table[Movie], you prefer to process rows into an Iterator[Movie], then you should define the following instead of the MovieTableParser:

object Movie ... {
    // Required code as shown above...

    // Finally, an optional row processor--this is useful when you simply want to end up with an _Iterator[Movie]_
    // rather than a Table[Movie]. Typically, we will either use this OR MovieTableParser.
    trait MovieRowProcessor extends StringRowProcessor[Movie] {
        override val forgiving: Boolean = true
    }
    implicit object MovieRowProcessor extends MovieRowProcessor
}

Example: Submissions

This example has two variations on the earlier theme of the Movies example: (1) each row (a Submission) has an unknown number of Question parameters; (2) instead of reading each row from a single String, we read each row from a sequence of Strings, each corresponding to a cell.

The example comes from a report on the submissions to a Scala exam. Only one question is included in this example.

case class Submission(username: String, lastName: String, firstName: String, questions: Seq[Question])

object Submission extends CellParsers {
    implicit val submissionColumnHelper: ColumnHelper[Submission] = columnHelper(ColumnHelper.camelCaseColumnNameMapperSpace, Some("$c $x"))
    implicit val submissionParser: CellParser[Submission] = cellParser4(apply)
    implicit val parser: StandardStringsParser[Submission] = StandardStringsParser[Submission]()

    implicit object TableParser extends StringsTableParser[Table[Submission]] {
      type Row = Submission

      protected def builder(rows: Iterable[Row], header: Header): Table[Row] = HeadedTable(rows, header)

      override val forgiving: Boolean = false

      val rowParser: RowParser[Row, Seq[String]] = implicitly[RowParser[Row, Seq[String]]]
    }
}

case class Question(questionId: String, question: String, answer: Option[String], possiblePoints: Int, autoScore: Option[Double], manualScore: Option[Double])

object Question extends CellParsers {
    private val mapper: String => String = _.replaceAll("(_)", " ")
    implicit val helper: ColumnHelper[Question] = columnHelper(mapper, Some("$c $x"), "questionId" -> "question_ID")
    implicit val optParserString: CellParser[Option[String]] = cellParserOption
    implicit val parser: CellParser[Question] = cellParser6(apply)
    implicit val seqParser: CellParser[Seq[Question]] = cellParserRepetition[Question]()
}

To test this example, we run a unit test as follows (using scalatest):

behavior of "TableParser"
it should "parse Submission" in {
    val rows: Seq[Seq[String]] = Seq(
      Seq("Username", "Last Name", "First Name", "Question ID 1", "Question 1", "Answer 1", "Possible Points 1", "Auto Score 1", "Manual Score 1"),
      Seq("001234567s", "Mr.", "Nobody", "Question ID 1", "The following are all good reasons to learn Scala -- except for one.", "Scala is the only functional language available on the Java Virtual Machine", "4", "4", "")
    )
    import Submission.TableParser
    matchTry(Table.parseSequence(rows.iterator)) {
      case rt@HeadedTable(_, _) =>
        println(rt.head)
        rt.size shouldBe 1
    }
}

Note the use of cellParserRepetition. The parameter allows the programmer to define the start value of the sequence number for the columns. In this case, we use the default value: 1 and so don't have to explicitly specify it. Also, note that the instance of ColumnHelper defined here has the formatter defined as "$c $x" which is in the opposite order from the Movie example.
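The repetition mechanism can be pictured as reading numbered column groups until one is missing. The following sketch is hypothetical: it works on a Map from column name to value rather than on the library's row representation, and it models only two of the Question fields.

```scala
// Hypothetical sketch of reading numbered column groups until one is missing.
object RepetitionSketch {
  final case class Question(questionId: String, answer: Option[String])

  // Columns are named "<field> <n>" (the "$c $x" order), for n = start, start + 1, ...
  def questions(row: Map[String, String], start: Int = 1): List[Question] =
    Iterator.from(start)
      .map(n => (row.get(s"Question ID $n"), row.get(s"Answer $n")))
      .takeWhile { case (id, _) => id.isDefined } // stop at the first missing group
      .map { case (id, answer) => Question(id.get, answer.filter(_.nonEmpty)) }
      .toList
}
```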

Rendering

TableParser provides a general mechanism for rendering (serializing to text) tables. Indeed, Table[Row] extends Renderable[Row], which supports the render(implicit rs: StringRenderer[Row]) method. There are two mechanisms for rendering a table:

one to a straight serialized output, for example, when rendering a table as a CSV file.
the other to a hierarchical (i.e., tree-structured) output, such as an HTML file.

Non-hierarchical output

For this type of output, the application programmer must provide an instance of Writable[O], where O is, for example, a StringBuilder, a BufferedOutput, or perhaps an I/O monad.

The non-hierarchical output does not support the same customization of renderings as does the hierarchical output. It's intended more as a straight, quick-and-dirty output mechanism to a CSV file.

Here, for example, is an appropriate definition.

implicit object StringBuilderWriteable extends Writable[StringBuilder] {
	def writeRaw(o: StringBuilder)(x: CharSequence): StringBuilder = o.append(x.toString)
	def unit: StringBuilder = new StringBuilder
	override def delimiter: CharSequence = "|"
}

The default delimiter is ", ". You can override the newline and quote methods too if you don't want the defaults.

And then, following this, you will write something like the following code:

print(table.render.toString)

The Writable object will take care of inserting the delimiter and quotes as appropriate. Columns will appear in the same order as the parameters of the Row type (which must be either a Product, such as a case class, or an Array or a Seq). If you need to change the order of the columns, you will need to override the writeRow method of Writable.
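The following sketch models this flat rendering with a simplified Writable-like type class. It is hypothetical: the trait, its default writeRow (which omits quoting) and the render function are illustrations rather than the library's Writable, although StringBuilderWritable mirrors the example definition above.

```scala
// Hypothetical, simplified model of flat rendering with a Writable-like type class.
object FlatRenderSketch {
  trait Writable[O] {
    def unit: O
    def writeRaw(o: O)(x: CharSequence): O
    def delimiter: CharSequence = ", "
    def newline: CharSequence = "\n"

    // Columns come out in parameter order; quoting is omitted in this sketch.
    def writeRow(o: O)(row: Product): O = {
      val line = row.productIterator.map(_.toString).mkString(delimiter.toString)
      writeRaw(writeRaw(o)(line))(newline)
    }
  }

  object Writable {
    implicit object StringBuilderWritable extends Writable[StringBuilder] {
      def unit: StringBuilder = new StringBuilder
      def writeRaw(o: StringBuilder)(x: CharSequence): StringBuilder = o.append(x.toString)
      override def delimiter: CharSequence = "|"
    }
  }

  def render[O](rows: Seq[Product])(implicit w: Writable[O]): O =
    rows.foldLeft(w.unit)((o, row) => w.writeRow(o)(row))
}
```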

Hierarchical rendering

A type class called TreeWriter is the main type for hierarchical rendering. One of the instance methods of Table[Row] is a method as follows:

def renderHierarchical[U: TreeWriter](style: String)(implicit rr: HierarchicalRenderer[Row]): U

Providing that you have defined an implicit object of type TreeWriter[U] and a HierarchicalRenderer[Row], then the renderHierarchical method will produce an instance of U which will be a tree containing all the rows of this table.

What sort of type is U? An XML node would be appropriate. The specifications use a type called HTML, which is provided in package parse.render.tag more as an exemplar than as something definitive.

case class HTML(tag: String, content: Option[String], attributes: Map[String, String], hs: Seq[HTML])

The example TreeWriter for this type is reproduced here:

trait TreeWriterHTML$ extends TreeWriter[HTML] {
	def addChild(parent: HTML, child: HTML): HTML = parent match {
		case HTML(t, co, as, hs) => HTML(t, co, as, hs :+ child)
	}
	def node(tag: String, content: Option[String], attributes: Map[String, String], children: Seq[HTML]): HTML = HTML(tag, content, attributes, children)
}

implicit object TreeWriterHTML$ extends TreeWriterHTML$

If we have a row type as for example:

case class Complex(r: Double, i: Double)

Then, we should define appropriate renderers along the following lines:

implicit val valueRenderer: HierarchicalRenderer[Double] = renderer("td")
implicit val complexRenderer: HierarchicalRenderer[Complex] = renderer2("tr")(Complex)

We can then write something like:

val table = HeadedTable(Seq(Complex(0, 1), Complex(-1, 0)), Header.create("r", "i"))
val h = table.renderHierarchical("table", Map("border" -> "1"))

The result of this will be an HTML tree which can be written out thus as a string:

 <table border="1">
 <tr>
 <td name="r">0.0</td>
 <td name="i">1.0</td></tr>
 <tr>
 <td name="r">-1.0</td>
 <td name="i">0.0</td></tr></table>

As with the parsing methods, the conversion between instances of types (especially case classes) and Strings is hierarchical (recursive).
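The recursion can be made concrete with a hypothetical, self-contained sketch of what renderer and renderer2 accomplish for Complex. HRenderer, leaf, complexRenderer, renderTable and show are illustrative stand-ins, not the library's HierarchicalRenderer API. A leaf renders a single value as element content. The row renderer delegates each field to the leaf and names it after the field. The table wraps the rendered rows.

```scala
object ComplexRenderSketch {
  case class HTML(tag: String, content: Option[String], attributes: Map[String, String], hs: Seq[HTML])
  case class Complex(r: Double, i: Double)

  // Hypothetical stand-in for HierarchicalRenderer[T]: renders a value given its element's attributes.
  trait HRenderer[T] { def render(t: T, attrs: Map[String, String]): HTML }

  // Leaf renderer, in the spirit of renderer("td"): the value's string form becomes the content.
  def leaf[T](tag: String): HRenderer[T] =
    (t: T, attrs: Map[String, String]) => HTML(tag, Some(t.toString), attrs, Nil)

  implicit val valueRenderer: HRenderer[Double] = leaf[Double]("td")

  // Row renderer, in the spirit of renderer2("tr"): each field is rendered recursively
  // by the field renderer, with a "name" attribute taken from the field name.
  implicit val complexRenderer: HRenderer[Complex] = (c: Complex, attrs: Map[String, String]) =>
    HTML("tr", None, attrs, Seq(
      valueRenderer.render(c.r, Map("name" -> "r")),
      valueRenderer.render(c.i, Map("name" -> "i"))))

  // The table element simply wraps the rendered rows.
  def renderTable[T](rows: Seq[T], tag: String, attrs: Map[String, String])(implicit r: HRenderer[T]): HTML =
    HTML(tag, None, attrs, rows.map(r.render(_, Map.empty)))

  // Serialize the tree compactly: attributes, then content, then children.
  def show(h: HTML): String = {
    val as = h.attributes.map { case (k, v) => " " + k + "=\"" + v + "\"" }.mkString
    "<" + h.tag + as + ">" + h.content.getOrElse("") + h.hs.map(show).mkString + "</" + h.tag + ">"
  }

  def main(args: Array[String]): Unit =
    println(show(renderTable(Seq(Complex(0, 1), Complex(-1, 0)), "table", Map("border" -> "1"))))
}
```

Apart from whitespace, the printed markup has the same shape as the HTML output shown above.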

If you need to set HTML attributes for a specific type (for example, the row type in the example above), you can supply an attribute map to the renderer2 method.

CSV Rendering

If you simply need to write a table to CSV (comma-separated value) format as a String, then use the toCsv method of Table[T]. Note that there is also an object method of Table called toCsvRow which can be used for instances of Table[Row]. More control can be gained by using CsvTableStringRenderer[T] or CsvTableFileRenderer[T] for a particular type T.

These require customizable (implicit) evidence parameters and are defined as follows:

case class CsvTableStringRenderer[T: CsvRenderer : CsvGenerator]()(implicit csvAttributes: CsvAttributes)
    extends CsvTableRenderer[T, StringBuilder]()(implicitly[CsvRenderer[T]], implicitly[CsvGenerator[T]], Writable.stringBuilderWritable(csvAttributes.delimiter, csvAttributes.quote), csvAttributes)
case class CsvTableFileRenderer[T: CsvRenderer : CsvGenerator](file: File)(implicit csvAttributes: CsvAttributes)
    extends CsvTableRenderer[T, FileWriter]()(implicitly[CsvRenderer[T]], implicitly[CsvGenerator[T]], Writable.fileWritable(file), csvAttributes)
abstract class CsvTableRenderer[T: CsvRenderer : CsvGenerator, O: Writable]()(implicit csvAttributes: CsvAttributes) extends Renderer[Table[T], O] {...}

CsvRenderer[T] determines the layout of the rows, while CsvGenerator[T] determines the header. CsvAttributes specifies the delimiter and quote characters for the output. Instances of CsvRenderer and CsvGenerator can be created using methods in CsvRenderers and CsvGenerators, respectively. The appropriate methods are:

sequenceRenderer, optionRenderer, renderer1, renderer2, renderer3, etc. up to renderer12.
sequenceGenerator, optionGenerator, generator1, generator2, generator3, etc. up to generator12.
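Two tiny type classes are enough to sketch the division of labour between CsvRenderer (row layout) and CsvGenerator (header). Attrs, Render, Gen, render2, gen2 and toCsv are hypothetical stand-ins. They mirror the roles described above but not the exact signatures of CsvRenderers and CsvGenerators. The dotted prefix for nested column names is likewise an assumption.

```scala
object CsvTypeClassSketch {
  // Stand-in for CsvAttributes: the delimiter and quote used on output.
  case class Attrs(delimiter: String = ",", quote: String = "\"")

  // Stand-in for CsvRenderer[T]: lays out the values of one row.
  trait Render[T] { def render(t: T, a: Attrs): String }

  // Stand-in for CsvGenerator[T]: column names, prefixed by the enclosing field name (if any).
  trait Gen[T] { def columns(prefix: String): Seq[String] }

  implicit val attrs: Attrs = Attrs()

  // Leaf instances for standard types. Strings containing the delimiter are quoted
  // (embedded quotes are not escaped here, for brevity).
  implicit val intRender: Render[Int] = (t: Int, _: Attrs) => t.toString
  implicit val stringRender: Render[String] = (t: String, a: Attrs) =>
    if (t.contains(a.delimiter)) a.quote + t + a.quote else t
  implicit val intGen: Gen[Int] = (prefix: String) => Seq(prefix)
  implicit val stringGen: Gen[String] = (prefix: String) => Seq(prefix)

  private def join(prefix: String, name: String): String =
    if (prefix.isEmpty) name else prefix + "." + name

  // Analogue of renderer2: renders the two fields in order, separated by the delimiter.
  def render2[P, A, B](f: P => (A, B))(implicit ra: Render[A], rb: Render[B]): Render[P] =
    (p: P, a: Attrs) => {
      val (x, y) = f(p)
      ra.render(x, a) + a.delimiter + rb.render(y, a)
    }

  // Analogue of generator2: header names for the two fields.
  def gen2[P, A, B](n1: String, n2: String)(implicit ga: Gen[A], gb: Gen[B]): Gen[P] =
    (prefix: String) => ga.columns(join(prefix, n1)) ++ gb.columns(join(prefix, n2))

  case class Person(name: String, age: Int)
  implicit val personRender: Render[Person] = render2((p: Person) => (p.name, p.age))
  implicit val personGen: Gen[Person] = gen2[Person, String, Int]("name", "age")

  // Header line from Gen, then one line per row from Render.
  def toCsv[T](ts: Seq[T])(implicit r: Render[T], g: Gen[T], a: Attrs): String =
    (g.columns("").mkString(a.delimiter) +: ts.map(r.render(_, a))).mkString("\n")

  def main(args: Array[String]): Unit =
    println(toCsv(Seq(Person("Ann", 30), Person("Smith, J", 41))))
}
```

Keeping header generation separate from row layout lets a nested case class contribute prefixed column names (here "p.name", "p.age" for a field called p) without changing how its values are laid out.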

In some situations, you will want to omit values (and the corresponding header columns) when outputting a CSV file. You may use the following methods (from CsvRenderers and CsvGenerators, respectively):

def skipRenderer[T](alignment: Int = 1)(implicit ca: CsvAttributes): CsvRenderer[T] 
def skipGenerator[T](implicit ca: CsvAttributes): CsvGenerator[T]

Note that, when rendering a CSV row, you may want simply to render some number of delimiters in place of a value (for example, when you have a fixed header). You can use the alignment parameter of skipRenderer to keep the columns correctly aligned.
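The alignment idea can be sketched with a simple string-based model. A skipped field contributes either nothing (when the header omits that column as well) or empty cells (when a fixed header still names the column), so that later values stay under the right headings. SkipSketch and renderRow are hypothetical. This sketch does not reproduce the exact number of delimiters that the library's skipRenderer emits for a given alignment value.

```scala
object SkipSketch {
  // Renders the kept cells in order; each skipped field (None) is replaced by
  // `alignment` empty cells, or dropped entirely when alignment is 0.
  def renderRow(cells: Seq[Option[String]], alignment: Int, delimiter: String = ","): String =
    cells.flatMap {
      case Some(c) => Seq(c)
      case None    => Seq.fill(alignment)("")
    }.mkString(delimiter)

  def main(args: Array[String]): Unit = {
    val row = Seq(Some("Adam"), None, Some("1500"))
    println(renderRow(row, 0)) // Adam,1500   (column omitted from both header and row)
    println(renderRow(row, 1)) // Adam,,1500  (fixed header still lists the skipped column)
  }
}
```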

As usual, the standard types are pre-defined for both CsvRenderer[T] and CsvGenerator[T] (for Int, Double, etc.).

The methods mentioned above render tables as CSV Strings. However, there are also methods that write tables to a File: writeCSVFile and writeCSVFileRow. These use the type CsvTableFileRenderer[T] mentioned above.

If you wish to output only a subset of rows, then you should use one of the methods defined in Table such as take.

Other String Rendering

Apart from CSV, there is currently only one implementation of String rendering: JSON rendering. Although JSON is a hierarchical serialization format, the way a JSON string is created hides the hierarchical aspects. The JSON reader/writer currently used is Spray JSON, but that could easily be changed in the future.

Although this section is concerned with rendering, tables can, of course, also be read from JSON strings.

The following example from JsonRendererSpec.scala takes the following steps (for the definitions of Player and Partnership, please see the spec file itself):

read a table of players from a list of Strings (there are, as shown above, other signatures of parse for files, URLs, etc.);
convert to a table of partnerships;
write the resulting table to a Json string;
check the accuracy of the Json string;
check that we can read the string back in as a table.

val strings = List("First, Last", "Adam,Sullivan", "Amy,Avagadro", "Ann,Peterson", "Barbara,Goldman")
// Parse the players, convert to partnerships and render the result as a JSON string.
val wy: Try[String] = for (pt <- Table.parse[Table[Player]](strings)) yield Player.convertTable(pt).asInstanceOf[Renderable[Partnership]].render
wy should matchPattern { case Success("{\n \"rows\": [{\n \"playerA\": \"Adam S\",\n \"playerB\": \"Amy A\"\n }, {\n \"playerA\": \"Ann P\",\n \"playerB\": \"Barbara G\"\n }],\n \"header\": [\"playerA\", \"playerB\"]\n}") => }
// Read the JSON string back in as a table.
implicit val r: JsonFormat[Table[Partnership]] = new TableJsonFormat[Partnership] {}
wy.map(p => p.parseJson.convertTo[Table[Partnership]]) should matchPattern { case Success(HeadedTable(_, _)) => }
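The JSON produced in the example above has two members, rows and header. The following hand-rolled sketch reproduces that shape using only the standard library. It is not how TableParser does it (the library delegates to Spray JSON via TableJsonFormat). It emits compact rather than pretty-printed JSON. The Partnership case class here is a simplified, hypothetical version of the one in the spec. It assumes Scala 2.13 or later for productElementNames.

```scala
object JsonShapeSketch {
  case class Partnership(playerA: String, playerB: String)

  // Minimal JSON string literal: escapes only backslash and double quote (enough for this example).
  private def str(s: String): String = "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Each row becomes a JSON object keyed by field name (every value is written as a JSON
  // string, for simplicity); "header" lists the column names.
  def toJson[P <: Product](rows: Seq[P], header: Seq[String]): String = {
    val objects = rows.map { r =>
      r.productElementNames.zip(r.productIterator)
        .map { case (k, v) => str(k) + ":" + str(v.toString) }
        .mkString("{", ",", "}")
    }
    "{" + str("rows") + ":" + objects.mkString("[", ",", "]") + "," +
      str("header") + ":" + header.map(str).mkString("[", ",", "]") + "}"
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(Partnership("Adam S", "Amy A"), Partnership("Ann P", "Barbara G"))
    println(toJson(rows, Seq("playerA", "playerB")))
  }
}
```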

Release Notes

V1.1.4 -> V1.2.0

Significant changes including the completion of the split into three packages with...
Functioning spark package.
Now supports Iterator to Iterator processing.

V1.1.3 -> V1.1.4

Split into three modules: core, cats and spark.
Minor changes functionally speaking.

V1.1.2 -> V1.1.3

Use of Cats IO [CircleCI failure due to missing library]

V1.1.1 -> V1.1.2

Make RawRow a type (not just a type alias)

V1.1.0 -> V1.1.1

Enable cryptographic capabilities
Uses TSEC-JCA and Cats IO.
Many relatively minor fixes/improvements.

V1.0.15 -> V1.1.0

Enable CSV-rendering and selection of table rows.

V1.0.14 -> V1.0.15

Minor changes

V1.0.13 -> V1.0.14

Enabled multi-line quoted strings: if a quoted string spans more than one line, this is acceptable.
Implemented analysis of raw-row tables.
Implemented sampling of input.
Provided a new mechanism for configuring and using parsers (see the worksheets).
Implemented Table.parseResourceRaw and Table.parseFileRaw for those situations where you just want to parse an input file into a Table[Seq[String]].

V1.0.12 -> V1.0.13

mostly concerned with publishing TableParser in Maven Central

V1.0.11 -> V1.0.12

mostly internal refactoring: restored Renderable interface (though different from before)

V1.0.10 -> V1.0.11

introduction of logging;
introduction of JSON (spray) for read/write of Table;
Table now supports Iterable=>Iterable methods.
renaming of Renderer to HierarchicalRenderer and introduction of StringRenderer
introduction of TableParserHelper;
renamed TableWithoutHeader as UnheadedTable and TableWithHeader as HeadedTable;
added various methods, inc. replaceHeader, to Table.
Table parsing is now based on Iterator rather than Iterable.
Table rows are now based on Vector (at least for the standard TableWithHeader)

V1.0.9 -> V1.0.10

build.sbt: changed scalaVersion to 2.13.3
added StringTableParserWithHeader;
now column names are found by case-independent comparison.

V1.0.8 -> V1.0.9

build.sbt: changed scalaVersion to 2.12.10

V1.0.7 -> V1.0.8

build.sbt: changed scalaVersion to 2.12.9
refactored the concept of tables with/without headers in TableParser;
enabled program-defined headers that match Excel-style numbers or letters.

V1.0.6 -> V1.0.7

build.sbt: changed scalaVersion to 2.12.8
CellParser: parametric type T is now covariant;
CellParsers: added new method cellParserOptionNonEmptyString; also, each of the cellParserN methods now has a defaultable fields parameter to allow explicit field naming;
Reflection: changed the message to refer to the cellParserN signatures;
README.md: fixed some issues with the doc regarding the MovieTableParser; added new features above.

V1.0.5 -> V1.0.6

Added a standard implicit value of ColumnHelper for situations that don't need extra help.

V1.0.4 -> V1.0.5

Added a convenient way of rendering a table as a non-hierarchical structure. In other words, serialization to a CSV file.

V1.0.3 -> V1.0.4

Added the ability to add header row and header column for tables (NOTE: not finalized yet, but functional).

V1.0.2 -> V1.0.3

Added no implicit warnings
Created mechanism for rendering the result of parsing in a hierarchical structure.

V1.0.1 -> V1.0.2

Added self-checking of LineParser;
Able to parse two quote-chars together in a quotation as one quote char;
Added enc and codec params as appropriate to Table.parse methods.
Added stringCellParser;
Now, properly closes source in Table.parse methods.

V1.0.0 -> V1.0.1

Fixed Issue #1;
Added parsing of Seq[Seq[String]];
Added cellParserRepetition;
Implemented closing of Source in Table.parse methods;
Added encoding parameters to Table.parse methods.

rchillyard / tableparser 1.2.0