sanskrit-coders / sanskrit-lttoolbox   0.10

Language translation toolbox (lt toolbox) based utilities for Sanskrit.

Scala versions: 2.12

Intro

LTToolbox context

  • One can write finite state transducers for natural language processing (translation, declension, analysis) using lt-toolbox technologies.
  • One creates dix files defining these transducers, compiles them to bin files, and then uses the bin files for analysis or generation.
  • Lt-toolbox libraries and tools are available in C++ and Java.
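
The dix-to-bin workflow above can be sketched from Scala by shelling out to the standard lttoolbox command line tools, lt-comp and lt-proc. This is only a minimal sketch, not this library's API: the object name LtToolboxCli and its methods are hypothetical, and whether a given bin file (for example the scl ones) accepts these exact modes is an assumption.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import scala.sys.process._

// Hypothetical helper, for illustration only.
object LtToolboxCli {
  // "lr" compiles the dictionary left to right (an analyser);
  // "rl" would compile it right to left (a generator).
  def compileCommand(dixPath: String, binPath: String, direction: String = "lr"): Seq[String] =
    Seq("lt-comp", direction, dixPath, binPath)

  // "-a" asks lt-proc for morphological analysis, "-g" for generation.
  def procCommand(binPath: String, generate: Boolean = false): Seq[String] =
    Seq("lt-proc", if (generate) "-g" else "-a", binPath)

  // Feeds `input` to lt-proc on stdin and returns its stdout.
  // Needs the lttoolbox package to be installed on the machine.
  def run(binPath: String, input: String, generate: Boolean = false): String = {
    val stdin = new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8))
    (Process(procCommand(binPath, generate)) #< stdin).!!
  }
}
```

Building the argument list separately from running it keeps the command testable on machines where lttoolbox is not installed.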

Goals

  • Write Scala/Java wrappers for invoking FSTs in lt-toolbox bin files.

Usage

Downstream projects

  • html-2-rest provides a REST API front end for some of these tools. That repo also has the scl bin files.

Contribution

Deployment

  • Install the lttoolbox package beforehand on the computer you will release from; otherwise the tests will fail.
  • Use the sbt command release to publish to Maven repos.
  • The release should be usable almost immediately; after many hours it should also appear in the Maven repo listings here.

SCL tools

Intro

  • Smt Ambaa KulkarNi has led the creation of several such FST based tools. These are very useful for word generation and analysis.
  • They're hosted on http://scl.samsaadhanii.in and mirrors.

Resources available as of July 2017:

  • Only the lttoolbox-compatible bin files are provided, without any documentation (by request, not on any public site).
    • They usually come as part of the scl website code (under the GNU General Public License). That code is antiquated and mostly useless for further development, though it still serves as a useful reference for how the core lttoolbox bin files are to be invoked. It:
      • relies on CGI technology.
      • is written in Perl.
      • has a poor build system.
    • These lttoolbox compatible bin files don't work with the Java libraries as of July 2017 - see thread.
  • Smt Ambaa does not provide the dix files (including to this author).
    • So an outsider cannot use them to grok how the bin files are to be invoked, to understand the underlying lttoolbox technology by example, or to develop the FSTs further.

Invocation tips

The tips below are sourced mostly from communication with Smt Ambaa and from experiments.

General tips

  • "To know the input for tin / krt / taddhita generator, you give a tinanta / krdanta / taddhitaanta to the all_morf.bin, and it will produce the analysis. That analysis is the input for the generator."
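
The round trip described in the tip above (analyse a form, then reuse the analysis as generator input) can be sketched as follows. This assumes that all_morf.bin emits the usual lttoolbox stream format, ^form/analysis1/analysis2$, and that the generator accepts ^analysis$; neither has been confirmed for the scl bin files. The object name AnalysisOutput and the placeholder tags in the usage line are hypothetical.

```scala
// Hypothetical helper, for illustration only.
object AnalysisOutput {
  // One lexical unit in the lttoolbox stream format: ^form/analysis1/analysis2$
  // (escaped special characters inside a unit are not handled in this sketch).
  private val LexicalUnit = """\^([^/$]*)/([^$]*)\$""".r

  // Maps each analysed surface form to the list of analyses reported for it.
  def analyses(output: String): Map[String, List[String]] =
    LexicalUnit.findAllMatchIn(output).map { m =>
      m.group(1) -> m.group(2).split('/').toList
    }.toMap

  // Wraps a single analysis so that it can be fed to a generator.
  def toGeneratorInput(analysis: String): String = "^" + analysis + "$"
}
```

For example, AnalysisOutput.analyses("^form/lemma<t1><t2>/lemma2<t3>$") gives one entry for "form" with two analyses, each of which can be passed through toGeneratorInput.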

subanta generator

  • Regarding the level parameter:
    • level 1 is used for the avyutpanna praatipadikas and dhaatus (corresponds to inflectional morphology, with sup and tin)
    • level 0 is for the vyutpanna praatipadikas.
    • level 2: vyutpanna krdanta subanta (a noun form derived by adding a krt suffix)
    • level 3: vyutpanna taddhita subantas
    • level 4: vyutpanna uttarapadas of a compound
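
For wrapper code, the level values listed above could be kept in a small lookup table. This is only a sketch: the object name SubantaLevel is hypothetical, and how the level is actually passed to the generator is not covered here.

```scala
// Hypothetical lookup table, for illustration only.
object SubantaLevel {
  val descriptions: Map[Int, String] = Map(
    0 -> "vyutpanna praatipadika",
    1 -> "avyutpanna praatipadika or dhaatu (inflectional morphology, with sup and tin)",
    2 -> "vyutpanna krdanta subanta (noun form derived by adding a krt suffix)",
    3 -> "vyutpanna taddhita subanta",
    4 -> "vyutpanna uttarapada of a compound"
  )

  // Returns None for a level that is not listed above.
  def describe(level: Int): Option[String] = descriptions.get(level)
}
```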

tinanta generator

Regarding the level parameter:

  • In the case of verb forms, I have only level 1. The Nijantas, though derived, are assigned the same level.