annoy4s / annoy4s   0.10.0

Apache License 2.0 GitHub

Scala wrapper for Annoy

Scala versions: 2.13 2.12 2.11 2.10

annoy4s

A JNA wrapper around spotify/annoy which calls the C++ library of annoy directly from Scala/JVM.

Installation

For linux-x86-64 or Mac users, add the library as a dependency directly:

libraryDependencies += "net.pishen" %% "annoy4s" % "0.10.0"

If you encounter an error like the one below when using annoy4s, you may have to compile the native library yourself.

java.lang.UnsatisfiedLinkError: Unable to load library 'annoy': Native library

To compile the native library and install annoy4s on your local machine:

  1. Clone this repository.
  2. Check the values of organization and version in build.sbt. You may change them to whatever values you want; it is recommended that version carry the -SNAPSHOT suffix.
  3. Run compileNative in sbt (g++ must be installed).
  4. Run test in sbt to see if the native library is successfully compiled.
  5. Run publishLocal in sbt to install annoy4s on your machine.

Now you can add the library dependency (organization and version may differ depending on your settings):

libraryDependencies += "net.pishen" %% "annoy4s" % "0.10.0-SNAPSHOT"

The library file generated by the g++ command in compileNative can also be installed independently on your machine. Please refer to JNA's library search paths for more details on how to make JNA load the library.
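
One way to point JNA at an independently installed library file is the jna.library.path system property, which JNA consults when it loads a native library. The helper below is a minimal sketch, not part of annoy4s: the object name, the method name, and the default directory are all hypothetical, and it assumes the property is set before annoy4s loads the library for the first time.

```scala
import java.io.File

object NativeLibraryPath {
  // Hypothetical directory holding the compiled native library file.
  val defaultDir = "/usr/local/lib/annoy4s"

  /** Prepend `dir` to the `jna.library.path` system property and return the new value. */
  def register(dir: String = defaultDir): String = {
    val current = Option(System.getProperty("jna.library.path")).filter(_.nonEmpty)
    val updated = current match {
      case Some(existing) => dir + File.pathSeparator + existing // keep earlier entries
      case None           => dir
    }
    System.setProperty("jna.library.path", updated)
    updated
  }
}
```

Call NativeLibraryPath.register(...) at the start of your program, or pass the same value on the command line with -Djna.library.path=<dir>.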

Usage

Create and query the index in memory mode:

import annoy4s._

val annoy = Annoy.create[Int]("./input_vectors", numOfTrees = 10, metric = Euclidean, verbose = true)

val result: Option[Seq[(Int, Float)]] = annoy.query(itemId, maxReturnSize = 30)
  • Each line of ./input_vectors has the format <item id> <vector>. Here is an example:
3 0.2 -1.5 0.3
5 0.4 0.01 -0.5
0 1.1 0.9 -0.1
2 1.2 0.8 0.2
  • <item id> can be Int, Long, String, or UUID; just change the type parameter in Annoy.create[T]. You can also implement a KeyConverter[T] yourself to support your own type.
  • metric can be Euclidean, Angular, Manhattan, or Hamming.
  • result is a list of (id, distance) tuples, which includes the query item itself.
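
If your vectors live in memory rather than in a file, the input format described above can be produced with a few lines of plain Scala. This is only a sketch under assumptions: InputVectors, formatLine, and write are hypothetical names that are not part of annoy4s, ids are taken to be Int, and fields are separated by a single space as in the example.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object InputVectors {
  /** Format one item as "<item id> <v1> <v2> ...", separated by single spaces. */
  def formatLine(id: Int, vector: Seq[Float]): String =
    (id.toString +: vector.map(_.toString)).mkString(" ")

  /** Write all items to `path`, one item per line, and return the path. */
  def write(path: Path, items: Seq[(Int, Seq[Float])]): Path = {
    val content = items.map { case (id, v) => formatLine(id, v) }.mkString("", "\n", "\n")
    Files.write(path, content.getBytes(StandardCharsets.UTF_8))
  }
}
```

For instance, InputVectors.write(java.nio.file.Paths.get("./input_vectors"), Seq(3 -> Seq(0.2f, -1.5f, 0.3f))) writes the first line of the example file.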

To use the index in disk mode, you need to provide an outputDir:

val annoy = Annoy.create[Int]("./input_vectors", numOfTrees = 10, outputDir = "./annoy_result/", metric = Euclidean)

val result: Option[Seq[(Int, Float)]] = annoy.query(itemId, maxReturnSize = 30)

annoy.close()

// load a previously created index
val reloadedAnnoy = Annoy.load[Int]("./annoy_result/")

val reloadedResult: Option[Seq[(Int, Float)]] = reloadedAnnoy.query(itemId, 30)
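
Because a query result is an Option and includes the query item itself, callers usually want a small post-processing step. The sketch below shows one possible way to do that on a value of the same type as result; QueryResults and neighbors are hypothetical names, not part of annoy4s, and treating None as "no neighbors" is an assumption.

```scala
object QueryResults {
  /** Drop the query item and keep at most `k` entries, preserving the order of the result. */
  def neighbors(queryId: Int, result: Option[Seq[(Int, Float)]], k: Int): Seq[(Int, Float)] =
    result
      .getOrElse(Seq.empty)        // assumption: treat None as an empty result
      .filterNot(_._1 == queryId)  // the query item is itself contained in the result
      .take(k)
}
```

Since the query item takes up one slot, asking for maxReturnSize = k + 1 leaves room for k other items.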