Fuzzy matching library for scientific names with emphasis on performance and scalability

GlobalNamesArchitecture/gnmatcher

Global Names Matcher

https://circleci.com/gh/GlobalNamesArchitecture/gnmatcher.svg?style=svg

Global Names Matcher, or gnmatcher, is a Scala 2.10.3+ library for very fast fuzzy matching of a query string against a given set of strings.

Installation

The artifacts for gnmatcher live on Maven Central.

To add the dependency with SBT, put the following line in your build definition:

libraryDependencies += "org.globalnames" %% "gnmatcher" % "0.1.0"

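The corresponding Maven dependencies are listed below, one per Scala binary version (2.11 and 2.10). Include only the one that matches your project: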

<dependency>
    <groupId>org.globalnames</groupId>
    <artifactId>gnmatcher_2.11</artifactId>
    <version>0.1.0</version>
</dependency>

<dependency>
    <groupId>org.globalnames</groupId>
    <artifactId>gnmatcher_2.10</artifactId>
    <version>0.1.0</version>
</dependency>
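
The two Maven artifacts above differ only in the Scala binary-version suffix. In SBT, the `%%` operator appends that suffix automatically based on the project's `scalaVersion`. A minimal `build.sbt` sketch follows; the `scalaVersion` value is an assumption chosen for illustration, not something the project prescribes:

```scala
// build.sbt (sketch; the scalaVersion below is an illustrative assumption)
scalaVersion := "2.11.8"

// %% appends the Scala binary version, so this resolves gnmatcher_2.11
libraryDependencies += "org.globalnames" %% "gnmatcher" % "0.1.0"

// Equivalent explicit form using a single %:
// libraryDependencies += "org.globalnames" % "gnmatcher_2.11" % "0.1.0"
```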

Matching

gnmatcher implements heuristic algorithms to match the semantic parts of scientific biological names as follows:

  • authors matching answers the question: how similar is the authors string "Linnaeus, Muller 1767" to "Muller and Linnaeus"?

Authors Matching

The entire algorithm is ported from the Ruby implementation developed by Patrick Leary of uBio and EOL fame. To answer the question above, run the following:

$ sbt matcher/console
scala> import org.globalnames._
scala> AuthorsMatcher.score(Seq(Author("Linnaeus"), Author("Muller")), Some(1767),
     |                      Seq(Author("Muller"), Author("Linnaeus")), None)
res0: Double = 0.5
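
To give a feel for why author order should not affect such a score, here is a deliberately simplified sketch. It is not the gnmatcher algorithm: it ignores years and any other heuristics the library applies, and simply compares author lists as sets using Jaccard similarity. The object name `AuthorOverlapSketch` and the helpers `normalize` and `jaccard` are hypothetical and do not exist in the library.

```scala
// Illustrative only: set-based overlap of author names, NOT gnmatcher's scoring.
object AuthorOverlapSketch {
  // Hypothetical helper: trim and lower-case so "Muller" and " muller" compare equal.
  def normalize(name: String): String = name.trim.toLowerCase

  // Jaccard similarity: |A ∩ B| / |A ∪ B|, with author lists treated as sets,
  // so the order in which authors are listed has no effect.
  def jaccard(a: Seq[String], b: Seq[String]): Double = {
    val sa = a.map(normalize).toSet
    val sb = b.map(normalize).toSet
    if (sa.isEmpty && sb.isEmpty) 1.0
    else (sa intersect sb).size.toDouble / (sa union sb).size.toDouble
  }

  def main(args: Array[String]): Unit = {
    // Same authors in a different order: prints 1.0
    println(jaccard(Seq("Linnaeus", "Muller"), Seq("Muller", "Linnaeus")))
    // One shared author out of two distinct names: prints 0.5
    println(jaccard(Seq("Linnaeus", "Muller"), Seq("Linnaeus")))
  }
}
```

This sketch scores the reordered pair at 1.0, whereas the session above shows the library returning 0.5 for the same authors when a year is supplied on one side only. The library's score evidently accounts for more than name overlap.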

Contributors

License

Released under the MIT license.
