lkevers/dicServer

dicServer, a multithreaded socket server for querying word lists (inflected forms)

Author: Laurent Kevers (University of Corsica).

dicServer ships with default linguistic resources for nine languages. You can add more languages, and you can replace or modify the default resources. See the 'Resources' section for more information.

Versions

  • dicServer.py : uses the Python package "threading"
  • dicServerV2.py : an alternative version of dicServer that uses a different Python package ("multiprocessing")

Execution

python3 dicServer.py WORKING_DIRECTORY
  • WORKING_DIRECTORY is the directory in which the script starts; linguistic resources are organized into dedicated directories under this working directory.
  • The port is set in the script (default: 1112)
  • Log output is written to dicServer.log
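To make the threaded model more concrete, here is a minimal sketch of how a socket server can handle each connection in its own thread using the standard `socketserver` module. It is not dicServer's actual code: the `WORDS` table, the `QueryHandler` and `ThreadedServer` classes, and the `start_server` helper are hypothetical names introduced for illustration, and only the `is_word` method is handled.

```python
import json
import socketserver
import threading

# Hypothetical in-memory word lists (assumption); the real server
# loads its resources from WORKING_DIRECTORY.
WORDS = {"eng": {"car", "cars"}, "fra": {"car", "voiture"}}


class QueryHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Read one newline-terminated query, as sent by the client example.
        query = self.rfile.readline().decode("utf-8").strip()
        method, _, word = query.partition("::")
        if method == "is_word":
            result = any(word in forms for forms in WORDS.values())
        else:
            result = None  # other methods are omitted in this sketch
        self.wfile.write(json.dumps(result).encode("utf-8"))


class ThreadedServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    daemon_threads = True        # worker threads do not block interpreter exit
    allow_reuse_address = True


def start_server(host="127.0.0.1", port=0):
    """Start the sketch server in a background thread (port 0 = any free port)."""
    server = ThreadedServer((host, port), QueryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_server(port=1112)` would listen on the default port mentioned above; `server.shutdown()` stops the loop.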

You can query the server from your own Python code as follows (see the test_threaded_daemon.py script):

import socket
import sys

HOST, PORT = "", 1112
data = " ".join(sys.argv[1:]) # the query is given as a command-line argument

# Create a socket (SOCK_STREAM means a TCP socket)
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

try:
	# Connect to server and send data
	sock.connect((HOST, PORT))
	sock.sendall(("%s\n"%data).encode(encoding='utf-8'))

	# Receive data from the server and shut down
	received = sock.recv(1024)

finally:
	sock.close()
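The same exchange can be wrapped in a reusable function. The sketch below is an assumption-laden variant of the snippet above, not part of dicServer: the `query_server` name and its parameters are hypothetical, and it assumes the reply is UTF-8 encoded JSON. It keeps reading until the accumulated bytes parse as JSON or the server closes the connection, so replies longer than 1024 bytes are not truncated.

```python
import json
import socket


def query_server(query, host="localhost", port=1112, timeout=5.0):
    """Send one query line and return the decoded reply (hypothetical helper)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall((query + "\n").encode("utf-8"))
        buffer = b""
        while True:
            # Raises socket.timeout if the server stays silent too long.
            chunk = sock.recv(1024)
            if not chunk:  # the server closed the connection
                break
            buffer += chunk
            try:
                return json.loads(buffer.decode("utf-8"))
            except ValueError:
                # Incomplete JSON or a split UTF-8 sequence: keep reading.
                continue
    # The reply was not valid JSON: hand back the raw text instead.
    return buffer.decode("utf-8", errors="replace")
```

With a server running on the default port, `query_server("word_languages::car")` would return the decoded result.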

To shut down the server, find its process ID and kill it:

ps aux | grep dicServer
kill <processID>

Result format

The result is returned as JSON.

Querying methods available

is_word::WORD
- For the word given as parameter
- Returns 'True' if it is present in at least one dictionary
- Else returns 'False'

is_lg_word::WORD::LG
- For a word and a language
- Returns 'True' if this word exists in the language dictionary
- Else returns 'False'

word_languages::WORD
- For a word
- Returns a list with all languages for which the word exists in the language dictionary.
- If the word doesn't exist in any dictionary, returns an empty list

word_possibleLanguages::WORD::LGlist
- For a word and a restricted set of candidate languages (LGlist is comma-separated, e.g. eng,fra)
- Returns a list of the languages, among those given in LGlist, whose dictionary contains the word.
- If the word doesn't exist in any of these dictionaries, returns an empty list
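The semantics of the four methods can be illustrated with a small in-memory model. This is a sketch under assumptions, not the server's implementation: the `answer` function and the `dictionaries` mapping (language code to set of forms) are hypothetical, and sorting the returned language codes is an assumption based on the example output in this README.

```python
def answer(query, dictionaries):
    """Evaluate one query string against {language_code: set_of_forms}."""
    method, *args = query.split("::")
    if method == "is_word":
        (word,) = args
        return any(word in forms for forms in dictionaries.values())
    if method == "is_lg_word":
        word, lg = args
        return word in dictionaries.get(lg, set())
    if method == "word_languages":
        (word,) = args
        return sorted(lg for lg, forms in dictionaries.items() if word in forms)
    if method == "word_possibleLanguages":
        word, lg_list = args
        candidates = lg_list.split(",")  # e.g. "eng,fra"
        return sorted(lg for lg in candidates
                      if word in dictionaries.get(lg, set()))
    raise ValueError(f"unknown method: {method}")


# Toy data, invented for the example.
toy = {"eng": {"car"}, "fra": {"car", "voiture"}, "nld": {"auto"}}
print(answer("word_possibleLanguages::car::eng,nld", toy))
```

In this toy model, only the languages listed in LGlist are consulted for `word_possibleLanguages`, which is what distinguishes it from `word_languages`.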

Testing

Testing can be performed with the test_threaded_daemon.py script.

Example:

python3 test_threaded_daemon.py word_languages::car

-> Result: ["cos", "eng", "fra", "nld", "spa"]

Resources

The linguistic resources are defined in the script (see 'languages'). They must be stored somewhere under the WORKING_DIRECTORY passed as a parameter to dicServer.

The data format is either a simple word list (one linguistic form per line) or the Unitex/DELA format. In the latter case, the directory path to the language data file must contain the word 'unitex'.
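As an illustration of the two formats, the sketch below reads inflected forms from either kind of file. It is not dicServer's loader: `inflected_form` and `load_forms` are hypothetical helpers, UTF-8 encoding is an assumption (Unitex files are often distributed in other encodings), and the DELA handling only relies on the usual DELAF layout `inflected form,lemma.codes`, with backslash-escaped characters inside the form.

```python
import os
import re


def inflected_form(line, unitex=False):
    """Return the inflected form found on one line, or None if the line is blank."""
    entry = line.strip()
    if not entry:
        return None
    if unitex:
        # Keep the text before the first comma that is not backslash-escaped.
        form = re.split(r"(?<!\\),", entry, maxsplit=1)[0]
        # Remove the escaping backslashes (e.g. "\," becomes ",").
        return re.sub(r"\\(.)", r"\1", form)
    return entry


def load_forms(path, encoding="utf-8"):
    """Load a set of forms; the format is chosen from the directory path."""
    unitex = "unitex" in os.path.dirname(path)
    forms = set()
    with open(path, encoding=encoding) as handle:
        for line in handle:
            form = inflected_form(line, unitex=unitex)
            if form:
                forms.add(form)
    return forms
```

Storing the forms in a set keeps membership tests such as `word in forms` fast, which suits the lookup-style queries described earlier.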

The resources offered by default come from different sources.

dicServer Code License

The Python code is released under the CeCILL_V2.1 license.
