TypeToken - Perl scripts for processing word frequency lists

These scripts are designed to operate in pipes.

Piping scheme

[frequency list] => TT_ranks_from_no_ranks.pl => TT_ranks_short.pl => TT_fof.pl => TT_growth_short_hapaxes_expected.pl

[list of tokens] => TT_ranks.pl => TT_ranks_short.pl => TT_fof.pl => TT_growth_short_hapaxes_expected.pl

[list of tokens] => TT_growth.pl

[list of tokens] => TT_growth_short.pl

[list of tokens] => TT_growth_short_hapaxes.pl

[list of tokens] - a file of tokens separated by newlines

[frequency list] - a sorted list of "frequency type" pairs separated by newlines
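To make the two input formats concrete, here is a minimal sketch that turns a list of tokens into a frequency list using only standard Unix tools. The helper name `freq_list` is hypothetical and not part of this repository. The sketch assumes tokens contain no whitespace and that the list is sorted by descending frequency, as the `sort -nr` step in this README suggests.

```shell
# Hypothetical helper (not part of TypeToken): read tokens, one per line,
# on stdin and print "frequency type" lines, most frequent type first.
freq_list() {
  sort |                      # group identical tokens together
    uniq -c |                 # count each distinct token (type)
    awk '{print $1, $2}' |    # strip the padding that uniq -c adds
    sort -k1,1nr -k2,2        # descending frequency, ties broken by type
}

# Example: seven tokens, four types.
printf '%s\n' the cat saw the dog the cat | freq_list
# prints:
# 3 the
# 2 cat
# 1 dog
# 1 saw
```

The output can be piped straight into the scripts that expect a [frequency list].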

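A frequency list can be normalized and re-sorted as follows. Write the result to a different file than the input, because the shell truncates the output file when the pipeline starts:

cat [frequency list] | T_normalize.pl | sort -k 2 | T_uniq.pl | sort -nr > [normalized frequency list]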

Short script descriptions

TT_ranks_from_no_ranks.pl - adds ranks to a sorted frequency list

TT_ranks.pl - makes the rank list of types

TT_ranks_short.pl - makes the abridged rank list of types

TT_fof.pl - makes the list of frequencies of type frequencies

TT_growth.pl - computes the vocabulary curve

TT_growth_short.pl - computes a sparse vocabulary curve

TT_growth_short_hapaxes.pl - computes a sparse vocabulary and hapax curve

TT_growth_short_hapaxes_expected.pl - computes a sparse expected vocabulary and hapax curve
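The quantities named above can be illustrated with a short sketch in standard shell and awk. It is not the code of the TypeToken scripts: the helper names `fof` and `growth` and their output column layouts are assumptions made for illustration, and the real scripts may format their output differently. The "short" variants and the expected-value curve are not reproduced here.

```shell
# Hypothetical helper: frequencies of type frequencies.
# Reads "frequency type" lines on stdin and prints
# "frequency number_of_types_with_that_frequency", ascending by frequency.
fof() {
  awk '{ count[$1]++ } END { for (f in count) print f, count[f] }' | sort -n
}

# Hypothetical helper: vocabulary and hapax growth curve.
# Reads tokens, one per line, on stdin and prints after every token
# "tokens_seen types_seen hapaxes_seen".
growth() {
  awk '{
    seen[$1]++
    if (seen[$1] == 1) { types++; hapaxes++ }   # first occurrence: new type, new hapax
    else if (seen[$1] == 2) { hapaxes-- }       # second occurrence: no longer a hapax
    print NR, types, hapaxes
  }'
}

# Example: the frequency spectrum of a four-type list.
printf '%s\n' '3 the' '2 cat' '1 dog' '1 saw' | fof
# prints:
# 1 2
# 2 1
# 3 1

# Example: the growth curve of the corresponding seven-token text.
printf '%s\n' the cat saw the dog the cat | growth
# last line printed: 7 4 2
```

A sparse curve could be obtained from `growth` by keeping only selected rows, but which sample points the "short" scripts use is not stated in this README.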

lukasz-debowski/TypeToken
