Deploying to gh-pages from @ 8999768 🚀
Aethor committed Nov 3, 2023
1 parent 8f20415 commit 91b325d
Showing 8 changed files with 209 additions and 5 deletions.
49 changes: 48 additions & 1 deletion _sources/pipeline.rst.txt
@@ -217,7 +217,7 @@ time. In Renard, such graphs are represented by a ``List`` of
NLTKNamedEntityRecognizer(),
GraphRulesCharacterUnifier(min_appearances=10),
CoOccurrencesGraphExtractor(
co_occurences_dist=25,
co_occurrences_dist=25,
dynamic=True, # note the 'dynamic'
dynamic_window=20 # and the 'dynamic_window' argument
)
@@ -240,3 +240,50 @@ dynamic graph using a slider, and
graph to a directory. Meanwhile,
:meth:`.PipelineState.export_graph_to_gexf` correctly exports the
dynamic graph to the Gephi format.


Multilingual Support
====================

Renard supports multiple languages. By default, a :class:`.Pipeline`
is configured for English, but you can create a pipeline for any
language *as long as all of its steps support it*. To configure a
pipeline for another language, pass the ISO 639-3 code of that
language as the ``lang`` argument:

.. code-block:: python

    from renard.pipeline import Pipeline
    from renard.pipeline.tokenization import NLTKTokenizer
    from renard.pipeline.ner import BertNamedEntityRecognizer
    from renard.pipeline.character_unification import GraphRulesCharacterUnifier
    from renard.pipeline.graph_extraction import CoOccurrencesGraphExtractor

    with open("./my_doc_in_french.txt") as f:
        text = f.read()

    pipeline = Pipeline(
        [
            NLTKTokenizer(),
            BertNamedEntityRecognizer(),
            GraphRulesCharacterUnifier(min_appearances=10),
            CoOccurrencesGraphExtractor(co_occurrences_dist=25)
        ],
        lang="fra" # ISO 639-3 language code for French
    )

    out = pipeline(text)

This pipeline is valid because :class:`.NLTKTokenizer`,
:class:`.BertNamedEntityRecognizer` and
:class:`.GraphRulesCharacterUnifier` all support French, and
:class:`.CoOccurrencesGraphExtractor` works for any language. If the
pipeline were invalid, Renard would display an error message explaining
why. Renard can perform this language check because each step
explicitly indicates which languages it supports by overriding the
:meth:`.PipelineStep.supported_langs` method. This method returns the
set of languages supported by a step as ISO 639-3 codes. The special
string ``"any"`` indicates that the step works regardless of
language. If the method is not overridden, the default is English
support only.
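
The language check described above can be pictured with a small standalone sketch. It does not import Renard and is not Renard's actual implementation: ``Step``, ``FrenchAwareStep``, ``LanguageAgnosticStep`` and ``check_lang`` are hypothetical names, used only to illustrate the contract stated in the text (a set of ISO 639-3 codes, the special string ``"any"``, and an English-only default).

```python
from typing import List, Literal, Set, Union


class Step:
    """Hypothetical stand-in for a pipeline step (NOT Renard's PipelineStep)."""

    def supported_langs(self) -> Union[Set[str], Literal["any"]]:
        # Default when a step does not override the method: English only.
        return {"eng"}


class FrenchAwareStep(Step):
    """Hypothetical step that declares support for English and French."""

    def supported_langs(self) -> Union[Set[str], Literal["any"]]:
        return {"eng", "fra"}


class LanguageAgnosticStep(Step):
    """Hypothetical step that works regardless of language."""

    def supported_langs(self) -> Union[Set[str], Literal["any"]]:
        return "any"


def check_lang(steps: List[Step], lang: str) -> List[str]:
    """Return the class names of the steps that do not support ``lang``."""
    unsupported = []
    for step in steps:
        langs = step.supported_langs()
        # "any" short-circuits the membership test.
        if langs != "any" and lang not in langs:
            unsupported.append(type(step).__name__)
    return unsupported


# Every step supports French: nothing is reported.
print(check_lang([FrenchAwareStep(), LanguageAgnosticStep()], "fra"))
# The base step falls back to English only, so it is reported.
print(check_lang([Step(), FrenchAwareStep()], "fra"))
```

Returning the names of the offending steps, rather than a single boolean, is one way a pipeline could build an error message that explains why a configuration is invalid.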
2 changes: 1 addition & 1 deletion _sources/reference.rst.txt
@@ -116,7 +116,7 @@ Characters Extraction
NaiveCharacterUnifier
---------------------

.. autoclass:: renard.pipeline.character_unificatrion.NaiveCharacterUnifier
.. autoclass:: renard.pipeline.character_unification.NaiveCharacterUnifier
:members:

GraphRulesCharacterUnifier
14 changes: 14 additions & 0 deletions genindex.html
@@ -100,6 +100,8 @@ <h2 id="_">_</h2>
<li><a href="reference.html#renard.pipeline.character_unification.GraphRulesCharacterUnifier.__call__">__call__() (renard.pipeline.character_unification.GraphRulesCharacterUnifier method)</a>

<ul>
<li><a href="reference.html#renard.pipeline.character_unification.NaiveCharacterUnifier.__call__">(renard.pipeline.character_unification.NaiveCharacterUnifier method)</a>
</li>
<li><a href="reference.html#renard.pipeline.core.Pipeline.__call__">(renard.pipeline.core.Pipeline method)</a>
</li>
<li><a href="reference.html#renard.pipeline.core.PipelineStep.__call__">(renard.pipeline.core.PipelineStep method)</a>
@@ -157,6 +159,8 @@ <h2 id="_">_</h2>

<ul>
<li><a href="reference.html#renard.pipeline.character_unification.GraphRulesCharacterUnifier.__init__">(renard.pipeline.character_unification.GraphRulesCharacterUnifier method)</a>
</li>
<li><a href="reference.html#renard.pipeline.character_unification.NaiveCharacterUnifier.__init__">(renard.pipeline.character_unification.NaiveCharacterUnifier method)</a>
</li>
<li><a href="reference.html#renard.pipeline.core.Mention.__init__">(renard.pipeline.core.Mention method)</a>
</li>
@@ -388,11 +392,15 @@ <h2 id="M">M</h2>
<h2 id="N">N</h2>
<table style="width: 100%" class="indextable genindextable"><tr>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="reference.html#renard.pipeline.character_unification.NaiveCharacterUnifier">NaiveCharacterUnifier (class in renard.pipeline.character_unification)</a>
</li>
<li><a href="reference.html#renard.pipeline.character_unification.GraphRulesCharacterUnifier.names_are_related_after_title_removal">names_are_related_after_title_removal() (renard.pipeline.character_unification.GraphRulesCharacterUnifier method)</a>
</li>
<li><a href="reference.html#renard.pipeline.character_unification.GraphRulesCharacterUnifier.needs">needs() (renard.pipeline.character_unification.GraphRulesCharacterUnifier method)</a>

<ul>
<li><a href="reference.html#renard.pipeline.character_unification.NaiveCharacterUnifier.needs">(renard.pipeline.character_unification.NaiveCharacterUnifier method)</a>
</li>
<li><a href="reference.html#renard.pipeline.core.PipelineStep.needs">(renard.pipeline.core.PipelineStep method)</a>
</li>
<li><a href="reference.html#renard.pipeline.corefs.BertCoreferenceResolver.needs">(renard.pipeline.corefs.BertCoreferenceResolver method)</a>
@@ -439,6 +447,8 @@ <h2 id="O">O</h2>
<li><a href="reference.html#renard.pipeline.character_unification.GraphRulesCharacterUnifier.optional_needs">optional_needs() (renard.pipeline.character_unification.GraphRulesCharacterUnifier method)</a>

<ul>
<li><a href="reference.html#renard.pipeline.character_unification.NaiveCharacterUnifier.optional_needs">(renard.pipeline.character_unification.NaiveCharacterUnifier method)</a>
</li>
<li><a href="reference.html#renard.pipeline.core.PipelineStep.optional_needs">(renard.pipeline.core.PipelineStep method)</a>
</li>
<li><a href="reference.html#renard.pipeline.corefs.SpacyCorefereeCoreferenceResolver.optional_needs">(renard.pipeline.corefs.SpacyCorefereeCoreferenceResolver method)</a>
@@ -469,6 +479,8 @@ <h2 id="P">P</h2>
<li><a href="reference.html#renard.pipeline.character_unification.GraphRulesCharacterUnifier.production">production() (renard.pipeline.character_unification.GraphRulesCharacterUnifier method)</a>

<ul>
<li><a href="reference.html#renard.pipeline.character_unification.NaiveCharacterUnifier.production">(renard.pipeline.character_unification.NaiveCharacterUnifier method)</a>
</li>
<li><a href="reference.html#renard.pipeline.core.PipelineStep.production">(renard.pipeline.core.PipelineStep method)</a>
</li>
<li><a href="reference.html#renard.pipeline.corefs.BertCoreferenceResolver.production">(renard.pipeline.corefs.BertCoreferenceResolver method)</a>
@@ -567,6 +579,8 @@ <h2 id="S">S</h2>
<li><a href="reference.html#renard.pipeline.character_unification.GraphRulesCharacterUnifier.supported_langs">supported_langs() (renard.pipeline.character_unification.GraphRulesCharacterUnifier method)</a>

<ul>
<li><a href="reference.html#renard.pipeline.character_unification.NaiveCharacterUnifier.supported_langs">(renard.pipeline.character_unification.NaiveCharacterUnifier method)</a>
</li>
<li><a href="reference.html#renard.pipeline.core.PipelineStep.supported_langs">(renard.pipeline.core.PipelineStep method)</a>
</li>
<li><a href="reference.html#renard.pipeline.corefs.BertCoreferenceResolver.supported_langs">(renard.pipeline.corefs.BertCoreferenceResolver method)</a>
1 change: 1 addition & 0 deletions index.html
@@ -88,6 +88,7 @@ <h1>Welcome to Renard’s documentation!<a class="headerlink" href="#welcome-to-
<li class="toctree-l2"><a class="reference internal" href="pipeline.html#pipeline-output-the-pipeline-state">Pipeline Output: the Pipeline State</a></li>
<li class="toctree-l2"><a class="reference internal" href="pipeline.html#available-steps-an-overview">Available Steps: An Overview</a></li>
<li class="toctree-l2"><a class="reference internal" href="pipeline.html#dynamic-graphs">Dynamic Graphs</a></li>
<li class="toctree-l2"><a class="reference internal" href="pipeline.html#multilingual-support">Multilingual Support</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="extending.html">Extending Renard</a><ul>
Binary file modified objects.inv
Binary file not shown.
47 changes: 45 additions & 2 deletions pipeline.html
@@ -60,6 +60,7 @@
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#dynamic-graphs">Dynamic Graphs</a></li>
<li class="toctree-l2"><a class="reference internal" href="#multilingual-support">Multilingual Support</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="extending.html">Extending Renard</a></li>
@@ -247,7 +248,7 @@ <h3>Characters Extraction<a class="headerlink" href="#characters-extraction" tit
occurrences detected using NER. This is done by assigning each mention
to a unique character.</p>
<ul class="simple">
<li><p><code class="xref py py-class docutils literal notranslate"><span class="pre">NaiveCharacterUnifier</span></code></p></li>
<li><p><a class="reference internal" href="reference.html#renard.pipeline.character_unification.NaiveCharacterUnifier" title="renard.pipeline.character_unification.NaiveCharacterUnifier"><code class="xref py py-class docutils literal notranslate"><span class="pre">NaiveCharacterUnifier</span></code></a></p></li>
<li><p><a class="reference internal" href="reference.html#renard.pipeline.character_unification.GraphRulesCharacterUnifier" title="renard.pipeline.character_unification.GraphRulesCharacterUnifier"><code class="xref py py-class docutils literal notranslate"><span class="pre">GraphRulesCharacterUnifier</span></code></a></p></li>
</ul>
</section>
@@ -285,7 +286,7 @@ <h2>Dynamic Graphs<a class="headerlink" href="#dynamic-graphs" title="Permalink
<span class="n">NLTKNamedEntityRecognizer</span><span class="p">(),</span>
<span class="n">GraphRulesCharacterUnifier</span><span class="p">(</span><span class="n">min_appearances</span><span class="o">=</span><span class="mi">10</span><span class="p">),</span>
<span class="n">CoOccurrencesGraphExtractor</span><span class="p">(</span>
<span class="n">co_occurences_dist</span><span class="o">=</span><span class="mi">25</span><span class="p">,</span>
<span class="n">co_occurrences_dist</span><span class="o">=</span><span class="mi">25</span><span class="p">,</span>
<span class="n">dynamic</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="c1"># note the &#39;dynamic&#39;</span>
<span class="n">dynamic_window</span><span class="o">=</span><span class="mi">20</span> <span class="c1"># and the &#39;dynamic_window&#39; argument</span>
<span class="p">)</span>
@@ -309,6 +310,48 @@ <h2>Dynamic Graphs<a class="headerlink" href="#dynamic-graphs" title="Permalink
<a class="reference internal" href="reference.html#renard.pipeline.core.PipelineState.export_graph_to_gexf" title="renard.pipeline.core.PipelineState.export_graph_to_gexf"><code class="xref py py-meth docutils literal notranslate"><span class="pre">PipelineState.export_graph_to_gexf()</span></code></a> correctly exports the
dynamic graph to the Gephi format.</p>
</section>
<section id="multilingual-support">
<h2>Multilingual Support<a class="headerlink" href="#multilingual-support" title="Permalink to this headline"></a></h2>
<p>Renard supports multiple languages. By default, a <a class="reference internal" href="reference.html#renard.pipeline.core.Pipeline" title="renard.pipeline.core.Pipeline"><code class="xref py py-class docutils literal notranslate"><span class="pre">Pipeline</span></code></a>
is configured for English, but you can create a pipeline for any language
<em>as long as all of its steps support it</em>. To configure a pipeline for
another language, pass the ISO 639-3 code of that language as the
<code class="docutils literal notranslate"><span class="pre">lang</span></code> argument:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">renard.pipeline</span> <span class="kn">import</span> <span class="n">Pipeline</span>
<span class="kn">from</span> <span class="nn">renard.pipeline.tokenization</span> <span class="kn">import</span> <span class="n">NLTKTokenizer</span>
<span class="kn">from</span> <span class="nn">renard.pipeline.ner</span> <span class="kn">import</span> <span class="n">BertNamedEntityRecognizer</span>
<span class="kn">from</span> <span class="nn">renard.pipeline.character_unification</span> <span class="kn">import</span> <span class="n">GraphRulesCharacterUnifier</span>
<span class="kn">from</span> <span class="nn">renard.pipeline.graph_extraction</span> <span class="kn">import</span> <span class="n">CoOccurrencesGraphExtractor</span>

<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">&quot;./my_doc_in_french.txt&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
    <span class="n">text</span> <span class="o">=</span> <span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>

<span class="n">pipeline</span> <span class="o">=</span> <span class="n">Pipeline</span><span class="p">(</span>
    <span class="p">[</span>
        <span class="n">NLTKTokenizer</span><span class="p">(),</span>
        <span class="n">BertNamedEntityRecognizer</span><span class="p">(),</span>
        <span class="n">GraphRulesCharacterUnifier</span><span class="p">(</span><span class="n">min_appearances</span><span class="o">=</span><span class="mi">10</span><span class="p">),</span>
        <span class="n">CoOccurrencesGraphExtractor</span><span class="p">(</span><span class="n">co_occurrences_dist</span><span class="o">=</span><span class="mi">25</span><span class="p">)</span>
    <span class="p">],</span>
    <span class="n">lang</span><span class="o">=</span><span class="s2">&quot;fra&quot;</span> <span class="c1"># ISO 639-3 language code for French</span>
<span class="p">)</span>

<span class="n">out</span> <span class="o">=</span> <span class="n">pipeline</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
</pre></div>
</div>
<p>This pipeline is valid because <a class="reference internal" href="reference.html#renard.pipeline.tokenization.NLTKTokenizer" title="renard.pipeline.tokenization.NLTKTokenizer"><code class="xref py py-class docutils literal notranslate"><span class="pre">NLTKTokenizer</span></code></a>,
<a class="reference internal" href="reference.html#renard.pipeline.ner.BertNamedEntityRecognizer" title="renard.pipeline.ner.BertNamedEntityRecognizer"><code class="xref py py-class docutils literal notranslate"><span class="pre">BertNamedEntityRecognizer</span></code></a> and
<a class="reference internal" href="reference.html#renard.pipeline.character_unification.GraphRulesCharacterUnifier" title="renard.pipeline.character_unification.GraphRulesCharacterUnifier"><code class="xref py py-class docutils literal notranslate"><span class="pre">GraphRulesCharacterUnifier</span></code></a> all support French, and
<code class="xref py py-class docutils literal notranslate"><span class="pre">CoOccurrencesGraphExtractor</span></code> works for any language. If the
pipeline were invalid, Renard would display an error message explaining
why. Renard can perform this language check because each step
explicitly indicates which languages it supports by overriding the
<a class="reference internal" href="reference.html#renard.pipeline.core.PipelineStep.supported_langs" title="renard.pipeline.core.PipelineStep.supported_langs"><code class="xref py py-meth docutils literal notranslate"><span class="pre">PipelineStep.supported_langs()</span></code></a> method. This method returns the
set of languages supported by a step as ISO 639-3 codes. The special
string <code class="docutils literal notranslate"><span class="pre">&quot;any&quot;</span></code> indicates that the step works regardless of
language. If the method is not overridden, the default is English
support only.</p>
</section>
</section>

