From 1ffe19c6e66f01d620d5fc1615b68859d3f500d9 Mon Sep 17 00:00:00 2001
From: Aethor
Date: Thu, 18 Jul 2024 17:26:17 +0000
Subject: [PATCH] =?UTF-8?q?Deploying=20to=20gh-pages=20from=20@=20CompNet/?=
=?UTF-8?q?Renard@bd3d6d3e50e2105461386e34c8aa3e9c15e6329d=20=F0=9F=9A=80?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
---
_sources/contributing.rst.txt | 7 +-
_sources/extending.rst.txt | 6 +-
_sources/pipeline.rst.txt | 47 +++-
contributing.html | 7 +-
extending.html | 6 +-
genindex.html | 30 ++-
objects.inv | Bin 1832 -> 1896 bytes
pipeline.html | 46 +++-
reference.html | 408 +++++++++++++++++++++++-----------
searchindex.js | 2 +-
10 files changed, 413 insertions(+), 146 deletions(-)
diff --git a/_sources/contributing.rst.txt b/_sources/contributing.rst.txt
index b365357..940d906 100644
--- a/_sources/contributing.rst.txt
+++ b/_sources/contributing.rst.txt
@@ -36,4 +36,9 @@ the ``tests`` directory. We use ``pytest`` to test code, and also use
``hypothesis`` when applicable. If you open a patch, make sure that
all tests are passing. In particular, do not rely on the CI, as it
does not run time costly tests! Check for yourself locally, using
-``RENARD_TEST_ALL=1 python -m pytest tests``
+``RENARD_TEST_ALL=1 python -m pytest tests``. Note that there are
+specific tests and environment variables for optional dependencies such
+as *stanza* (``RENARD_TEST_STANZA_OPTDEP``). These must be explicitly
+set to ``1`` if you want to test optional dependencies, as
+``RENARD_TEST_ALL=1`` does not enable tests on these optional
+dependencies.
diff --git a/_sources/extending.rst.txt b/_sources/extending.rst.txt
index b52edb8..2431c47 100644
--- a/_sources/extending.rst.txt
+++ b/_sources/extending.rst.txt
@@ -8,8 +8,10 @@ Creating new steps
Usually, steps must implement at least four functions :
-- :meth:`.PipelineStep.__init__`: is used to pass options at step init time
-- :meth:`.PipelineStep.__call__`: is called at pipeline run time
+- :meth:`.PipelineStep.__init__`: is used to pass options at step init
+ time. Options passed at step init time should be valid for a
+ collection of texts, and not be text specific.
+- :meth:`.PipelineStep.__call__`: is called at pipeline run time.
- :meth:`.PipelineStep.needs`: declares the set of information needed
from the pipeline state by this step. Each returned string should be
an attribute of :class:`.PipelineState`.
diff --git a/_sources/pipeline.rst.txt b/_sources/pipeline.rst.txt
index daeb95e..74af798 100644
--- a/_sources/pipeline.rst.txt
+++ b/_sources/pipeline.rst.txt
@@ -68,7 +68,7 @@ In that case, the ``tokens`` requirements is fulfilled at run time. If
you don't pass the parameter, Renard will throw the following
exception:
->>> ValueError: ["step 1 (NLTKNamedEntityRecognizer) has unsatisfied needs (needs : {'tokens'}, available : {'text'})"]
+>>> ValueError: ["step 1 (NLTKNamedEntityRecognizer) has unsatisfied needs. needs: {'tokens'}. available: {'text'}). missing: {'tokens'}."]
For simplicity, one can use one of the preconfigured pipelines:
@@ -252,6 +252,51 @@ graph to a directory. Meanwhile,
dynamic graph to the Gephi format.
+Custom Segmentation
+-------------------
+
+The ``dynamic_window`` parameter of
+:class:`.CoOccurrencesGraphExtractor` determines the segmentation of
+the dynamic networks, in number of interactions. In the example above,
+a new graph will be created every 20 interactions.
+
+While one can rely on the arguments of the graph extractor of the
+pipeline to determine the dynamic window, Renard also lets you specify
+a custom segmentation of a text with the ``dynamic_blocks``
+argument. When running a pipeline, you can cut your text however you
+want and pass this argument alongside the usual text:
+
+
+.. code-block:: python
+
+   from renard.pipeline import Pipeline
+   from renard.pipeline.tokenization import NLTKTokenizer
+   from renard.pipeline.ner import NLTKNamedEntityRecognizer
+   from renard.pipeline.character_unification import GraphRulesCharacterUnifier
+   from renard.pipeline.graph_extraction import CoOccurrencesGraphExtractor
+   from renard.utils import block_bounds
+
+   with open("./my_doc.txt") as f:
+       text = f.read()
+
+   # let's suppose the 'cut_into_chapters' function cuts the text into chapters.
+   chapters = cut_into_chapters(text)
+
+   pipeline = Pipeline(
+       [
+           NLTKTokenizer(),
+           NLTKNamedEntityRecognizer(),
+           GraphRulesCharacterUnifier(),
+           CoOccurrencesGraphExtractor(co_occurrences_dist=25, dynamic=True)
+       ]
+   )
+
+   # the 'block_bounds' function automatically extracts the boundaries of your
+   # blocks of text.
+   out = pipeline(text, dynamic_blocks=block_bounds(chapters))
+
+
+
Multilingual Support
====================
diff --git a/contributing.html b/contributing.html
index e637da4..33f032c 100644
--- a/contributing.html
+++ b/contributing.html
@@ -108,7 +108,12 @@ Code Quality Guidelines
+RENARD_TEST_ALL=1 python -m pytest tests. Note that there are
+specific tests and environment variables for optional dependencies such
+as stanza (RENARD_TEST_STANZA_OPTDEP). These must be explicitly
+set to 1 if you want to test optional dependencies, as
+RENARD_TEST_ALL=1 does not enable tests on these optional
+dependencies.
diff --git a/extending.html b/extending.html
index 1431c68..d23d424 100644
--- a/extending.html
+++ b/extending.html
@@ -84,8 +84,10 @@ Extending Renard
Usually, steps must implement at least four functions :
@@ -492,6 +500,8 @@ P
- Pipeline (class in renard.pipeline.core)
+
+ - PipelineParameter (renard.pipeline.core.Pipeline attribute)
- PipelineState (class in renard.pipeline.core)
diff --git a/objects.inv b/objects.inv
index 26ef636df157117ebe64626835d8bdc66ca46e1b..c275dd82fb94b524c449ec4eb1d5b8f0d9b65444 100644
Binary files a/objects.inv and b/objects.inv differ
Graph Extraction
-Dynamic Graphs
+Dynamic Graphs
+
Multilingual Support
@@ -152,7 +155,7 @@ The Pipeline |