Nick/link check (#359)
* Internal: Try new link checker

* Internal: Add codespell and fix typos.

* Internal: See if codespell precommit finds config.

* Internal: Found config. Now enable reading it

Closes #358
ntjohnson1 authored Jan 6, 2025
1 parent 589239b commit c3249ad
Showing 31 changed files with 114 additions and 64 deletions.
12 changes: 9 additions & 3 deletions .github/workflows/markdown-check.yml
@@ -7,8 +7,14 @@ on:
     branches: [ "main" ]

 jobs:
-  markdown-link-check:
+  check-links:
+    name: runner / linkspector
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v4
-      - uses: gaurav-nelson/github-action-markdown-link-check@v1
+      - uses: actions/checkout@v4
+      - name: Run linkspector
+        uses: umbrelladocs/action-linkspector@v1
+        with:
+          github_token: ${{ secrets.github_token }}
+          reporter: github-pr-review
+          fail_on_error: true
7 changes: 7 additions & 0 deletions .pre-commit-config.yaml
@@ -13,3 +13,10 @@ repos:
           --extra-keys=metadata.language_info metadata.vscode metadata.kernelspec cell.metadata.vscode,
           --drop-empty-cells
         ]
+  - repo: https://github.com/codespell-project/codespell
+    rev: v2.3.0
+    hooks:
+      - id: codespell
+        args: [ --toml, "pyproject.toml"]
+        additional_dependencies:
+          - tomli
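The `tomli` entry exists so the hook's isolated environment can parse `pyproject.toml` on Pythons older than 3.11 (the commit message's "Found config. Now enable reading it"). A minimal sketch of that config-lookup pattern, assuming nothing about codespell's internals:

```python
# Hypothetical sketch of reading a [tool.codespell] table from pyproject.toml.
# On Python >= 3.11 the stdlib tomllib suffices; older interpreters need the
# tomli backport, which is why the hook declares it as a dependency.
try:
    import tomllib  # Python >= 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # backport for older Pythons

with open("pyproject.toml", "rb") as f:
    config = tomllib.load(f).get("tool", {}).get("codespell", {})

print(config.get("ignore-words-list", []))
```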
4 changes: 2 additions & 2 deletions CHANGELOG.md
@@ -12,7 +12,7 @@
- Aligning comparison operator output for data classes (https://github.com/sandialabs/pyttb/pull/331)
- Improved:
- Getting starting documentation (https://github.com/sandialabs/pyttb/pull/324)
- - Development enviroment (https://github.com/sandialabs/pyttb/pull/329, https://github.com/sandialabs/pyttb/pull/330)
+ - Development environment (https://github.com/sandialabs/pyttb/pull/329, https://github.com/sandialabs/pyttb/pull/330)
- Documentation (https://github.com/sandialabs/pyttb/pull/328, https://github.com/sandialabs/pyttb/pull/334)

# v1.8.0 (2024-10-23)
@@ -93,7 +93,7 @@
- Addresses ambiguity of -0 by using `exclude_dims` (`numpy.ndarray`) parameter
- `ktensor.ttv`, `sptensor.ttv`, `tensor.ttv`, `ttensor.ttv`
- Use `exlude_dims` parameter instead of `-dims`
- - Explicit nameing of dimensions to exclude
+ - Explicit naming of dimensions to exclude
- `tensor.ttsv`
- Use `skip_dim` (`int`) parameter instead of `-dims`
- Exclude all dimensions up to and including `skip_dim`
16 changes: 11 additions & 5 deletions CONTRIBUTING.md
@@ -35,19 +35,25 @@ current or filing a new [issue](https://github.com/sandialabs/pyttb/issues).
```
git checkout -b my-new-feature-branch
```
- 1. Formatters and linting
+ 1. Formatters and linting (These are checked in the full test suite as well)
1. Run autoformatters and linting from root of project (they will change your code)
-    ```commandline
-    ruff check . --fix
-    ruff format
-    ```
+    ```commandline
+    ruff check . --fix
+    ruff format
+    ```
1. Ruff's `--fix` won't necessarily address everything and may point out issues that need manual attention
1. [We](./.pre-commit-config.yaml) optionally support [pre-commit hooks](https://pre-commit.com/) for this
1. Alternatively, you can run `pre-commit run --all-files` from the command line if you don't want to install the hooks.
1. Check typing
```commandline
mypy pyttb/
```
1. Not included in our pre-commit hooks because of slow runtime.
+ 1. Check spelling
+    ```commandline
+    codespell
+    ```
+    1. This is also included in the optional pre-commit hooks.
1. Run tests (at desired fidelity)
1. Just doctests (enabled by default)
2 changes: 1 addition & 1 deletion README.md
@@ -32,7 +32,7 @@ low-rank tensor decompositions:
[`cp_apr`](https://pyttb.readthedocs.io/en/stable/cpapr.html "CP decomposition via Alternating Poisson Regression"),
[`gcp_opt`](https://pyttb.readthedocs.io/en/stable/gcpopt.html "Generalized CP decomposition"),
[`hosvd`](https://pyttb.readthedocs.io/en/stable/hosvd.html "Tucker decomposition via Higher Order Singular Value Decomposition"),
- [`tucker_als`](https://pyttb.readthedocs.io/en/stable/tuckerals.html "Tucker decompostion via Alternating Least Squares")
+ [`tucker_als`](https://pyttb.readthedocs.io/en/stable/tuckerals.html "Tucker decomposition via Alternating Least Squares")

## Quick Start

4 changes: 2 additions & 2 deletions docs/source/tutorial/algorithm_cp_als.ipynb
@@ -122,7 +122,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Increase the maximium number of iterations\n",
"## Increase the maximum number of iterations\n",
"Note that the previous run kicked out at only 10 iterations, before reaching the specified convegence tolerance. Let's increase the maximum number of iterations and try again, using the same initial guess."
]
},
@@ -337,7 +337,7 @@
"source": [
"## Recommendations\n",
"* Run multiple times with different guesses and select the solution with the best fit.\n",
"* Try different ranks and choose the solution that is the best descriptor for your data based on the combination of the fit and the interpretaton of the factors, e.g., by visualizing the results."
"* Try different ranks and choose the solution that is the best descriptor for your data based on the combination of the fit and the interpretation of the factors, e.g., by visualizing the results."
]
}
],
2 changes: 1 addition & 1 deletion docs/source/tutorial/algorithm_gcp_opt.ipynb
@@ -19,7 +19,7 @@
"tags": []
},
"source": [
"This document outlines usage and examples for the generalized CP (GCP) tensor decomposition implmented in `pyttb.gcp_opt`. GCP allows alternate objective functions besides sum of squared errors, which is the standard for CP. The code support both dense and sparse input tensors, but the sparse input tensors require randomized optimization methods.\n",
"This document outlines usage and examples for the generalized CP (GCP) tensor decomposition implemented in `pyttb.gcp_opt`. GCP allows alternate objective functions besides sum of squared errors, which is the standard for CP. The code support both dense and sparse input tensors, but the sparse input tensors require randomized optimization methods.\n",
"\n",
"GCP is described in greater detail in the manuscripts:\n",
"* D. Hong, T. G. Kolda, J. A. Duersch, Generalized Canonical Polyadic Tensor Decomposition, SIAM Review, 62:133-163, 2020, https://doi.org/10.1137/18M1203626\n",
2 changes: 1 addition & 1 deletion docs/source/tutorial/algorithm_hosvd.ipynb
@@ -94,7 +94,7 @@
"metadata": {},
"source": [
"## Generate a core with different accuracies for different shapes\n",
"We will create a core `tensor` that has is nearly block diagonal. The blocks are expontentially decreasing in norm, with the idea that we can pick off one block at a time as we increate the prescribed accuracy of the HOSVD. To do this, we define and use a function `tenrandblk()`."
"We will create a core `tensor` that has is nearly block diagonal. The blocks are expontentially decreasing in norm, with the idea that we can pick off one block at a time as we increase the prescribed accuracy of the HOSVD. To do this, we define and use a function `tenrandblk()`."
]
},
{
2 changes: 1 addition & 1 deletion docs/source/tutorial/class_sptensor.ipynb
@@ -17,7 +17,7 @@
"metadata": {},
"source": [
"## Creating a `sptensor`\n",
"The `sptensor` class stores the data in coordinate format. A sparse `sptensor` can be created by passing in a list of subscripts and values. For example, here we pass in three subscripts and a scalar value. The resuling sparse `sptensor` has three nonzero entries, and the `shape` is the size of the largest subscript in each dimension."
"The `sptensor` class stores the data in coordinate format. A sparse `sptensor` can be created by passing in a list of subscripts and values. For example, here we pass in three subscripts and a scalar value. The resulting sparse `sptensor` has three nonzero entries, and the `shape` is the size of the largest subscript in each dimension."
]
},
{
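A rough sketch of the constructor that cell describes (subscript/value argument shapes assumed from the tutorial text, not verified against this commit):

```python
# Sketch: build a sparse tensor from three subscripts and one value each.
import numpy as np
import pyttb as ttb

subs = np.array([[0, 0, 0], [1, 1, 1], [3, 2, 4]])  # three nonzero locations
vals = np.array([[1.0], [1.0], [1.0]])              # one value per subscript
S = ttb.sptensor(subs, vals)
print(S.shape)  # (4, 3, 5): one past the largest subscript in each dimension
```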
2 changes: 1 addition & 1 deletion docs/source/tutorial/class_sumtensor.ipynb
@@ -54,7 +54,7 @@
"metadata": {},
"source": [
"## Creating sumtensors\n",
"A sumtensor `T` can only be delared as a sum of same-shaped tensors T1, T2,...,TN. The summand tensors are stored internally, which define the \"parts\" of the `sumtensor`. The parts of a `sumtensor` can be (dense) tensors (`tensor`), sparse tensors (` sptensor`), Kruskal tensors (`ktensor`), or Tucker tensors (`ttensor`). An example of the use of the sumtensor constructor follows."
"A sumtensor `T` can only be declared as a sum of same-shaped tensors T1, T2,...,TN. The summand tensors are stored internally, which define the \"parts\" of the `sumtensor`. The parts of a `sumtensor` can be (dense) tensors (`tensor`), sparse tensors (` sptensor`), Kruskal tensors (`ktensor`), or Tucker tensors (`ttensor`). An example of the use of the sumtensor constructor follows."
]
},
{
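A short sketch of the constructor described above (list-of-parts argument and `parts` attribute assumed from the tutorial text):

```python
# Sketch: a sumtensor holding one dense and one sparse part of identical
# shape; the parts are stored as-is rather than added together.
import pyttb as ttb

T1 = ttb.tenones((3, 4, 5))                # dense part
T2 = ttb.sptenrand((3, 4, 5), nonzeros=6)  # sparse part, same shape
T = ttb.sumtensor([T1, T2])
print(len(T.parts))  # 2
```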
2 changes: 1 addition & 1 deletion docs/source/tutorial/class_tenmat.ipynb
@@ -16,7 +16,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We show how to convert a `tensor` to a 2D numpy array stored with extra information so that it can be converted back to a `tensor`. Converting to a 2D numpy array requies an ordered mapping of the `tensor` indices to the rows and the columns of the 2D numpy array."
"We show how to convert a `tensor` to a 2D numpy array stored with extra information so that it can be converted back to a `tensor`. Converting to a 2D numpy array requires an ordered mapping of the `tensor` indices to the rows and the columns of the 2D numpy array."
]
},
{
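A sketch of that ordered mapping, with `to_tenmat` and its `rdims` argument assumed from pyttb's current API:

```python
# Sketch: unfold mode 0 to the rows; the remaining dims span the columns.
import numpy as np
import pyttb as ttb

T = ttb.tensor(np.arange(24.0).reshape((2, 3, 4), order="F"))
A = T.to_tenmat(rdims=np.array([0]))  # rows <- dim 0, cols <- dims 1 and 2
print(A.data.shape)  # (2, 12)
```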
4 changes: 2 additions & 2 deletions docs/source/tutorial/class_tensor.ipynb
@@ -107,7 +107,7 @@
"metadata": {},
"source": [
"## Specifying trailing singleton dimensions in a `tensor`\n",
"Likewise, trailing singleton dimensions must be explictly specified."
"Likewise, trailing singleton dimensions must be explicitly specified."
]
},
{
@@ -136,7 +136,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## The constitutent parts of a `tensor`"
"## The constituent parts of a `tensor`"
]
},
{
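A sketch of the trailing-singleton rule that cell describes (the constructor's `shape` argument is assumed from pyttb's current API):

```python
# Sketch: a third mode only exists if the shape names it explicitly.
import numpy as np
import pyttb as ttb

data = np.ones((4, 3))
T2 = ttb.tensor(data)                   # 2-way tensor, shape (4, 3)
T3 = ttb.tensor(data, shape=(4, 3, 1))  # explicit trailing singleton kept
print(T2.ndims, T3.ndims)  # 2 3
```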
2 changes: 1 addition & 1 deletion docs/source/tutorial/class_ttensor.ipynb
@@ -630,7 +630,7 @@
"metadata": {},
"source": [
"### Compare visualizations\n",
"We can compare the results of reconstruction. There is no degredation in doing only a partial reconstruction. Downsampling is obviously lower resolution, but the same result as first doing the full reconstruction and then downsampling."
"We can compare the results of reconstruction. There is no degradation in doing only a partial reconstruction. Downsampling is obviously lower resolution, but the same result as first doing the full reconstruction and then downsampling."
]
},
{
8 changes: 4 additions & 4 deletions profiling/algorithms_profiling.ipynb
@@ -90,7 +90,7 @@
" label:\n",
" The user-supplied label to distinguish a test run.\n",
" params:\n",
" Paramters passed to the algorithm function.\n",
" Parameters passed to the algorithm function.\n",
" 'rank' may be given to the CP algorithms; 'tol' and 'verbosity' to hosvd.\n",
" \"\"\"\n",
"\n",
@@ -108,7 +108,7 @@
" # stop collecting data, and send data to Stats object and sort\n",
" profiler.disable()\n",
"\n",
" # save profiling ouput to sub-directory specific to the function being tested.\n",
" # save profiling output to sub-directory specific to the function being tested.\n",
" output_directory = f\"./pstats_files/{algorithm_name}\"\n",
" if not os.path.exists(output_directory):\n",
" os.makedirs(output_directory) # create directory if it doesn't exist\n",
@@ -155,7 +155,7 @@
" label:\n",
" The user-supplied label to distinguish a test run. This will be used in the output file name.\n",
" params:\n",
" Paramters passed to the algorithm function.\n",
" Parameters passed to the algorithm function.\n",
" 'rank' may be given to the CP algorithms; 'tol' and 'verbosity' to hosvd.\n",
" \"\"\"\n",
"\n",
@@ -410,7 +410,7 @@
"source": [
"### Generating all algorithms' profiling images\n",
" \n",
"The cell bellow will generate all profiling images for all algorithms in `./gprof2dot_images/<specific_algorithm>`"
"The cell below will generate all profiling images for all algorithms in `./gprof2dot_images/<specific_algorithm>`"
]
},
{
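The helper shown in these hunks follows the standard `cProfile`/`pstats` flow; a self-contained sketch, with a toy workload standing in for the pyttb algorithms:

```python
import cProfile
import os
import pstats

profiler = cProfile.Profile()
profiler.enable()
sum(i * i for i in range(100_000))  # stand-in for an algorithm under test
profiler.disable()

# save profiling output to a sub-directory specific to the function tested
output_directory = "./pstats_files/example"
os.makedirs(output_directory, exist_ok=True)
stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.dump_stats(f"{output_directory}/example.pstats")
```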
20 changes: 20 additions & 0 deletions pyproject.toml
@@ -40,6 +40,7 @@ dev = [
# Also in pre-commit
"ruff>=0.7,<0.8",
"pre-commit>=4.0,<5.0",
"codespell>=2.3.0,<2.4.0"
]
doc = [
"sphinx >= 4.0",
@@ -120,3 +121,22 @@ addopts = "--doctest-modules pyttb"
filterwarnings = [
"ignore:.*deprecated.*:"
]

+[tool.codespell]
+skip = [
+    # Built documentation
+    "./docs/build",
+    "./docs/jupyter_execute",
+    # Project build artifacts
+    "./build"
+]
+count = true
+ignore-words-list = [
+    # Conventions carried from MATLAB ttb (consider changing)
+    "ans",
+    "siz",
+    # Tensor/repo Nomenclature
+    "COO",
+    "nd",
+    "als",
+]
2 changes: 1 addition & 1 deletion pyttb/cp_als.py
@@ -76,7 +76,7 @@ def cp_als( # noqa: PLR0912,PLR0913,PLR0915
Example
-------
-    Random initialization causes slight pertubation in intermediate results.
+    Random initialization causes slight perturbation in intermediate results.
`...` is our place holder for these numeric values.
Example using default values ("random" initialization):
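A hypothetical usage sketch of the reproducibility point in that docstring: pinning numpy's global RNG makes the "random" initialization, and hence the intermediate values the doctest elides with `...`, repeat across runs (`tenrand` and the three-value return are assumed from pyttb's API):

```python
import numpy as np
import pyttb as ttb

np.random.seed(0)  # fix the RNG so "random" init is reproducible
T = ttb.tenrand((3, 4, 5))
M, Minit, info = ttb.cp_als(T, rank=2, maxiters=10)
```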
20 changes: 10 additions & 10 deletions pyttb/cp_apr.py
@@ -104,7 +104,7 @@ def cp_apr( # noqa: PLR0913
assert init.ndims == N, "Initial guess does not have the right number of modes"
assert (
init.ncomponents == rank
), "Initial guess does not have the right number of componenets"
), "Initial guess does not have the right number of components"
for n in range(N):
if init.shape[n] != input_tensor.shape[n]:
assert False, f"Mode {n} of the initial guess is the wrong size"
@@ -256,7 +256,7 @@ def tt_cp_apr_mu( # noqa: PLR0912,PLR0913,PLR0915
M.normalize(normtype=1)
Phi = [] # np.zeros((N,))#cell(N,1)
for n in range(N):
-        # TODO prepopulation Phi instead of appen should be faster
+        # TODO prepopulation Phi instead of append should be faster
Phi.append(np.zeros(M.factor_matrices[n].shape))
kktModeViolations = np.zeros((N,))

@@ -488,7 +488,7 @@ def tt_cp_apr_pdnr( # noqa: PLR0912,PLR0913,PLR0915

if isinstance(input_tensor, ttb.sptensor) and isSparse and precompinds:
# Precompute sparse index sets for all the row subproblems.
-        # Takes more memory but can cut exectuion time significantly in some cases.
+        # Takes more memory but can cut execution time significantly in some cases.
if printitn > 0:
print("\tPrecomuting sparse index sets...")
sparseIx = []
@@ -847,7 +847,7 @@ def tt_cp_apr_pqnr( # noqa: PLR0912,PLR0913,PLR0915

if isinstance(input_tensor, ttb.sptensor) and precompinds:
# Precompute sparse index sets for all the row subproblems.
-        # Takes more memory but can cut exectuion time significantly in some cases.
+        # Takes more memory but can cut execution time significantly in some cases.
if printitn > 0:
print("\tPrecomuting sparse index sets...")
sparseIx = []
@@ -989,12 +989,12 @@ def tt_cp_apr_pqnr( # noqa: PLR0912,PLR0913,PLR0915
delg[:, lbfgsPos] = tmp_delg
rho[lbfgsPos] = tmp_rho
else:
-                # Rho is required to be postive; if not, then skip the L-BFGS
+                # Rho is required to be positive; if not, then skip the L-BFGS
# update pair. The recommended safeguard for full BFGS is
# Powell damping, but not clear how to damp in 2-loop L-BFGS
if dispLineWarn:
warnings.warn(
"WARNING: skipping L-BFGS update, rho whould be "
"WARNING: skipping L-BFGS update, rho would be "
f"1 / {tmp_delm * tmp_delg}"
)
# Roll back lbfgsPos since it will increment later.
@@ -1384,7 +1384,7 @@ def tt_linesearch_prowsubprob( # noqa: PLR0913
max_steps:
maximum number of steps to try (suggest 10)
suff_decr:
-        sufficent decrease for convergence (suggest 1.0e-4)
+        sufficient decrease for convergence (suggest 1.0e-4)
isSparse:
sparsity flag for computing the objective
data_row:
@@ -1414,7 +1414,7 @@

stepSize = step_len

-    # Evalute the current objective value
+    # Evaluate the current objective value
f_old = -tt_loglikelihood_row(isSparse, data_row, model_old, Pi)
num_evals = 1
count = 1
@@ -1613,7 +1613,7 @@ def get_search_dir_pqnr( # noqa: PLR0913
lbfgsSize = delta_model.shape[1]

# Determine active and free variables.
-    # TODO: is the bellow relevant?
+    # TODO: is the below relevant?
# If epsActSet is zero, then the following works:
# fixedVars = find((m_row == 0) & (grad' > 0));
# For the general case this works but is less clear and assumes m_row > 0:
@@ -1747,7 +1747,7 @@ def calculate_phi( # noqa: PLR0913
Pi: np.ndarray,
epsilon: float,
) -> np.ndarray:
"""Calcualte Phi.
"""Calculate Phi.
Parameters
----------
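A toy sketch of the safeguard described in the `tt_cp_apr_pqnr` hunk above: an L-BFGS curvature pair is kept only when `rho = 1 / (s · y)` would be positive; otherwise the update is skipped (full BFGS would use Powell damping instead):

```python
import numpy as np

s = np.array([0.10, -0.20])  # change in the model row (delta model)
y = np.array([0.30, 0.05])   # change in the gradient (delta grad)
sy = float(s @ y)
if sy > 0:
    rho = 1.0 / sy           # safe to store this (s, y) pair
else:
    print(f"skipping L-BFGS update, rho would be 1 / {sy}")
```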
2 changes: 1 addition & 1 deletion pyttb/gcp/optimizers.py
@@ -512,7 +512,7 @@ def lbfgsb_func_grad(vector: np.ndarray):

lbfgsb_info["final_f"] = final_f
lbfgsb_info["callback"] = vars(monitor)
-        # Unregister monitor in case of re-use
+        # Unregister monitor in case of reuse
self._solver_kwargs["callback"] = monitor.callback

# TODO big print output
2 changes: 1 addition & 1 deletion pyttb/hosvd.py
@@ -116,7 +116,7 @@ def hosvd( # noqa: PLR0912,PLR0913,PLR0915
ranks[k] = np.where(eigsum > eigsumthresh)[0][-1]

if verbosity > 5:
print("Reverse cummulative sum of evals of Gram matrix:")
print("Reverse cumulative sum of evals of Gram matrix:")
for i, a_sum in enumerate(eigsum):
print_msg = f"{i: d}: {a_sum: 6.4f}"
if i == ranks[k]:
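A sketch of the truncation rule visible in this hunk: `ranks[k]` keeps the last index whose reverse cumulative eigenvalue sum still exceeds the threshold. The construction of `eigsum` as a reverse cumulative sum is an assumption based on the printed label:

```python
import numpy as np

eigvals = np.array([4.0, 2.0, 0.5, 0.01])  # evals of a Gram matrix, descending
eigsum = np.cumsum(eigvals[::-1])[::-1]    # reverse cumulative sum
eigsumthresh = 0.4
rank = np.where(eigsum > eigsumthresh)[0][-1]
print(rank)  # 2 -> truncate after the third eigenvalue
```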
6 changes: 3 additions & 3 deletions pyttb/ktensor.py
@@ -945,7 +945,7 @@ def to_tenmat(
Mapping of column indices.
cdims_cyclic:
When only rdims is specified maps a single rdim to the rows and
-            the remaining dimensons span the columns. _fc_ (forward cyclic)
+            the remaining dimensions span the columns. _fc_ (forward cyclic)
in the order range(rdims,self.ndims()) followed by range(0, rdims).
_bc_ (backward cyclic) range(rdims-1, -1, -1) then
range(self.ndims(), rdims, -1).
@@ -1378,7 +1378,7 @@ def normalize(

if sort:
if self.ncomponents > 1:
-                # indices of srting in descending order
+                # indices of string in descending order
p = np.argsort(self.weights)[::-1]
self.arrange(permutation=p)

@@ -2300,7 +2300,7 @@ def viz( # noqa: PLR0912, PLR0913
>>> fig, axs = K.viz(show_figure=False) # doctest: +ELLIPSIS
>>> plt.close(fig)
-        Define a more realistic plot fuctions with x labels,
+        Define a more realistic plot functions with x labels,
control relative widths of each plot,
and set mode titles.