Good Morning!
' -translated_markup = translator.translate(markup, target_lang='yo', is_markup=True) +translated_markup = bing.translate(markup, target_lang='yo', is_markup=True) print(translated_markup) # Output:Eku ojumo!
``` -#### Translate BeautifulSoup Objects +#### Translate BeautifulSoup ```python import tranzlate from bs4 import BeautifulSoup -translator = tranzlate.Translator() - +baidu = tranzlate.Translator("baidu") markup = 'Good Morning!
' soup = BeautifulSoup(markup, 'html.parser') -translated_soup = translator.translate(soup, target_lang='yo') +translated_soup = baidu.translate_soup(soup, target_lang='fr') ``` However, there are specialized methods for translating text, markup and BeautifulSoup objects. These methods are `translate_text`, `translate_markup` and `translate_soup` respectively. -### Translate Files +### Translate files -To translate files, we use the `translate_file` method of the `Translator` class. +To translate files, we use the `translate_file` method. ```python import tranzlate -translator = tranzlate.Translator() -translated_file = translator.translate_file('path/to/file.txt', src_lang="en", target_lang='yo') +bing = tranzlate.Translator() # Bing is used by default +translated_file = bing.translate_file('path/to/file.txt', src_lang="en", target_lang='yo') ``` It is advisable to specify the source language when performing translations as it helps the translation engine to provide more accurate translations. -### Use a Proxy +### Use a proxy -To use a proxy, simply pass the proxy to the `Translator` class on instantiation: +To use a proxy, simply pass your proxies on translation: ```python import tranzlate -translator = tranzlate.Translator() +deepl = tranzlate.Translator("deepl") text = 'Good Morning!' -translation = translator.translate(text, target_lang='yo', proxies={'https': 'https://Once upon a time, in a land far, far away, there was a small village nestled among the rolling hills and verdant valleys. The villagers were known for their hard work, kindness, and the vibrant tapestry of stories that wove through their daily lives. One such story was about a wise old woman named Elara who lived on the edge of the village in a quaint little cottage.
+Elara was known throughout the land for her vast knowledge and gentle wisdom. People from neighboring villages would travel great distances to seek her counsel. She had a gift for understanding the language of nature and could often be found tending to her garden, where the flowers bloomed more brightly and the herbs grew more fragrant under her care.
+One day, a young girl named Mira, who lived in the village, found herself in need of guidance. She had come to a crossroads in her life and felt lost and uncertain about which path to take. Her parents suggested she visit Elara, and so, with a hopeful heart, Mira set off towards the old woman's cottage.
+As Mira approached the cottage, she was greeted by the sight of Elara gently coaxing a song from a flute carved from a piece of willow. The music seemed to weave itself into the very air, carrying with it a sense of peace and clarity. Elara looked up and smiled warmly at Mira, inviting her to sit on a bench beneath the shade of an ancient oak tree.
+"Tell me, child, what troubles you?" Elara asked, her voice as soothing as the melody she had just played.
+Mira took a deep breath and poured out her heart to Elara, explaining her fears and doubts about the future. Elara listened patiently, her eyes never leaving Mira's face. When Mira had finished, Elara nodded thoughtfully.
+"Life is much like a garden," Elara began. "It requires patience, care, and the willingness to embrace both the sunshine and the storms. Each path you consider is a seed, full of potential. It is up to you to nurture it, to provide it with the light of your dreams and the water of your efforts. Some seeds will grow into magnificent trees, others into delicate flowers, and some may not grow at all. But each has its place and purpose."
+Mira felt a sense of calm wash over her as she listened to Elara's words. She realized that there was no single right path, but rather, a journey of growth and discovery. With newfound resolve, she thanked Elara and made her way back to the village, ready to plant the seeds of her future with hope and determination.
+Years passed, and Mira grew into a strong, confident woman. She often thought of Elara and the wisdom she had shared. The village continued to thrive, each generation carrying forward the stories and lessons of those who came before. And so, the tapestry of life in that small village remained vibrant and ever-changing, a testament to the enduring power of wisdom, kindness, and the human spirit.
+ + diff --git a/tests/fixtures/text.txt b/tests/fixtures/text.txt new file mode 100644 index 0000000..93fa989 --- /dev/null +++ b/tests/fixtures/text.txt @@ -0,0 +1,17 @@ +Once upon a time, in a land far, far away, there was a small village nestled among the rolling hills and verdant valleys. The villagers were known for their hard work, kindness, and the vibrant tapestry of stories that wove through their daily lives. One such story was about a wise old woman named Elara who lived on the edge of the village in a quaint little cottage. + +Elara was known throughout the land for her vast knowledge and gentle wisdom. People from neighboring villages would travel great distances to seek her counsel. She had a gift for understanding the language of nature and could often be found tending to her garden, where the flowers bloomed more brightly and the herbs grew more fragrant under her care. + +One day, a young girl named Mira, who lived in the village, found herself in need of guidance. She had come to a crossroads in her life and felt lost and uncertain about which path to take. Her parents suggested she visit Elara, and so, with a hopeful heart, Mira set off towards the old woman's cottage. + +As Mira approached the cottage, she was greeted by the sight of Elara gently coaxing a song from a flute carved from a piece of willow. The music seemed to weave itself into the very air, carrying with it a sense of peace and clarity. Elara looked up and smiled warmly at Mira, inviting her to sit on a bench beneath the shade of an ancient oak tree. + +"Tell me, child, what troubles you?" Elara asked, her voice as soothing as the melody she had just played. + +Mira took a deep breath and poured out her heart to Elara, explaining her fears and doubts about the future. Elara listened patiently, her eyes never leaving Mira's face. When Mira had finished, Elara nodded thoughtfully. + +"Life is much like a garden," Elara began. 
"It requires patience, care, and the willingness to embrace both the sunshine and the storms. Each path you consider is a seed, full of potential. It is up to you to nurture it, to provide it with the light of your dreams and the water of your efforts. Some seeds will grow into magnificent trees, others into delicate flowers, and some may not grow at all. But each has its place and purpose." + +Mira felt a sense of calm wash over her as she listened to Elara's words. She realized that there was no single right path, but rather, a journey of growth and discovery. With newfound resolve, she thanked Elara and made her way back to the village, ready to plant the seeds of her future with hope and determination. + +Years passed, and Mira grew into a strong, confident woman. She often thought of Elara and the wisdom she had shared. The village continued to thrive, each generation carrying forward the stories and lessons of those who came before. And so, the tapestry of life in that small village remained vibrant and ever-changing, a testament to the enduring power of wisdom, kindness, and the human spirit. diff --git a/tests/test_translator.py b/tests/test_translator.py index dfd33fc..2e64b6e 100644 --- a/tests/test_translator.py +++ b/tests/test_translator.py @@ -7,6 +7,7 @@ from tranzlate.exceptions import TranslationError, UnsupportedLanguageError + class TestTranslator(unittest.TestCase): """Test case for the Translator class.""" example_text = "Yoruba is a language spoken in West Africa, most prominently Southwestern Nigeria." 
@@ -43,8 +44,6 @@ def test_detect_language_with_empty_text(self): def test_translate_on_auto(self): with self.assertRaises(TypeError): self.translator.translate(None) - with self.assertRaises(ValueError): - self.translator.translate("") translation = self.translator.translate(self.example_text, target_lang="yo") self.assertIsInstance(translation, str) self.assertNotEqual(translation, self.example_text) @@ -68,37 +67,33 @@ def test_translate_with_empty_target_language(self): def test_translate_with_unsupported_source_language(self): with self.assertRaises(UnsupportedLanguageError): - self.translator.translate(self.example_text, "en-xy", "yo") + self.translator.translate(self.example_text, "xyz", "yo") def test_translate_with_unsupported_target_language(self): with self.assertRaises(UnsupportedLanguageError): - self.translator.translate(self.example_text, "en", "yo-xy") + self.translator.translate(self.example_text, "en", "xyz") def test_translate_with_the_same_source_and_target_language(self): - with self.assertRaises(TranslationError): + with self.assertRaises(ValueError): self.translator.translate(self.example_text, "en", "en") - def test_is_supported_language(self): - self.assertTrue(self.translator.is_supported_language("en")) - self.assertFalse(self.translator.is_supported_language("en-xy")) + def test_supports_language(self): + self.assertTrue(self.translator.supports_language("en")) + self.assertFalse(self.translator.supports_language("xyz")) with self.assertRaises(TypeError): - self.translator.is_supported_language(None) - with self.assertRaises(ValueError): - self.translator.is_supported_language("") + self.translator.supports_language(None) def test_get_supported_target_languages(self): self.assertIsInstance(self.translator.get_supported_target_languages("en"), list) self.assertTrue(len(self.translator.get_supported_target_languages("en")) > 0) - self.assertIsInstance(self.translator.get_supported_target_languages("en-xy"), list) - 
self.assertTrue(len(self.translator.get_supported_target_languages("en-xy")) == 0) + self.assertIsInstance(self.translator.get_supported_target_languages("xyz"), list) + self.assertTrue(len(self.translator.get_supported_target_languages("xyz")) == 0) with self.assertRaises(TypeError): self.translator.get_supported_target_languages(None) - with self.assertRaises(ValueError): - self.translator.get_supported_target_languages("") - def test_is_supported_pair(self): - self.assertFalse(self.translator.is_supported_pair("en", "en")) - self.assertTrue(self.translator.is_supported_pair("en", "yo")) + def test_supports_pair(self): + self.assertFalse(self.translator.supports_pair("en", "en")) + self.assertTrue(self.translator.supports_pair("en", "yo")) def test_properties(self): self.assertIsInstance(self.translator.server, TranslatorsServer) @@ -136,4 +131,4 @@ def test_translate_soup(self): if "__name__" == "__main__": unittest.main() -# RUN WITH 'python -m unittest discover tests "test_*.py"' from project's root directory +# Run with 'python -m unittest discover tests "test_*.py"' from project's root directory diff --git a/tranzlate/__init__.py b/tranzlate/__init__.py index 403eea6..9b5a5b2 100644 --- a/tranzlate/__init__.py +++ b/tranzlate/__init__.py @@ -1,13 +1,12 @@ """ #### tranzlate -Simple python package for multi-lingual translation of text, files, markup and BeautifulSoup objects. +Translate text, files, markup and BeautifulSoup. @Author: Daniel T. Afolayan (ti-oluwa.github.io) """ -from .translator import Translator +from .translator import Translator, add_translatable_html_tag -__all__ = ["Translator"] -__version__ = "0.0.1" -__author__ = "Daniel T. 
Afolayan" +__all__ = ["Translator", "add_translatable_html_tag"] +__version__ = "0.0.2" diff --git a/tranzlate/translator.py b/tranzlate/translator.py index 83e73f7..18282c8 100644 --- a/tranzlate/translator.py +++ b/tranzlate/translator.py @@ -3,21 +3,22 @@ """ import functools import sys -from typing import Callable, Dict, List, Tuple, IO -from array import array +from typing import Callable, Dict, Generator, List, Tuple, IO, Any import time -import copy import random -from bs4 import BeautifulSoup -from bs4.element import Tag -from concurrent.futures import ThreadPoolExecutor from translators.server import TranslatorsServer, tss, Tse import simple_file_handler as sfh +import itertools +from asgiref.sync import sync_to_async +import asyncio +import textwrap from .exceptions import TranslationError, UnsupportedLanguageError +__all__ = ["Translator", "add_translatable_html_tag"] + _translatable_tags = ( 'h1', 'u', 's', 'abbr', 'del', 'pre', 'h5', 'sub', 'kbd', 'li', 'dd', 'textarea', 'dt', 'input', 'em', 'sup', 'label', 'button', 'h6', @@ -26,45 +27,23 @@ 'small', 'b', 'q', 'option', 'code', 'h2', 'a', 'strong', 'span', ) - -def _slice_iterable(iter: List | str | Tuple | array, slice_size: int): - ''' - Slices an iterable into smaller iterables of size `slice_size` - - Args: - iter (Iterable): The iterable to slice. 
- slice_size (int): The size of each slice - ''' - if not isinstance(iter, (list, tuple, str, array)): - raise TypeError('Invalid argument type for `iter`') - if not isinstance(slice_size, int): - raise TypeError('Invalid argument type for `slice_size`') - if slice_size < 1: - raise ValueError('`slice_size` should be greater than 0') - - return [ iter[ i : i + slice_size ] for i in range(0, len(iter), slice_size) ] +def add_translatable_html_tag(tag: str) -> None: + """Add a new HTML tag name to the global list of translatable HTML elements""" + global _translatable_tags + if tag in _translatable_tags: + return + _translatable_tags = (tag, *_translatable_tags) + return None -class Translator: +class Translator(object): """ - A Wrapper around the `TranslatorServer` class from the `translators` package by UlionTse. + Wraps around the `TranslatorServer` class from the `translators` package by UlionTse, + providing a simpler interface for translation. Read more about the `translators` package here: https://pypi.org/project/translators/ - - Usage Example: - - ```python - import tranzlate - - translator = tranzlate.Translator() - text = "Yoruba is a language spoken in West Africa, most prominently Southwestern Nigeria." - translation = translator.translate(text, "en", "yo") - print(translation) - - # Output: "Yorùbá jẹ́ èdè tí ó ń ṣe àwọn èdè ní ìlà oòrùn Áfríkà, tí ó wà ní orílẹ̀-èdè Gúúsù Áfríkà." - ``` """ _server = tss @@ -73,14 +52,13 @@ def __init__(self, engine: str = "bing"): Create a Translator instance. :param engine (str): Name of translation engine to be used. Defaults to "bing" - as it is proven to be the most reliable. + as it has been tested to be the most reliable. #### Call `Translator.engines` to get a list of supported translation engines. 
""" - if not engine in self.engines(): + if engine not in type(self).engines(): raise ValueError(f"Invalid translation engine: {engine}") self.engine_name = engine - self._cache = {} return None @@ -98,16 +76,19 @@ def engine(self) -> Tse: @property def engine_api(self) -> Callable: - """Method used by the translation engine to make translation requests""" + """API used by the translation engine to carry out translations""" return getattr(self.server, f"{self.engine_name}") @property - def input_limit(self) -> int: + def input_limit(self) -> int | None: """ The maximum number of characters that can be translated at once. This is dependent on the translation engine being used. """ - return self.engine.input_limit + try: + return self.engine.input_limit + except Exception: + return None @functools.cached_property def language_map(self) -> Dict: @@ -117,9 +98,10 @@ def language_map(self) -> Dict: """ try: return self.server.get_languages(self.engine_name) - except: + except BaseException: return {} + @property def supported_languages(self) -> List: """ @@ -135,7 +117,45 @@ def engines(cls) -> List[str]: return cls._server.translators_pool - def is_supported_language(self, lang_code: str) -> bool: + @classmethod + def detect_language(cls, text: str) -> Dict: + """ + Detects the language of the specified text. + + :param text (str): The text to detect the language of. + :return: A dictionary containing the language code and confidence score. + + Bing is the preferred translation engine for this method as it works well for + this purpose. However, it is not guaranteed to always work. + + Usage Example: + ```python + import tranzlate + + text = "Yoruba is a language spoken in West Africa, most prominently Southwestern Nigeria." 
+ language = tranzlate.Translator.detect_language(text) + print(language) + + # Output: {'language': 'en', 'score': 1.0} + ``` + """ + if not isinstance(text, str): + raise TypeError("Invalid type for `text`") + if not text: + raise ValueError("`text` cannot be empty") + try: + result: Dict[str, Any] = cls._server.translate_text( + query_text=text, + translator="bing", + is_detail_result=True + ) + return result.get('detectedLanguage', {}) + except Exception as exc: + sys.stderr.write(f"Error detecting language: {exc}\n") + return {} + + + def supports_language(self, lang_code: str) -> bool: ''' Check if the source language with the specified language code is supported by the translator's engine. @@ -145,10 +165,8 @@ def is_supported_language(self, lang_code: str) -> bool: ''' if not isinstance(lang_code, str): raise TypeError("Invalid type for `lang_code`") - lang_code = lang_code.strip().lower() - if not lang_code: - raise ValueError("`lang_code` cannot be empty") + lang_code = lang_code.strip().lower() return lang_code in self.supported_languages @@ -161,12 +179,11 @@ def get_supported_target_languages(self, src_lang: str) -> List: """ if not isinstance(src_lang, str): raise TypeError("Invalid type for `src_lang`") - if not src_lang: - raise ValueError("Invalid value for `src_lang`") + return self.language_map.get(src_lang, []) - def is_supported_pair(self, src_lang: str, target_lang: str) -> bool: + def supports_pair(self, src_lang: str, target_lang: str) -> bool: """ Check if the source language and target language pair is supported by the translation engine. @@ -178,45 +195,11 @@ def is_supported_pair(self, src_lang: str, target_lang: str) -> bool: return src_lang != target_lang and target_lang in self.get_supported_target_languages(src_lang) - def detect_language(self, _s: str) -> Dict: + def check_languages(self, src_lang: str, target_lang: str) -> Tuple[str, str]: """ - Detects the language of the specified text. 
- - :param _s (str): The text to detect the language of. - :return: A dictionary containing the language code and confidence score. - - bing is the preferred translation engine for this method as it works well for - this purpose. However, it is not guaranteed to work well always. - - Usage Example: - ```python - import tranzlate - - translator = tranzlate.Translator() - text = "Yoruba is a language spoken in West Africa, most prominently Southwestern Nigeria." - detected_lang = translator.detect_language(text) - print(detected_lang) - - # Output: {'language': 'en', 'score': 1.0} - ``` + Performs necessary 'compatibility' checks on the source and target language. + Raises any of `TypeError`, `ValueError` or `UnsupportedLanguageError`, if there is any issue """ - if not isinstance(_s, str): - raise TypeError("Invalid type for `_s`") - if not _s: - raise ValueError("`_s` cannot be empty") - try: - result = self.server.translate_text( - query_text=_s, - translator="bing", - is_detail_result=True - ) - return result.get('detectedLanguage', {}) if result else {} - except Exception as exc: - sys.stderr.write(f"Error detecting language: {exc}\n") - return {} - - - def _check_lang_codes(self, src_lang: str, target_lang: str) -> None: if not isinstance(src_lang, str): raise TypeError("Invalid type for `src_lang`") if not isinstance(target_lang, str): @@ -227,52 +210,56 @@ def _check_lang_codes(self, src_lang: str, target_lang: str) -> None: raise ValueError("A target language must be provided") if src_lang == target_lang: - raise TranslationError("Source language and target language cannot be the same.") + raise ValueError("Source language and target language cannot be the same.") - if src_lang != 'auto' and not self.is_supported_language(src_lang): + if src_lang != 'auto' and not self.supports_language(src_lang): raise UnsupportedLanguageError( - message=f"Unsupported source language using translation engine, '{self.engine}'", + message=f"Unsupported source language using 
translation engine, '{self.engine_name}'", code=src_lang, engine=self.engine_name, code_type="source" ) if src_lang != "auto" and target_lang not in self.get_supported_target_languages(src_lang): raise UnsupportedLanguageError( - message=f"Unsupported target language for source language, '{src_lang}', using translation engine, '{self.engine}'", + message=f"Unsupported target language for source language, '{src_lang}', using translation engine, '{self.engine_name}'", code=target_lang, engine=self.engine_name, code_type="target" ) + return src_lang, target_lang def translate( - self, - content: str | bytes | BeautifulSoup, - src_lang: str = "auto", - target_lang: str = "en", - is_markup: bool = False, - encoding: str = "utf-8", - **kwargs - ) -> str | bytes | BeautifulSoup: + self, + content: str | bytes, + src_lang: str = "auto", + target_lang: str = "en", + *, + is_markup: bool = False, + encoding: str = "utf-8", + **kwargs + ) -> str | bytes: ''' Translate content from source language to target language. - :param content (str | bytes | BeatifulSoup): Content to be translated + :param content (str | bytes): Content to be translated :param src_lang (str, optional): Source language. Defaults to "auto". It is advisable to provide a source language to get more accurate translations. :param target_lang (str, optional): Target language. Defaults to "en". :param is_markup (bool, optional): Whether `content` is markup. Defaults to False. :param encoding (str, optional): The encoding of the content (for bytes content only). Defaults to "utf-8". - :param **kwargs: Keyword arguments to be passed to required translation method. + :param **kwargs: Keyword arguments to be passed to the translation server. + :kwarg timeout: float, default None. + :kwarg proxies: dict, default None. :return: Translated content. 
Usage Example: ```python import tranzlate - translator = tranzlate.Translator() + bing = tranzlate.Translator("bing") text = "Yoruba is a language spoken in West Africa, most prominently Southwestern Nigeria." - translation = translator.translate(text, "en", "yo") + translation = bing.translate(text, "en", "yo") print(translation) # Output: "Yorùbá jẹ́ èdè tí ó ń ṣe àwọn èdè ní ìlà oòrùn Áfríkà, tí ó wà ní orílẹ̀-èdè Gúúsù Áfríkà." @@ -286,13 +273,6 @@ def translate( encoding=encoding, **kwargs ) - elif isinstance(content, BeautifulSoup): - return self.translate_soup( - soup=content, - src_lang=src_lang, - target_lang=target_lang, - **kwargs - ) translation = self.translate_text( text=content.decode(encoding) if is_bytes else content, @@ -303,14 +283,13 @@ def translate( return translation.encode(encoding) if is_bytes else translation - # @functools.cache def translate_text( - self, - text: str, - src_lang: str="auto", - target_lang: str="en", - **kwargs - ) -> str: + self, + text: str, + src_lang: str="auto", + target_lang: str="en", + **kwargs + ) -> str: ''' Translate text from `src_lang` to `target_lang`. @@ -318,60 +297,59 @@ def translate_text( :param src_lang (str, optional): Source language. Defaults to "auto". It is advisable to provide a source language to get more accurate translations. :param target_lang (str, optional): Target language. Defaults to "en". - :param **kwargs: Keyword arguments to be passed to `server.translate_text`. - :kwarg is_detail_result: boolean, default False. - :kwarg professional_field: str, support baidu(), caiyun(), alibaba() only. + :param **kwargs: Keyword arguments to be passed to the translation server. :kwarg timeout: float, default None. :kwarg proxies: dict, default None. - :kwarg sleep_seconds: float, default random.random(). - :kwarg update_session_after_seconds: float, default 1500. - :kwarg if_use_cn_host: bool, default False. - :kwarg reset_host_url: str, default None. 
- :kwarg if_ignore_empty_query: boolean, default False. - :kwarg if_ignore_limit_of_length: boolean, default False. - :kwarg limit_of_length: int, default 5000. - :kwarg if_show_time_stat: boolean, default False. - :kwarg show_time_stat_precision: int, default 4. - :kwarg lingvanex_model: str, default 'B2C'. :return: Translated text. ''' if not isinstance(text, str): raise TypeError("Invalid type for `text`") if not text: - raise ValueError("`text` cannot be empty") + return text - self._check_lang_codes(src_lang, target_lang) - kwargs_ = {'if_ignore_empty_query': True} + src_lang, target_lang = self.check_languages(src_lang, target_lang) kwargs.pop('is_detail_result', None) - kwargs_.update(kwargs) + kwds = {**kwargs, 'if_ignore_empty_query': True} - def _translate(text: str): + def translate(text: str) -> str: + """Translate text using translation engine""" return self.engine_api( query_text=text, to_language=target_lang, from_language=src_lang, - **kwargs_ + **kwds ) + async_translate = sync_to_async(translate) + + def translate_in_chunks(text: str, chunksize: int) -> str: + tasks = list(map(async_translate, chunks(text, chunksize))) + async def execute_tasks() -> List[str]: + return await asyncio.gather(*tasks) + + translated_chunks = asyncio.run(execute_tasks()) + return "".join(translated_chunks) try: - chunks = _slice_iterable(text, self.input_limit - 1) - translated_chunks = list(map(_translate, chunks)) - translated_text = "".join(translated_chunks) - return translated_text - + input_limit = self.input_limit or 1000 + if len(text) > input_limit: + return translate_in_chunks(text, input_limit) + return translate(text) except Exception as exc: - raise TranslationError(exc.__str__()) + raise TranslationError(str(exc)) from exc def translate_file( - self, - filepath: str, - src_lang: str="auto", - target_lang: str="en", - **kwargs - ) -> IO: + self, + filepath: str, + src_lang: str="auto", + target_lang: str="en", + **kwargs + ) -> IO: ''' - Translates file 
from `src_lang` to `target_lang`. + Translates file from `src_lang` to `target_lang`. + This method replaces the file content with the translation. + You may need to translate a duplicate if you do not want to modify + the original file. Supported file types include: .txt, .csv, .doc, .docx, .pdf, .md..., mostly files with text content. @@ -379,26 +357,15 @@ def translate_file( :param src_lang (str, optional): Source language. Defaults to "auto". It is advisable to provide a source language to get more accurate translations. :param target_lang (str, optional): Target language. Defaults to "en". - :param **kwargs: Keyword arguments to be passed to the `translate_text` method. - :kwarg professional_field: str, support baidu(), caiyun(), alibaba() only. + :param **kwargs: Keyword arguments to be passed to the translation server. :kwarg timeout: float, default None. :kwarg proxies: dict, default None. - :kwarg sleep_seconds: float, default random.random(). - :kwarg update_session_after_seconds: float, default 1500. - :kwarg if_use_cn_host: bool, default False. - :kwarg reset_host_url: str, default None. - :kwarg if_ignore_empty_query: boolean, default False. - :kwarg if_ignore_limit_of_length: boolean, default False. - :kwarg limit_of_length: int, default 5000. - :kwarg if_show_time_stat: boolean, default False. - :kwarg show_time_stat_precision: int, default 4. - :kwarg lingvanex_model: str, default 'B2C'. :return: Translated file. 
''' - self._check_lang_codes(src_lang, target_lang) - kwargs_ = {'if_ignore_empty_query': True} + src_lang, target_lang = self.check_languages(src_lang, target_lang) + kwds = {'if_ignore_empty_query': True} kwargs.pop('is_detail_result', None) - kwargs_.update(kwargs) + kwds.update(kwargs) try: with sfh.FileHandler(filepath, exists_ok=True, not_found_ok=False) as file_handler: @@ -406,10 +373,10 @@ def translate_file( if not content: return file_handler.file - if file_handler.filetype in ['xhtml', 'htm', 'shtml', 'html', 'xml']: - translation = self.translate_markup(content, src_lang, target_lang, **kwargs_) + if file_handler.filetype in ('xhtml', 'htm', 'shtml', 'html', 'xml'): + translation = self.translate_markup(content, src_lang, target_lang, **kwds) else: - translation = self.translate_text(content, src_lang, target_lang, **kwargs_) + translation = self.translate_text(content, src_lang, target_lang, **kwds) file_handler.write_to_file(translation, write_mode='w+') return file_handler.file @@ -419,109 +386,100 @@ def translate_file( ) from exc - def _translate_soup_tag( - self, - tag: Tag, - src_lang: str = "auto", - target_lang: str = "en", - _ct: int = 0, - **kwargs - ): + def translate_tag( + self, + tag, + src_lang: str = "auto", + target_lang: str = "en", + **kwargs + ): ''' - Translates the text of a bs4.element.Tag object 'in place'. + Translates the text of a `bs4.element.Tag` object 'in place'. - NOTE: - * This function is not meant to be called directly. Use `translate_soup` instead. - * This function is recursive. - * This function modifies the element in place. - * Translations are cached by default to avoid repeated translations which can be costly. - - :param element (bs4.element.Tag): The tag whose text content is to be translated. + :param element (`bs4.element.Tag`): The tag whose text content is to be translated. :param src_lang (str, optional): Source language. Defaults to "auto". :param target_lang (str, optional): Target language. 
Defaults to "en".
-        :param _ct (int, optional): The number of times the function has been called recursively. Defaults to 0.
-            Do not pass this argument manually.
+        :return: The translated `bs4.element.Tag`
         '''
+        from bs4.element import Tag
+
         if not isinstance(tag, Tag):
             raise TypeError("Invalid type for `tag`")
-        if not isinstance(_ct, int):
-            raise TypeError("Invalid type for `_ct`")
-
-        if tag.string and tag.string.strip():
-            initial_string = copy.copy(tag.string)
-            cached_translation = self._cache.get(tag.string, None)
-            if cached_translation:
-                tag.string.replace_with(cached_translation)
-            else:
-                try:
-                    translation = self.translate_text(
-                        text=tag.string,
-                        src_lang=src_lang,
-                        target_lang=target_lang,
-                        **kwargs
-                    )
-                    tag.string.replace_with(translation)
-
-                except Exception as exc:
-                    error_ = TranslationError(f"Error translating tag: {exc}")
-                    sys.stderr.write(f"{error_}\n")
-                    # try again
-                    _ct += 1
-                    # prevents the translation engine from blocking our IP address
-                    time.sleep(random.random(2, 4) * _ct)
-                    if _ct <= 3:
-                        return self._translate_soup_tag(tag, src_lang, target_lang, _ct, **kwargs)
-                finally:
-                    self._cache[initial_string] = translation
-        return None
+
+        if not (tag.string and tag.string.strip()):
+            return tag
+
+        translation = self.translate_text(
+            text=tag.string,
+            src_lang=src_lang,
+            target_lang=target_lang,
+            **kwargs
+        )
+        tag.string.replace_with(translation)
+        return tag
 
     def translate_soup(
-            self,
-            soup: BeautifulSoup,
-            src_lang: str = "auto",
-            target_lang: str = "en",
-            thread: bool = True,
-            **kwargs
-        ) -> BeautifulSoup:
+        self,
+        soup,
+        src_lang: str = "auto",
+        target_lang: str = "en",
+        **kwargs
+    ):
         '''
-        Translates the text of a BeautifulSoup object.
+        Translates the text of a `BeautifulSoup` object.
 
-        :param soup (BeautifulSoup): The BeautifulSoup object whose text is to be translated.
+        :param soup (`BeautifulSoup`): The `BeautifulSoup` object whose text is to be translated.
         :param src_lang (str, optional): Source language. Defaults to "auto". It is advisable to provide a source language to get more accurate translations.
         :param target_lang (str, optional): The target language for translation. Defaults to "en".
-        :param thread (bool, optional): Whether to use multi-threading to translate the text. Defaults to True.
-        :return: The translated BeautifulSoup object.
+        :return: The translated `BeautifulSoup` object.
         '''
+        try:
+            from bs4 import BeautifulSoup
+        except ImportError:
+            raise ImportError(
+                '"bs4" is required to translate soup. Run `pip install beautifulsoup4` in your terminal to install it'
+            )
+
         if not isinstance(soup, BeautifulSoup):
             raise TypeError("Invalid type for `soup`")
-        self._check_lang_codes(src_lang, target_lang)
-        translatables = soup.find_all(_translatable_tags)
-        translatables = list(filter(lambda el: bool(el.string), translatables))
-        if thread:
-            with ThreadPoolExecutor() as executor:
-                for tag_list in _slice_iterable(translatables, 50):
-                    _ = executor.map(
-                        lambda tag: self._translate_soup_tag(tag, src_lang, target_lang, **kwargs),
-                        tag_list
-                    )
-                    time.sleep(random.randint(3, 5))
-        else:
-            for tag in translatables:
-                self._translate_soup_tag(tag, src_lang, target_lang, **kwargs)
+
+        src_lang, target_lang = self.check_languages(src_lang, target_lang)
+        tags = soup.find_all(_translatable_tags)
+        tags = [tag for tag in tags if tag.string]
+
+        def safe_translate_tag(*args, **kwargs):
+            """Ignores any exception that occurs during translation"""
+            try:
+                return self.translate_tag(*args, **kwargs)
+            except BaseException:
+                pass
+
+        kwds = {**kwargs, "src_lang": src_lang, "target_lang": target_lang}
+        translate_tag = functools.partial(safe_translate_tag, **kwds)
+        async_translate_tag = sync_to_async(translate_tag)
+        # Translate tags in batches to avoid making excessive requests to translation server at once
+        for batch in itertools.batched(tags, 50):
+            tasks = list(map(async_translate_tag, batch))
+            async def execute_tasks() -> List[str]:
+                return await asyncio.gather(*tasks)
+
+            asyncio.run(execute_tasks())
+            time.sleep(random.randint(1, 3))
         return soup
 
     def translate_markup(
-            self,
-            markup: str | bytes,
-            src_lang: str = "auto",
-            target_lang: str = "en",
-            markup_parser: str = "lxml",
-            encoding: str = "utf-8",
-            **kwargs
-        ) -> str | bytes:
+        self,
+        markup: str | bytes,
+        src_lang: str = "auto",
+        target_lang: str = "en",
+        *,
+        markup_parser: str = "lxml",
+        encoding: str = "utf-8",
+        **kwargs
+    ) -> str | bytes:
         '''
         Translates markup (html, xml, etc.)
 
@@ -531,28 +489,33 @@ def translate_markup(
         :param target_lang (str, optional): Target language. Defaults to "en".
         :param markup_parser (str, optional): The (beautifulsoup) markup parser to use. Defaults to "lxml".
         :param encoding (str, optional): The encoding of the markup (for bytes markup only). Defaults to "utf-8".
-        :param **kwargs: Keyword arguments to be passed to the `translate_soup` method.
-            :kwarg thread: bool, default True.
-            :kwarg professional_field: str, support baidu(), caiyun(), alibaba() only.
+        :param kwargs: Keyword arguments to be passed to the translation server.
             :kwarg timeout: float, default None.
             :kwarg proxies: dict, default None.
-            :kwarg sleep_seconds: float, default random.random().
-            :kwarg update_session_after_seconds: float, default 1500.
-            :kwarg if_use_cn_host: bool, default False.
-            :kwarg reset_host_url: str, default None.
-            :kwarg if_ignore_empty_query: boolean, default False.
-            :kwarg if_ignore_limit_of_length: boolean, default False.
-            :kwarg limit_of_length: int, default 5000.
-            :kwarg if_show_time_stat: boolean, default False.
-            :kwarg show_time_stat_precision: int, default 4.
-            :kwarg lingvanex_model: str, default 'B2C'.
         :return: Translated markup.
         '''
+        try:
+            from bs4 import BeautifulSoup
+        except ImportError:
+            raise ImportError(
+                '"bs4" is required to translate soup. Run `pip install beautifulsoup4` in your terminal to install it'
+            )
+
         if not isinstance(markup, (str, bytes)):
             raise TypeError("Invalid type for `markup`")
-
+        if not markup:
+            return markup
+
         is_bytes = isinstance(markup, bytes)
         kwargs.pop('is_detail_result', None)
         soup = BeautifulSoup(markup, markup_parser, from_encoding=encoding if is_bytes else None)
         translated_markup = self.translate_soup(soup, src_lang, target_lang, **kwargs).prettify()
-        return translated_markup.encode('utf-8') if is_bytes else translated_markup
+        return translated_markup.encode(encoding) if is_bytes else translated_markup
+
+
+
+def chunks(text: str, size: int) -> Generator[str, Any, None]:
+    """Yields a chunk of the text on each iteration, respecting word boundaries."""
+    wrapper = textwrap.TextWrapper(width=size, break_long_words=False, replace_whitespace=False)
+    for chunk in wrapper.wrap(text):
+        yield chunk
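
To make the behaviour of the new `chunks` helper concrete, here is a minimal standalone sketch. It restates the helper with only the standard library so it runs on its own. The sample sentence and the sizes are illustrative assumptions, not values taken from this change.

```python
import textwrap
from typing import Iterator


def chunks(text: str, size: int) -> Iterator[str]:
    """Yield pieces of `text`, breaking only at whitespace (standalone restatement)."""
    wrapper = textwrap.TextWrapper(
        width=size, break_long_words=False, replace_whitespace=False
    )
    yield from wrapper.wrap(text)


# Illustrative input; any whitespace-separated text behaves the same way.
sample = "The quick brown fox jumps over the lazy dog"
pieces = list(chunks(sample, 16))
for piece in pieces:
    print(repr(piece))

# With break_long_words=False, a word longer than `size` is emitted whole,
# so a chunk can exceed `size` in that case.
oversized = list(chunks("supercalifragilistic", 5))
print(oversized)
```

Because long words are never split, callers that rely on a hard length limit (for example a per-request character cap on a translation server) may want to check chunk lengths separately.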