Repository: hci-lab/PyQuran
Branch: release
Commit: 24c70e4e7315
Files: 55
Total size: 1.1 MB
Directory structure:
gitextract_ie97inb_/
├── .gitignore
├── CONTRIBUTING.md
├── CodeConventions/
│   ├── README.md
│   └── example_google.py
├── DOCUMENTATION.md
├── LICENSE
├── QuranCorpus/
│   └── quran-uthmani.xml
├── README.md
├── __init__.py
├── core/
│   ├── __init__.py
│   └── pyquran.py
├── documentation/
│   ├── TODO
│   ├── __init__.py
│   ├── auto_gen_docs.py
│   ├── docs/
│   │   ├── Alphabetical-Systems.md
│   │   ├── CONTRIBUTING.md
│   │   ├── FAQ.md
│   │   ├── Filtering-Special-Recitation-Symbols.md
│   │   ├── Home.md
│   │   ├── PyQuran-Founders.md
│   │   ├── Wiki-Home.md
│   │   ├── analysis_tools.md
│   │   ├── arabic_tools.md
│   │   ├── authors.md
│   │   ├── code_conventions.md
│   │   ├── dictFrec.md
│   │   ├── example_google.md
│   │   ├── index.md
│   │   ├── maintainers.md
│   │   ├── methods guide.md
│   │   ├── quran_tools.md
│   │   └── quran_tools_template.md
│   ├── generate.sh
│   ├── git-adding.sh
│   ├── mkdocs.yml
│   ├── sources/
│   │   ├── analysis_tools_template.md
│   │   ├── arabic_tools_template.md
│   │   └── quran_tools_template.md
│   └── templates/
│       ├── analysis_tools_template.md
│       ├── arabic_tools_template.md
│       └── quran_tools_template.md
├── testing/
│   ├── run_test.sh
│   ├── test_pyquran.py
│   ├── test_quran.py
│   ├── test_searchHelper.py
│   └── test_shape_systems.py
└── tools/
    ├── AI.py
    ├── __init__.py
    ├── arabic.py
    ├── buckwalter.py
    ├── error.py
    ├── filtering.py
    ├── quran.py
    ├── searchHelper.py
    └── shapeHelper.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
.DS_Store
umar_test.py
.git
.project
.pydevproject
.settings
__pycache__
__pycache__/*
tools/Umar_test2.ipynb
tools/.ipynb_checkpoints/
core/.pyquran.py.swp
*.pyc
*.txt
*.ipyb
*.ipynb
================================================
FILE: CONTRIBUTING.md
================================================
Contributing to PyQuran
=======================
We use GitHub issues for reporting bugs and for feature requests.
If you want to give us a hand, you may pick one of the open issues and fix a bug, implement a feature request,
or suggest a missing feature.
## Reporting issues
When reporting a bug, open a GitHub issue with the **Bug label** and include as
much detail as possible about:
- your operating system.
- your Python version.
- self-contained code that reproduces and demonstrates the bug.
**The issue will be closed if the bug cannot be reproduced.**
## Feature Request
Whenever you think PyQuran is missing a feature, create a GitHub issue with the **Feature Request label**,
define precisely what you want, and include enough examples to cover every aspect of the new feature.
If you would like to implement it yourself, please read the [Code Contribution](#code-contribution) section.
## Code Contribution
Your code has to meet [these standards](code_conventions.md).
## Contributing Flow
First, fork the project on [GitHub](https://github.com/TahaMagdy/PyQuran/),
then create a *feature branch* and start writing your changes.
We **DO NOT** accept changes to the *master branch*.
Once you are done, push the changes to *your feature branch*, then create a *pull request*
with an expressive title and description.
## Commit Messages
**Committing properly is very important**: we expect you to make one commit per logical change.
A commit message should describe what has been changed and why, and reference the issues fixed (if
any).
**Commit Message Properties**:
1. The first line is the commit title; it should be at most 50 characters long and must be expressive.
2. Keep the second line blank.
3. Wrap all other lines in the message body at 80 columns.
4. Include `Fixes #N`, where _N_ is the issue number the commit
fixes, if any.
Commits should look like the following:
```text
explain commit in one line

Body of commit message is a few lines of text, explaining things
in more detail, possibly giving some background about the issue
being fixed, etc.

The body of the commit message **can be several paragraphs**, and
please do proper word-wrap and keep columns shorter than about
80 characters.

Fixes #101
```
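The commit message properties above can be checked mechanically. The following is a minimal sketch, not part of PyQuran; the function name `check_commit_message` and the wording of the reported problems are assumptions made for illustration.

```python
def check_commit_message(message):
    """Return a list of rule violations; an empty list means the message passes."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["missing title line"]
    problems = []
    # Rule 1: the title is at most 50 characters long.
    if len(lines[0]) > 50:
        problems.append("title longer than 50 characters")
    # Rule 2: the second line stays blank.
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line is not blank")
    # Rule 3: body lines are wrapped at 80 columns.
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > 80:
            problems.append(f"line {number} longer than 80 columns")
    return problems


good = "explain commit in one line\n\nBody of the commit message.\n\nFixes #101"
print(check_commit_message(good))
```

A check like this could run locally before pushing; it does not verify rule 4, because not every commit fixes an issue.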
## Unit Tests
We write a test module for every PyQuran module under `PyQuran/testing`.
**Naming**
If the module is called *X*, then its testing module is called *test_X*.
*test_X* must have thorough unit tests for every single function.
**Note**: you must run all testing modules before you make any pull
request. Pull requests will not be accepted if any test in the testing
modules fails, so please run them all first.
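As an illustration of the naming rule, here is a small sketch of what a testing module might look like. The function `count_words` is hypothetical and stands in for a function from a module called *X*; the test class would live in `test_X.py`.

```python
import unittest


def count_words(text):
    """Hypothetical function standing in for code from a module named X."""
    return len(text.split())


class TestCountWords(unittest.TestCase):
    """Would live in testing/test_X.py under the naming rule above."""

    def test_counts_space_separated_words(self):
        self.assertEqual(count_words("one two three"), 3)

    def test_empty_string_has_no_words(self):
        self.assertEqual(count_words(""), 0)


# Run the tests programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCountWords)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```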
================================================
FILE: CodeConventions/README.md
================================================
Code Conventions
================
> This helps everyone read and maintain the code, **even when they are maintaining someone else's code**.
> *Please stick to the rules.*
## Rules:
* A line **must not** exceed *80 characters* in length.
* Use **Spaces** not **Tabs**.
* Always refer to the `example_google.py` file.
* We disagree with `example_google.py` on variable naming ONLY,
and **we agree with it on everything else**.
## Naming:
* **Class Name**: [PascalCase](https://en.wikipedia.org/wiki/PascalCase): initial letter is **upper case**
    * *Examples*: `Class, NewClass, ...`
* **Function**: [snake_case](https://en.wikipedia.org/wiki/Snake_case): Lowercase underscore-separated names.
    * *Examples*: `foo, foo_name, ...`
* **Variables**: [lowerCamelCase](https://en.wikipedia.org/wiki/Camel_case): initial letter is **lower case** and the rest is PascalCase.
    * *Examples*: `variable, variableName, ...`
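The three naming rules can be seen side by side in a short sketch. All names here (`VerseCounter`, `count_verses`, `verseList`, `verseCount`) are hypothetical and chosen only to show the casing.

```python
class VerseCounter:                      # class name: PascalCase
    def count_verses(self, verseList):   # function: snake_case
        verseCount = len(verseList)      # variables: lowerCamelCase
        return verseCount


counter = VerseCounter()
print(counter.count_verses(["first verse", "second verse"]))
```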
## Function prototypes
* Functions should have a description followed by sections, as in the following example.
* You don't need to include every section, but include whatever makes the function as clear as possible.
* **Function prototypes are also used for proposed functions**.
```python
def function_with_types_in_docstring(param1, param2):
    """Here you write a rigorous description of the function

    Args:
        param1 (int): The first parameter.
        param2 (str): The second parameter.

    Returns:
        bool: The return value. True for success, False otherwise.

    Note:
        Do not include the `self` parameter in the ``Args`` section.
    """
    pass # in case it is just a prototype (not implemented yet)
```
================================================
FILE: CodeConventions/example_google.py
================================================
# -*- coding: utf-8 -*-
"""Example Google style docstrings.
This module demonstrates documentation as specified by the `Google Python
Style Guide`_. Docstrings may extend over multiple lines. Sections are created
with a section header and a colon followed by a block of indented text.
Example:
Examples can be given using either the ``Example`` or ``Examples``
sections. Sections support any reStructuredText formatting, including
literal blocks::
$ python example_google.py
Section breaks are created by resuming unindented text. Section breaks
are also implicitly created anytime a new section starts.
Attributes:
module_level_variable1 (int): Module level variables may be documented in
either the ``Attributes`` section of the module docstring, or in an
inline docstring immediately following the variable.
Either form is acceptable, but the two should not be mixed. Choose
one convention to document module level variables and be consistent
with it.
Todo:
* For module TODOs
* You have to also use ``sphinx.ext.todo`` extension
.. _Google Python Style Guide:
http://google.github.io/styleguide/pyguide.html
"""
module_level_variable1 = 12345
module_level_variable2 = 98765
"""int: Module level variable documented inline.
The docstring may span multiple lines. The type may optionally be specified
on the first line, separated by a colon.
"""
def function_with_types_in_docstring(param1, param2):
"""Example function with types documented in the docstring.
`PEP 484`_ type annotations are supported. If attribute, parameter, and
return types are annotated according to `PEP 484`_, they do not need to be
included in the docstring:
Args:
param1 (int): The first parameter.
param2 (str): The second parameter.
Returns:
bool: The return value. True for success, False otherwise.
.. _PEP 484:
https://www.python.org/dev/peps/pep-0484/
"""
def function_with_pep484_type_annotations(param1: int, param2: str) -> bool:
"""Example function with PEP 484 type annotations.
Args:
param1: The first parameter.
param2: The second parameter.
Returns:
The return value. True for success, False otherwise.
"""
def module_level_function(param1, param2=None, *args, **kwargs):
"""This is an example of a module level function.
Function parameters should be documented in the ``Args`` section. The name
of each parameter is required. The type and description of each parameter
is optional, but should be included if not obvious.
If \*args or \*\*kwargs are accepted,
they should be listed as ``*args`` and ``**kwargs``.
The format for a parameter is::
name (type): description
The description may span multiple lines. Following
lines should be indented. The "(type)" is optional.
Multiple paragraphs are supported in parameter
descriptions.
Args:
param1 (int): The first parameter.
param2 (:obj:`str`, optional): The second parameter. Defaults to None.
Second line of description should be indented.
*args: Variable length argument list.
**kwargs: Arbitrary keyword arguments.
Returns:
bool: True if successful, False otherwise.
The return type is optional and may be specified at the beginning of
the ``Returns`` section followed by a colon.
The ``Returns`` section may span multiple lines and paragraphs.
Following lines should be indented to match the first line.
The ``Returns`` section supports any reStructuredText formatting,
including literal blocks::
{
'param1': param1,
'param2': param2
}
Raises:
AttributeError: The ``Raises`` section is a list of all exceptions
that are relevant to the interface.
ValueError: If `param2` is equal to `param1`.
"""
if param1 == param2:
raise ValueError('param1 may not be equal to param2')
return True
def example_generator(n):
"""Generators have a ``Yields`` section instead of a ``Returns`` section.
Args:
n (int): The upper limit of the range to generate, from 0 to `n` - 1.
Yields:
int: The next number in the range of 0 to `n` - 1.
Examples:
Examples should be written in doctest format, and should illustrate how
to use the function.
>>> print([i for i in example_generator(4)])
[0, 1, 2, 3]
"""
for i in range(n):
yield i
class ExampleError(Exception):
"""Exceptions are documented in the same way as classes.
The __init__ method may be documented in either the class level
docstring, or as a docstring on the __init__ method itself.
Either form is acceptable, but the two should not be mixed. Choose one
convention to document the __init__ method and be consistent with it.
Note:
Do not include the `self` parameter in the ``Args`` section.
Args:
msg (str): Human readable string describing the exception.
code (:obj:`int`, optional): Error code.
Attributes:
msg (str): Human readable string describing the exception.
code (int): Exception error code.
"""
def __init__(self, msg, code):
self.msg = msg
self.code = code
class ExampleClass(object):
"""The summary line for a class docstring should fit on one line.
If the class has public attributes, they may be documented here
in an ``Attributes`` section and follow the same formatting as a
function's ``Args`` section. Alternatively, attributes may be documented
inline with the attribute's declaration (see __init__ method below).
Properties created with the ``@property`` decorator should be documented
in the property's getter method.
Attributes:
attr1 (str): Description of `attr1`.
attr2 (:obj:`int`, optional): Description of `attr2`.
"""
def __init__(self, param1, param2, param3):
"""Example of docstring on the __init__ method.
The __init__ method may be documented in either the class level
docstring, or as a docstring on the __init__ method itself.
Either form is acceptable, but the two should not be mixed. Choose one
convention to document the __init__ method and be consistent with it.
Note:
Do not include the `self` parameter in the ``Args`` section.
Args:
param1 (str): Description of `param1`.
param2 (:obj:`int`, optional): Description of `param2`. Multiple
lines are supported.
param3 (:obj:`list` of :obj:`str`): Description of `param3`.
"""
self.attr1 = param1
self.attr2 = param2
self.attr3 = param3 #: Doc comment *inline* with attribute
#: list of str: Doc comment *before* attribute, with type specified
self.attr4 = ['attr4']
self.attr5 = None
"""str: Docstring *after* attribute, with type specified."""
@property
def readonly_property(self):
"""str: Properties should be documented in their getter method."""
return 'readonly_property'
@property
def readwrite_property(self):
""":obj:`list` of :obj:`str`: Properties with both a getter and setter
should only be documented in their getter method.
If the setter method contains notable behavior, it should be
mentioned here.
"""
return ['readwrite_property']
@readwrite_property.setter
def readwrite_property(self, value):
value
def example_method(self, param1, param2):
"""Class methods are similar to regular functions.
Note:
Do not include the `self` parameter in the ``Args`` section.
Args:
param1: The first parameter.
param2: The second parameter.
Returns:
True if successful, False otherwise.
"""
return True
def __special__(self):
"""By default special members with docstrings are not included.
Special members are any methods or attributes that start with and
end with a double underscore. Any special member with a docstring
will be included in the output, if
``napoleon_include_special_with_doc`` is set to True.
This behavior can be enabled by changing the following setting in
Sphinx's conf.py::
napoleon_include_special_with_doc = True
"""
pass
def __special_without_docstring__(self):
pass
def _private(self):
"""By default private members are not included.
Private members are any methods or attributes that start with an
underscore and are *not* special. By default they are not included
in the output.
This behavior can be changed such that private members *are* included
by changing the following setting in Sphinx's conf.py::
napoleon_include_private_with_doc = True
"""
pass
def _private_without_docstring(self):
pass
================================================
FILE: DOCUMENTATION.md
================================================
# Documentation
* [Features](#features)
* [Imporatan information](#imporatan-information)
* [Usage](#usage)
* [Functions](#functions)
    * [Access functions](#access-functions)
        * [get_sura](#get_sura)
        * [get_verse](#get_verse)
        * [get_token](#get_token)
        * [get_sura_number](#get_sura_number)
        * [get_sura_name](#get_sura_name)
        * [get_verse_count](#get_verse_count)
    * [Manipulate functions](#manipulate-functions)
        * [separate_token_with_diacritics](#separate_token_with_diacritics)
        * [get_tashkeel_binary](#get_tashkeel_binary)
        * [unpack_alef_mad](#unpack_alef_mad)
        * [shape](#shape)
        * [check_system](#check_system)
        * [check_all_alphabet](#check_all_alphabet)
        * [buckwalter_transliteration](#buckwalter_transliteration)
        * [extract_tashkeel](#extract_tashkeel)
    * [Analysis functions](#analysis-functions)
        * [count_shape](#count_shape)
        * [count_token](#count_token)
        * [frequency_of_character](#frequency_of_character)
        * [generate_frequancy_dictionary](#generate_frequancy_dictionary)
        * [sort_dictionary_by_similarity](#sort_dictionary_by_similarity)
        * [check_sura_with_frequency](#check_sura_with_frequency)
        * [generate_latex_table](#generate_latex_table)
    * [Search functions](#search-functions)
        * [search_sequence](#search_sequence)
        * [search_string_with_tashkeel](#search_string_with_tashkeel)
        * [search_with_pattern](#search_with_pattern)
# Features
* Access the Holy Quran:
    - get a **Chapter** with/without diacritics.
    - get a **Verse** with/without diacritics.
    - get a **Token** (word).
    - get the **Chapter name** and **Chapter number**.
    - get the **number of verses** in a chapter.
* Manipulate the Holy Quran:
    - Separate text into **letters** with/without diacritics.
    - Apply your own **System** to the Quran.
    - get the **Binary representation** of the Holy Quran as 0's and 1's.
    - Extract **tashkeel** from a sentence.
    - Handle linguistic rules such as:
        - Converting Alef-mad **"آ"** to "أَأْ"
    - Convert **Unicode Arabic** text to **Buckwalter encoding** and vice versa.
    - Convert the Quran to its **Buckwalter representation** and vice versa.
* Analyze the Holy Quran:
    - get the **Frequency Matrix** of letters, depending on the applied _alphabet system_.
    - get the **Frequency dictionary** of tokens.
    - sort the **Frequency dictionary** using a similarity threshold.
* Search the Holy Quran using:
    - **Text**, with a variety of options.
    - a **diacritics pattern**.
    - a **binary representation pattern** with a threshold.
# Imporatan information
* Note: all verse, chapter, and token numbers start at **1**, not **0**.
#### AlphaSystem:
An AlphaSystem is a collection of alphabet groups that you can apply to the Quran as needed, so that several characters are treated as one character, for example:
```python
system = [['أ','آ','إ'],['ت','ب']]
```
Here **['أ','آ','إ']** is treated as one character and **['ت','ب']** as another, while each of the **remaining characters** is treated on its own. The system applies to all **PyQuran** functions: counting, searching, filtering, etc.
The default system used in the library treats every character separately. Some **pre-defined** system parts are available for building your own system; import **systems** to use them.
* pre-defined:
    * withoutDotSystem (treats all characters that have dots as one)
    * hamazatSystem (treats all characters that have a hamza as one)
```python
from pyquran import systems
system = [['ت','ب'], systems.hamazatSystem]
```
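To make the idea of a system concrete, the sketch below maps every letter to an integer id so that letters in the same group share one id. It is an illustration under stated assumptions, not the library's implementation: `ALPHABET` is a deliberately short, hypothetical alphabet and `build_shape_map` is a made-up name.

```python
# Hypothetical stand-in for the library's alphabet; only a few letters are listed.
ALPHABET = ["ا", "ب", "ت", "ث", "ج"]


def build_shape_map(system, alphabet=ALPHABET):
    """Map each letter to an id; letters grouped in `system` share one id."""
    shape = {}
    for group_id, group in enumerate(system):
        for letter in group:
            shape[letter] = group_id
    # Every letter outside the system gets its own id, in alphabet order.
    next_id = len(system)
    for letter in alphabet:
        if letter not in shape:
            shape[letter] = next_id
            next_id += 1
    return shape


mapping = build_shape_map([["ب", "ت"]])
print(mapping)
```

Any function that counts or compares characters can then work on the ids instead of the raw letters, which is what makes grouped letters behave as one character.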
# Usage
```python
import PyQuran as pq
```
# Functions
## Access functions:
#### get_sura
**get_sura(chapter_num,with_tashkeil)**
- takes **chapter_num**, the number of the sura, and returns a **list of the chapter's verses**. **with_tashkeil (optional)** is the diacritics option: if **_True_**, the chapter is returned with diacritics; if **False** (the default), without.
```python
sura = pq.get_sura(108,True)
print(sura)
>>> ['إِنَّا أَعْطَيْنَكَ الْكَوْثَرَ', 'فَصَلِّ لِرَبِّكَ وَانْحَرْ', 'إِنَّ شَانِئَكَ هُوَ الْأَبْتَرُ']
```
#### get_verse
**get_verse(chapter_num,verse_num,with_tashkeel)**
- takes **chapter_num** and **verse_num** and returns the **verse content**. **with_tashkeel (optional)** is the diacritics option: if **_True_**, the verse is returned with diacritics; if **False** (the default), without.
```python
ayahText=pq.get_verse(110,1,True)
print(ayahText)
>>> إِذَا جَاءَ نَصْرُ اللَّهِ وَالْفَتْحُ
```
#### get_token
**get_token(token_num , verse_num , chapter_num , with_tashkeel)**
- takes **token_num** (the position of the token), **verse_num**, and **chapter_num** and returns the **token**. **with_tashkeel (optional)** is the diacritics option: if **_True_**, the token is returned with diacritics; if **False** (the default), without.
```python
tokenText = pq.get_token(4,1,114,True)
print(tokenText)
>>> النَّاسِ
```
#### get_sura_number
**get_sura_number(chapter_name)**
- takes the name of a chapter and returns its number.
```python
suraNumber = pq.get_sura_number('الملك')
print(suraNumber)
>>> 67
```
#### get_sura_name
**get_sura_name(chapter_num)**
- takes the number of a chapter and returns its name.
```python
suraName = pq.get_sura_name(67)
print(suraName)
>>> الملك
```
#### get_verse_count
**get_verse_count(chapter)**
- takes a **chapter** (a list of verses) and returns the number of verses.
```python
numberOfAyat = pq.get_verse_count(pq.get_sura(110,True))
print(numberOfAyat)
>>> 3
```
## Manipulate functions:
#### separate_token_with_diacritics
**separate_token_with_diacritics(sentence)**
- takes a **sentence** and separates it into characters together with their diacritics.
```python
wordSeparated = pq.separate_token_with_dicrites('إِنَّا أَعْطَيْنَكَ الْكَوْثَرَ')
print(wordSeparated)
>>> ['إِ', 'نَّ', 'ا', ' ', 'أَ', 'عْ', 'طَ', 'يْ', 'نَ', 'كَ', ' ', 'ا', 'لْ', 'كَ', 'وْ', 'ثَ', 'رَ']
```
#### get_tashkeel_binary
**get_tashkeel_binary(verse)**
- takes the content of a verse or chapter with diacritics and returns a tuple: the first element maps the **characters with diacritics** to **0's and 1's**, where a **harakah** is represented as **1** and a **sukun** as **0**; the second element is the list of diacritics.
```python
pattern = pq.get_tashkeel_binary('إِنَّا أَعْطَيْنَكَ الْكَوْثَرَ')
print(pattern)
>>> ('1010 101011 001011', ['ِ', 'ْ', 'َ', '', '', 'َ', 'ْ', 'َ', 'ْ', 'َ', 'َ', '', '', 'ْ', 'َ', 'ْ', 'َ', 'َ'])
```
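The mapping can be sketched as follows. This is a reading of the example output above, not the library's code: it assumes that a letter with no diacritic counts as 0 and that a shadda expands to a 0 followed by the bit of its harakah. The name `tashkeel_to_binary` is hypothetical.

```python
HARAKAT = set("\u064B\u064C\u064D\u064E\u064F\u0650")  # tanween, fatha, damma, kasra
SUKUN = "\u0652"
SHADDA = "\u0651"
MARKS = HARAKAT | {SUKUN, SHADDA}


def tashkeel_to_binary(text):
    """Map each letter to 1 (harakah) or 0 (sukun or no mark); keep spaces."""
    bits = []
    i = 0
    while i < len(text):
        ch = text[i]
        i += 1
        if ch in MARKS:
            continue  # stray mark with no base letter before it
        if ch.isspace():
            bits.append(" ")
            continue
        # Collect every mark attached to this base letter.
        marks = set()
        while i < len(text) and text[i] in MARKS:
            marks.add(text[i])
            i += 1
        bit = "1" if marks & HARAKAT else "0"
        # Assumption: a doubled letter is a silent letter plus a voweled one.
        bits.append("0" + bit if SHADDA in marks else bit)
    return "".join(bits)


# First word of the example above: alef-hamza-below + kasra, noon + shadda + fatha, alef.
word = "\u0625\u0650\u0646\u0651\u064E\u0627"
print(tashkeel_to_binary(word))
```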
#### unpack_alef_mad
**unpack_alef_mad(ayahWithAlefMad)**
- takes **ayahWithAlefMad** (a sentence that contains Alef-mad) and returns the sentence after replacing each **Alef-mad** with **alef-hamza-above + fatha** followed by **alef-hamza-above + sukun**.
```python
unpackAlefMad = pq.unpack_alef_mad('آ')
print(unpackAlefMad)
>>> 'أْأَ'
```
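A minimal sketch of this replacement with plain string operations is shown below. It follows the order given in the description (fatha first, then sukun), which is an assumption about the intended result; the name `unpack_alef_mad_sketch` is hypothetical.

```python
ALEF_MAD = "\u0622"          # alef with madda above
ALEF_HAMZA_ABOVE = "\u0623"  # alef with hamza above
FATHA = "\u064E"
SUKUN = "\u0652"


def unpack_alef_mad_sketch(text):
    """Replace every alef-mad with two alef-hamza-above letters and their marks."""
    replacement = ALEF_HAMZA_ABOVE + FATHA + ALEF_HAMZA_ABOVE + SUKUN
    return text.replace(ALEF_MAD, replacement)


unpacked = unpack_alef_mad_sketch(ALEF_MAD)
print(len(unpacked))
```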
#### shape
**shape(system)**
- takes **system** (a new alphabet system), which is "**a list of lists**" where every "**inner list**" is treated as one character, and returns a dictionary that gives the same value to every letter of a group and a different value to each of the remaining letters. See [here](#imporatan-information) for more details.
```python
newSystem = [[pq.beh, pq.teh], [pq.jeem, pq.hah, pq.khah]]
updatedSystem = pq.shape(newSystem)
print(updatedSystem)
>>> {'ب': 0, 'ت': 0, 'ث': 0, 'ج': 1, 'ح': 1, 'خ': 1, 'ء': 2, 'آ': 3, 'أ': 4, 'ؤ': 5, 'إ': 6, 'ئ': 7, 'ا': 8, 'ة': 9, 'د': 10, 'ذ': 11, 'ر': 12, 'ز': 13, 'س': 14, 'ش': 15, 'ص': 16, 'ض': 17, 'ط': 18, 'ظ': 19, 'ع': 20, 'غ': 21, 'ف': 22, 'ق': 23, 'ك': 24, 'ل': 25, 'م': 26, 'ن': 27, 'ه': 28, 'و': 29, 'ى': 30, 'ي': 31, ' ': 70}
```
#### check_all_alphabet
**check_all_alphabet(system)**
- takes **system** and returns the default alphabet characters that are not included in **system**.
```python
system = [[pq.beh, pq.teh], [pq.jeem, pq.hah, pq.khah]]
rest = pq.check_all_alphabet(system)
print(rest)
>>> ['ء', 'آ', 'أ', 'ؤ', 'إ', 'ئ', 'ا', 'ة', 'ث', 'د', 'ذ', 'ر', 'ز', 'س', 'ش', 'ص', 'ض', 'ط', 'ظ', 'ع', 'غ', 'ف', 'ق', 'ك', 'ل', 'م', 'ن', 'ه', 'و', 'ى', 'ي']
```
#### check_system
**check_system(system, indx=None)**
- takes **system** and returns the main system after applying the new system; the optional **index** argument returns one specific collection from the main system.
```python
# without index
system = [[pq.beh, pq.teh], [pq.jeem, pq.hah, pq.khah]]
rest = pq.check_system(system)
print(rest)
>>> [['ب', 'ت'], ['ج', 'ح', 'خ'], ['ء'], ['آ'], ['أ'], ['ؤ'], ['إ'], ['ئ'], ['ا'], ['ة'], ['ث'], ['د'], ['ذ'], ['ر'], ['ز'], ['س'], ['ش'], ['ص'], ['ض'], ['ط'], ['ظ'], ['ع'], ['غ'], ['ف'], ['ق'], ['ك'], ['ل'], ['م'], ['ن'], ['ه'], ['و'], ['ى'], ['ي']]
# with index
system = [[pq.beh, pq.teh], [pq.jeem, pq.hah, pq.khah]]
rest = pq.check_system(system,index=1)
print(rest)
>>> ['ج', 'ح', 'خ']
```
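The behaviour shown above can be approximated in a few lines. The sketch assumes a short, hypothetical `ALPHABET` and a made-up function name `complete_system`; the library's own alphabet and ordering may differ.

```python
ALPHABET = ["ا", "ب", "ت", "ث", "ج", "ح", "خ"]  # hypothetical short alphabet


def complete_system(system, alphabet=ALPHABET, index=None):
    """Append a one-letter group for every letter the system leaves out."""
    grouped = {letter for group in system for letter in group}
    full = [list(group) for group in system]
    full += [[letter] for letter in alphabet if letter not in grouped]
    return full if index is None else full[index]


system = [["ب", "ت"], ["ج", "ح", "خ"]]
print(complete_system(system))
print(complete_system(system, index=1))
```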
#### buckwalter_transliteration
**buckwalter_transliteration(sentence, reverse)**
- takes a **sentence** and **reverse (optional)**, the translation option: if **False (default)**, **sentence** is converted from Arabic to Buckwalter, as in the example below; if **True**, from Buckwalter to Arabic.
##### **Note**: the encoding with **diacritics** is different from the encoding **without diacritics**.
```python
buckwalterEncode = pq.buckwalter_transliteration('إِنَّا أَعْطَيْنَكَ الْكَوْثَرَ')
print(buckwalterEncode)
>>> aEoTayonaka Alokawovara
```
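Buckwalter transliteration is a one-to-one character mapping, so it can be sketched with a pair of dictionaries. The table below covers only a handful of letters and marks from the standard Buckwalter scheme; the function name `transliterate` and its `to_arabic` flag are assumptions of this sketch, not the library's signature.

```python
# Partial table: a few letters and diacritics from the Buckwalter scheme.
ARABIC_TO_BUCKWALTER = {
    "\u0627": "A",  # alef
    "\u0628": "b",  # beh
    "\u062A": "t",  # teh
    "\u0643": "k",  # kaf
    "\u0644": "l",  # lam
    "\u064E": "a",  # fatha
    "\u064F": "u",  # damma
    "\u0650": "i",  # kasra
    "\u0652": "o",  # sukun
}
BUCKWALTER_TO_ARABIC = {
    latin: arabic for arabic, latin in ARABIC_TO_BUCKWALTER.items()
}


def transliterate(text, to_arabic=False):
    """Convert character by character; unknown characters pass through."""
    table = BUCKWALTER_TO_ARABIC if to_arabic else ARABIC_TO_BUCKWALTER
    return "".join(table.get(ch, ch) for ch in text)


word = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kaf, teh, beh, each with fatha
print(transliterate(word))
```

Because the mapping is one-to-one, applying the function in both directions returns the original text for every character the table covers.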
#### extract_tashkeel
**extract_tashkeel(sentence)**
- takes a **sentence** and returns the tashkeel only, without the characters.
**Coming soon =D ... Taha Magedy's note**
## Analysis functions:
#### count_shape
**count_shape(text, system=None)**
- takes **text** (chapter/verse) and **system (optional)**, which describes the characters that share a shape, for example [[bah,gem]], and returns an **n*p matrix**, where **n** is the number of verses and **p** is the number of collections in the system. If no system is passed, the default one is applied.
```python
newSystem = [[pq.beh, pq.teh, pq.theh], [pq.jeem, pq.hah, pq.khah]]
alphabetAsOneShape = pq.count_shape(pq.get_sura(110), newSystem)
print(alphabetAsOneShape)
>>> [[1 2 1 0 0 0 1 0 4 0 0 1 1 0 0 0 1 0 0 0 0 0 1 0 0 3 0 1 1 1 0 0]
[1 2 0 0 2 0 0 0 5 0 2 0 1 0 1 0 0 0 0 0 0 0 2 0 0 4 0 3 1 3 1 3]
[6 2 0 0 0 0 1 0 4 0 1 0 2 0 2 0 0 0 0 0 0 1 2 0 2 0 1 2 2 2 0 0]]
```
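The shape of the returned matrix can be illustrated with nested lists. Unlike the library output above, this sketch only produces columns for the groups that are listed explicitly in the system; the function name `count_shape_sketch` and the sample verses are made up.

```python
def count_shape_sketch(verses, system):
    """Return one row per verse and one column per group in `system`."""
    matrix = []
    for verse in verses:
        # A group's count is the sum of the counts of its letters.
        row = [sum(verse.count(letter) for letter in group) for group in system]
        matrix.append(row)
    return matrix


verses = ["بت ب", "ج"]
system = [["ب", "ت"], ["ج", "ح", "خ"]]
print(count_shape_sketch(verses, system))
```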
#### count_token
**count_token(text)**
- takes **text** (chapter/verse) and returns the number of tokens.
###### ***note***: the particle ('و') is not counted as a token on its own
```python
numberOfToken = pq.count_token(pq.get_sura(110))
print(numberOfToken)
>>> 19
```
#### frequency_of_character
**frequency_of_character(characters,verse=None,chapterNum=0,verseNum=0,with_tashkeel=False)**
- takes the **characters** to count and returns a dictionary of character occurrences in a verse, a chapter, or the whole Quran; each key is a character and each value is its number of occurrences.
- optional options:
    - **verse** (str): if passed, the count is applied to this string only.
    - **chapterNum** (int): if passed alone, the count is applied to this chapter only.
    - **verseNum** (int):
        - if passed alone, the count is applied to **verseNum** in **all chapters**.
        - if passed with **chapterNum**, the count is applied to verseNum in **chapterNum**.
    - **with_tashkeel** (bool):
        - if **True**, applied to the Quran **with** tashkeel.
        - if **False**, applied to the Quran **without** tashkeel.
- Note: if none of the **optional options** is passed, the count is applied to the whole **Quran**.
```python
frequencyOfChar = pq.frequency_of_character(['أ','ب'],'قل أعوذ برب الناس',114,1)
print(frequencyOfChar)
>>> {أ:1,ب:2}
```
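For the simplest case, where a verse string is passed, the count can be sketched with `str.count`. The function name `frequency_of_character_sketch` is hypothetical, and the chapter and verse lookups of the real function are left out.

```python
def frequency_of_character_sketch(characters, verse):
    """Count how often each requested character occurs in the verse."""
    return {character: verse.count(character) for character in characters}


verse = "قل أعوذ برب الناس"
print(frequency_of_character_sketch(["أ", "ب"], verse))
```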
#### generate_frequancy_dictionary
**generate_frequency_dictionary(suraNumber=None)**
- takes **suraNumber (optional)**, the chapter number, and returns a dictionary with each **word** as key and its **frequency** as value. If **suraNumber** is not passed, the dictionary covers the **whole Quran**.
```python
dictionaryFrequency = pq.generate_frequency_dictionary(114)
print(dictionaryFrequency)
>>> {'الناس': 4, 'من': 2, 'قل': 1, 'أعوذ': 1, 'برب': 1, 'ملك': 1, 'إله': 1, 'شر': 1, 'الوسواس': 1, 'الخناس': 1, 'الذى': 1, 'يوسوس': 1, 'فى': 1, 'صدور': 1, 'الجنة': 1, 'والناس': 1}
```
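A word-frequency dictionary of this kind can be sketched with `collections.Counter`. The sketch works on a list of verse strings passed in directly instead of loading a chapter; the function name and the sample verses are assumptions.

```python
from collections import Counter


def frequency_dictionary_sketch(verses):
    """Count whitespace-separated tokens, most frequent first."""
    counts = Counter(token for verse in verses for token in verse.split())
    return dict(counts.most_common())


verses = ["من شر الوسواس", "من الجنة"]
frequencies = frequency_dictionary_sketch(verses)
print(frequencies)
```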
#### sort_dictionary_by_similarity
**sort_dictionary_by_similarity(frequency_dictionary,threshold=0.8)**
- used to **cluster words by similarity**: every cluster of words is sorted by most common, and the clusters themselves are sorted in descending order. It takes the frequency dictionary generated by the [generate_frequency_dictionary](#generate_frequancy_dictionary) function and a **threshold (optional)** that specifies **the degree of similarity**.
```python
sortedDictionary = pq.sort_dictionary_by_similarity(dictionaryFrequency)
print(sortedDictionary)
>>> {'الناس': 4, 'الخناس': 1, 'والناس': 1, 'من': 2, 'قل': 1, 'أعوذ': 1, 'برب': 1, 'ملك': 1, 'إله': 1, 'شر': 1, 'الوسواس': 1, 'الذى': 1, 'يوسوس': 1, 'فى': 1, 'صدور': 1, 'الجنة': 1}
```
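One way to get this kind of ordering is a greedy clustering pass. The sketch below uses `difflib.SequenceMatcher` as the similarity measure, which is an assumption: the measure the library actually uses is not stated here. ASCII words are used so the ratios are easy to check by hand.

```python
from difflib import SequenceMatcher


def sort_by_similarity_sketch(frequencies, threshold=0.8):
    """Order words so that each frequent word is followed by its look-alikes."""
    remaining = sorted(frequencies, key=frequencies.get, reverse=True)
    ordered = {}
    while remaining:
        # The most frequent remaining word seeds the next cluster.
        seed = remaining.pop(0)
        ordered[seed] = frequencies[seed]
        similar = [
            word for word in remaining
            if SequenceMatcher(None, seed, word).ratio() >= threshold
        ]
        for word in similar:
            ordered[word] = frequencies[word]
            remaining.remove(word)
    return ordered


frequencies = {"cat": 3, "dog": 2, "cats": 1}
print(list(sort_by_similarity_sketch(frequencies)))
```

Raising the threshold to 1.0 turns the clustering off, so the words fall back to plain frequency order.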
#### check_sura_with_frequency
**check_sura_with_frequency(sura_num,freq_dec)**
- checks whether the frequency dictionary of a **specific chapter** is compatible with the **original chapter** in the Quran. It takes **sura_num** (chapter number) and **freq_dec** (frequency dictionary) and returns **True** if they are compatible and **False** if not.
```python
dictionaryFrequency = pq.generate_frequency_dictionary(111)
matched = pq.check_sura_with_frequency(110,dictionaryFrequency)
print(matched)
>>> False
```
#### generate_latex_table
**generate_latex_table(dictionary,filename,location=".")**
- generates the LaTeX code for a frequency table. It takes **dictionary** (frequency dictionary), **filename**, and **location** (where to save the file; the default '.' is the current directory). It returns **True** if generation completed successfully and **False** if something went wrong.
```python
latexTable = pq.generate_latex_table(dictionaryFrequency,'any_file_name')
print(latexTable)
>>> True
```
## Search functions
#### search_sequence
**search_sequence(sequancesList,verse=None,chapterNum=0,verseNum=0,mode=3)**
- takes a list of sequences and returns the matched sequences; it searches in a verse, a chapter, or the whole Quran.
- for every match it returns:
    - the matched sequence
    - the chapter number of the occurrence
    - the token number if the sequence is a word, or 0 if it is a sentence
- Note:
    - if verse != None, the search uses it.
    - if there is no verse but chapterNum and verseNum are given, the search uses that verse.
    - if there is no verse and no verseNum but chapterNum is given, the search covers that chapter.
    - if there is no verse, no chapterNum, and no verseNum, the search covers the whole Quran.
- it has several modes:
    1. search with a decorated sequence (with tashkeel) and return the matched sequence with decoration (with tashkeel).
    2. search with an undecorated sequence (without tashkeel) and return the matched sequence without decoration (without tashkeel).
    3. search with an undecorated sequence (without tashkeel) and return the matched sequence with decoration (with tashkeel).
- optional options:
    - **verse** (str): if passed, the search is applied to this string only.
    - **chapterNum** (int): if passed alone, the search is applied to this chapter only.
    - **verseNum** (int):
        - if passed alone, the search is applied to **verseNum** in **all chapters**.
        - if passed with **chapterNum**, the search is applied to verseNum in **chapterNum**.
    - **with_tashkeel** (bool):
        - if **True**, applied to the Quran **with** tashkeel.
        - if **False**, applied to the Quran **without** tashkeel.
    - **mode** (int): the mode to use; the default is 3.
- Note: if none of the **optional options** is passed, the search is applied to the whole **Quran**.
- Returns: dict(): each key is a sequence and each value is a list of matched sequences and their positions.
```python
matchedKeyword = pq.search_sequence(['قل أعوذ برب'])
print(matchedKeyword)
>>> {'قل أعوذ برب': [('قُلْ أَعُوذُ بِرَبِّ', 0, 1, 113), ('قُلْ أَعُوذُ بِرَبِّ', 0, 1, 114)]}
```
#### search_string_with_tashkeel
**search_string_with_tashkeel(sentence,tashkeel_pattern)**
- takes a **sentence** and a **tashkeel_pattern** (composed of 0's , 1's) and returns the locations that match the diacritics pattern, with the start index **inclusive** and the end index **exclusive**; it returns an empty list if nothing is found.
```python
sentence = 'صِفْ ذَاْ ثَنَاْ كَمْ جَاْدَ شَخْصٌ'
tashkeel_pattern = ar.fatha + ar.sukun
results = pq.search_string_with_tashkeel(sentence,tashkeel_pattern)
print(results)
>>> [(3, 5), (7, 9), (10, 12), (13, 15), (17, 19)]
```
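The index pairs in the example output are consistent with positions counted over the text without diacritics, spaces included. The sketch below reproduces that reading for a pattern given as a string of diacritic characters; it is an interpretation of the example, and the name `search_tashkeel_sketch` is hypothetical.

```python
MARKS = set("\u064B\u064C\u064D\u064E\u064F\u0650\u0651\u0652")


def search_tashkeel_sketch(sentence, pattern):
    """Return (start, end) spans of base characters whose marks match `pattern`."""
    marks = []  # one entry per base character, holding its diacritics
    for ch in sentence:
        if ch in MARKS and marks:
            marks[-1] += ch
        else:
            marks.append("")
    size = len(pattern)
    return [
        (start, start + size)
        for start in range(len(marks) - size + 1)
        if all(marks[start + k] == pattern[k] for k in range(size))
    ]


FATHA, SUKUN = "\u064E", "\u0652"
# First two words of the example sentence, written with explicit code points.
sentence = "\u0635\u0650\u0641\u0652 \u0630\u064E\u0627\u0652"
print(search_tashkeel_sketch(sentence, FATHA + SUKUN))
```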
#### search_with_pattern
**search_with_pattern(pattern,sentence=None,verseNum=None,chapterNum=None,threshold=1)**
- searches with a pattern of 0's and 1's and returns the words whose pattern matches, depending on the threshold. It takes the **pattern** to look for, an optional **sentence** (the text to search in), and optional **chapterNum** and **verseNum**, and returns a list of matched words and sentences.
- Cases:
    1. if sentence is passed, alone or with other arguments,
       the search covers the sentence only.
    2. if sentence is not passed and verseNum and chapterNum are passed,
       the search covers that verseNum in chapterNum only.
    3. if sentence and verseNum are not passed and only chapterNum is passed,
       the search covers that chapter only.
* Note: the running time depends on your threshold and the size of the chapter, so searching the whole Quran is not supported because it takes a very long time (more than 11 minutes).
```python
result = pq.search_with_pattern(pattern="01111",chapterNum=1,threshold=0.9)
print(result)
>>>['الرَّحِيمِ مَلِكِ', 'نَعْبُدُ وَإِيَّاكَ', 'الْمُسْتَقِيمَ صِرَطَ']
```
================================================
FILE: LICENSE
================================================
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Lesser General Public License instead.) You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.
We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.
Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and
modification follow.
GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.
1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.
You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.
b) You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.
c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.
In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.
If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.
5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.
7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.
10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.
NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
Copyright (C)
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:
Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.
, 1 April 1989
Ty Coon, President of Vice
This General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License.
================================================
FILE: QuranCorpus/quran-uthmani.xml
================================================
================================================
FILE: README.md
================================================
# PyQuran: The Python package for Quranic Analysis
PyQuran is a package that provides tools for Quranic analysis and Arabic texts.
It is still a small package that needs a lot of your effort. We believe that it
is the seed of a fundamental, general package for
computations on the Quran with Python, starting from the most basic level, which is simply
retrieving the Quran text.
*Before Islam*, Arabic letters were written without dots, a script known as
[*rasm*](https://en.wikipedia.org/wiki/Rasm). This resulted in ambiguity, because two or three
letters shared the same rasm, or form.
Muslims later removed this ambiguity by adding
dots above or below the letters that share the same rasm, so that each letter now has a unique form. Note that
the Quran was originally written using letters without dots.
To enable researchers to use the modern alphabet, the old rasm, or any other grouping of letters, we introduce *alphabetical systems*,
a dynamic way of constructing the set of letters.
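The idea of an alphabetical system can be sketched as a mapping from letters to shape indexes, where letters that share a rasm receive the same index. The snippet below is a minimal standalone illustration; `build_shape_map` is a hypothetical name, not PyQuran's API, and the five-letter alphabet is only a toy example.

```python
def build_shape_map(alphabet, groups):
    """Map each letter to an index; letters in the same group share one index.

    Hypothetical helper for illustration only.
    """
    group_of = {ch: tuple(group) for group in groups for ch in group}
    mapping = {}
    next_index = 0
    for ch in alphabet:
        if ch in mapping:
            continue  # already indexed through its group
        for member in group_of.get(ch, (ch,)):
            mapping[member] = next_index
        next_index += 1
    return mapping

# Toy alphabet: beh, teh and theh share one rasm and therefore one index.
alphabet = ["ا", "ب", "ت", "ث", "ج"]
mapping = build_shape_map(alphabet, [["ب", "ت", "ث"]])
print(mapping["ب"] == mapping["ت"] == mapping["ث"])  # True
```

Counting letters through such a mapping gives per-shape counts instead of per-letter counts, which is what makes analysis on the dotless rasm possible.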
## Quran Corpus
We use the [tanzil](http://tanzil.net/docs/download) Quran Corpus (*Uthmani Text*), which is in `UTF-8` encoding. You
can find all unique characters of the Uthmani Corpus
[here](https://hci-lab.github.io/PyQuran-Private/Filtering-Special-Recitation-Symbols/#recitation-symbols).
There are *special recitation symbols* مصطلحات الضبط in the *Uthmani Text*; they guide the reciter
to the right positions to pause and to the rules of tajweed.
We provide an interface to filter those symbols *on the fly while fetching from the corpus*;
we **NEVER** change the corpus itself.
[For the full details about filtering *special recitation symbols* مصطلحات
الضبط.](https://hci-lab.github.io/PyQuran-Private/Filtering-Special-Recitation-Symbols/#recitation-symbols)
## Current Features
- [Quran Retrieving.](https://hci-lab.github.io/PyQuran-Private/quran_tools/)
- Advanced Searching, by
[Text](https://hci-lab.github.io/PyQuran-Private/analysis_tools/#search_sequence)
and [Diacritics](https://hci-lab.github.io/PyQuran-Private/analysis_tools/#search_string_with_tashkeel) Patterns.
- [Buckwalter Transliteration](https://hci-lab.github.io/PyQuran-Private/arabic_tools/#buckwalter_transliteration), back and forth.
- Multiple [Alphabetical Systems](https://hci-lab.github.io/PyQuran-Private/arabic_tools/#alphabetical-systems).
- Words Frequency Table المعجم الترددى للألفاظ .
## PyQuran needs and Upcoming Features.
- Words Frequency Table filtered according to words meaning.
- Morphology analysis of words to their roots.
- Arabic tools for representing Arabic text for AI algorithms and neural
networks, for more serious Arabic text processing and understanding. Those
tools should take meaning, diacritics, roots and other morphological aspects
into account.
- Some PyQuran in-house tools and architecture enhancements will be posted on GitHub
Issues for contributors, to make PyQuran professional and easy to use.
## Contributing
To contribute to and maintain PyQuran, please read the [CONTRIBUTING](https://hci-lab.github.io/PyQuran-Private/CONTRIBUTING) section.
## Dependencies
- [numpy](http://www.numpy.org/)
- [pyarabic](https://github.com/linuxscout/pyarabic)
## Install
- From PyPI: `$ pip3 install pyquran`
## Citing
```
@MISC {PyQuran2018,
author = "Waleed A. Yousef and
Taha M. Madbouly and
Omar M. Ibrahime and
Ali H. El-Kassas and
Ali O. Hassan and
Abdallah R. Albohy and
Moustafa A. Mahmoud",
title = "PyQuran: The Python package for Quranic Analysis",
howpublished = "https://hci-lab.github.io/PyQuran-Private",
year = "2018"}
```
## Communication
[Author Page](https://hci-lab.github.io/PyQuran-Private/authors)
================================================
FILE: __init__.py
================================================
"""
"""
from pyquran.tools import quran
from pyquran.tools import arabic
from pyquran.core.pyquran import *
================================================
FILE: core/__init__.py
================================================
# Adding another searching path
from sys import path
import os
# The current path of the current module.
path_current_module = os.path.dirname(os.path.abspath(__file__))
tools_modules = '../tools/'
tools_path = os.path.join(path_current_module, tools_modules)
path.append(tools_path)
================================================
FILE: core/pyquran.py
================================================
"""Main PyQuran Library Module
* Date: Sat Nov 18 03:30:41 EET 2017
This module contains tools for `Quranic Analysis`
(More expressive description later)
"""
# Adding another searching path
from sys import path
import os
# The current path of the current module.
path_current_module = os.path.dirname(os.path.abspath(__file__))
tools_modules = '../tools/'
tools_path = os.path.join(path_current_module, tools_modules)
path.append(tools_path)
import quran
import sys
import error
import numpy
import operator
import re
import searchHelper
import functools
import difflib as dif
import arabic
from arabic import *
from pyarabic.araby import strip_tashkeel, strip_tatweel, separate
from itertools import chain
from collections import Counter, defaultdict
import buckwalter
import sys
import shapeHelper
from collections import OrderedDict
from xml.etree.ElementTree import ElementTree
from xml.etree.ElementTree import Element
import xml.etree.ElementTree as etree
from xml.dom import minidom
def parse_sura(n, alphabets=['ل', 'ب']):
    """Parses a sura and returns a matrix (ndarray) of letter counts.
    The number of rows equals the number of ayat,
    and the number of columns equals the length of `alphabets`.
    What it does:
        It counts the occurrences of each letter
        in `alphabets` for each aya.
        If `A` is the returned ndarray,
        then A[i,j] is the number of occurrences of the letter
        alphabets[j] in the aya i.
    Args:
        n (int): the ordinal number of the sura in the Mus'haf.
        alphabets ([str]): a list of letters to count.
    Returns:
        ndarray: with dimensions (a * m), where
            `a` is the number of ayat in the sura and
            `m` is the number of letters passed to the function through alphabets[]
    Issue:
        1. A list of Arabic letters may be flipped by your editor,
           so the first char will be the right-most one,
           unlike a list of English chars, where the first element
           is the left-most one.
        2. alphabets[] does not default to the full 29 letters.
           Just try it by filling alphabets with some letters.
    """
    # getting the nth sura
    sura = quran.get_sura(n)
    # getting the ndarray dimensions
    a = len(sura)
    m = len(alphabets)
    # building ndarray with appropriate dimensions
    # (the builtin int replaces numpy.int, which newer NumPy releases removed)
    A = numpy.zeros((a, m), dtype=int)
    # Filling ndarray with alphabets[] occurrences:
    # i is the index of the current aya, j the index of the current letter
    for i, aya in enumerate(sura):
        for j, letter in enumerate(alphabets):
            A[i, j] = aya.count(letter)
    return A
def get_frequency(sentence):
    """Computes the word frequencies of a sentence.
    Args:
        sentence (string): the sentence whose word frequencies are computed.
    Returns:
        dict: {str: int}, sorted by frequency in descending order.
    Example:
    ```python
    q.get_frequency(quran.get_verse(1,1))
    >>> {'الرحمن': 1, 'الرحيم': 1, 'الله': 1, 'بسم': 1}
    ```
    """
    if type(sentence) != str:
        raise TypeError('sentence should be string')
    # split sentence into words
    word_list = sentence.split()
    # count the unique words
    frequency = Counter(word_list)
    # sort frequencies in descending order
    sorted_freq = dict(sorted(frequency.items(),key=operator.itemgetter(1),reverse=True))
    return sorted_freq
def generate_frequency_dictionary(suraNumber=None):
    """Computes the frequency dictionary, where each key is a unique word and each value is its number of occurrences.
    Args:
        suraNumber (int): optional; if omitted, the whole Quran is used.
    Returns:
        dict: key is a word, str; value is its number of occurrences, int.
    Example:
    ```python
    q.generate_frequency_dictionary(114)
    >>> {'أعوذ': 1, 'إله': 1, 'الجنة': 1, 'الخناس': 1, 'الذى': 1, 'الناس': 4, 'الوسواس': 1, 'برب': 1, 'شر': 1, 'صدور': 1, 'فى': 1, 'قل': 1, 'ملك': 1, 'من': 2, 'والناس': 1, 'يوسوس': 1}
    ```
    """
    # validate only when a sura number is given; comparing None with an int raises TypeError
    if suraNumber is not None:
        if type(suraNumber) != int:
            raise TypeError('suraNumber should be integer')
        if suraNumber <= 0 or suraNumber > arabic.swar_num:
            raise ValueError('suraNumber should be in range [1-114]')
    frequency = {}
    # get all of the Quran if suraNumber is None
    if suraNumber is None:
        # get all of the Quran as one sentence
        Quran = ' '.join([' '.join(quran.get_sura(i)) for i in range(1,115)])
        # get the frequency of the whole Quran
        frequency = get_frequency(Quran)
    # get the frequency of suraNumber
    else:
        # get sura from QuranCorpus
        sura = quran.get_sura(sura_number=suraNumber)
        ayat = ' '.join(sura)
        # get the frequency of the sura
        frequency = get_frequency(ayat)
    return frequency
def check_sura_with_frequency(sura_num,freq_dec):
    """Checks whether the frequency dictionary of a specific sura is
    compatible with the original sura in its character (shape) count.
    Args:
        sura_num (int): sura number
        freq_dec (dict): frequency dictionary of that sura
    Returns:
        Boolean: True :- if compatible
                 False :- if not
    Example:
    ```python
    frequency_dic = q.generate_frequency_dictionary(114)
    q.check_sura_with_frequency(114, frequency_dic)
    >>> True
    ```
    """
    if type(sura_num) != int:
        raise TypeError('sura_num should be integer')
    if type(freq_dec) != dict:
        raise TypeError('freq_dec should be dictionary')
    if sura_num <= 0 or sura_num > arabic.swar_num:
        raise ValueError('sura_num should be in range [1-114]')
    # get number of chars in the frequency dictionary
    num_of_chars_in_dec = sum([len(word)*count for word,count in freq_dec.items()])
    # get number of chars in the original sura, ignoring spaces
    num_of_chars_in_sura = sum([len(aya.replace(' ','')) for aya in quran.get_sura(sura_num)])
    return num_of_chars_in_dec == num_of_chars_in_sura
def sort_dictionary_by_similarity(frequency_dictionary,threshold=0.8):
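"""Clusters words by similarity, sorts the words of every cluster
by frequency in descending order, and sorts the clusters themselves
in descending order of their total frequency.
Args:
frequency_dictionary: dict, frequency dictionary to be sorted.
threshold: float, minimum similarity ratio in [0, 1] for two words to share a cluster; defaults to 0.8.
Returns:
dict : {str: int} sorted dictionary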
Example:
```python
frequency_dic = q.generate_frequency_dictionary(113)
q.sort_dictionary_by_similarity(frequency_dic)
# this dictionary is sorted using similarity 0.8
>>> {'أعوذ': 1, 'إذا': 2, 'العقد': 1, 'الفلق': 1, 'النفثت': 1, 'برب': 1, 'حاسد': 1, 'حسد': 1, 'خلق': 1, 'شر': 4, 'غاسق': 1, 'فى': 1, 'قل': 1, 'ما': 1, 'من': 1, 'وقب': 1, 'ومن': 3}
```
"""
if type(threshold) != float:
raise TypeError('threshold should be float')
if type(frequency_dictionary) != dict:
raise TypeError('frequency_dictionary should be dictionary')
if threshold < 0 or threshold > 1:
raise ValueError('threshold should be float number in range [0-1]')
# list of dictionaries and every dictionary has similar words and we will call every dictionary as 'X'
list_of_dics = []
# this dictionary key is a position of 'X' and value the sum of frequencies of 'X'
list_of_dics_counts = dict()
#counter of X's
dic_num=0
#lock list used to lock word that added in 'X'
occurrence_list = set()
#loop on all words to cluster them
for word,count in frequency_dictionary.items():
#check if word is locked from some 'X' or not
if word not in occurrence_list:
#this use to sum all of frequencies of this 'X'
sum_of_freqs = count
#create new 'X' and add the first word
sub_dic = dict({word:count})
#add word in occurrence list to lock it
occurrence_list.add(word)
#loop in the rest word to get similar word
for sub_word,sub_count in frequency_dictionary.items():
#check if word lock or not
if sub_word not in occurrence_list:
#compute similarity probability
similarity_prob = dif.SequenceMatcher(None,word,sub_word).ratio()
# check if prob of word is bigger than threshold or not
if similarity_prob >= threshold:
#add sub_word as a new word in this 'X'
sub_dic[sub_word] = sub_count
# lock this new word
occurrence_list.add(sub_word)
# add the frequency of this new word to sum_of_freqs
sum_of_freqs +=sub_count
#append 'X' in list of dictionaries
list_of_dics.append(sub_dic)
#append position and summation of this 'X' frequencies
list_of_dics_counts[dic_num] = sum_of_freqs
# increase number of dictionaries
dic_num +=1
#sort list of dictionaries count (sort X's descending) The most frequent
list_of_dics_counts = dict(sorted(list_of_dics_counts.items(),key=operator.itemgetter(1),reverse=True))
#new frequency dictionary that will return
new_freq_dic =dict()
#loop to make them as one dictionary after sorting
for position in list_of_dics_counts.keys():
new_sub_dic = dict(sorted(list_of_dics[position].items(),key=operator.itemgetter(1),reverse=True))
for word,count in new_sub_dic.items():
new_freq_dic[word] = count
return new_freq_dic
def generate_latex_table(dictionary,filename,location="."):
"""Generates the LaTeX code of a frequency table and writes it to a `.tex` file.
Args:
dictionary (dict): frequency dictionary
filename (string): file name, without the `.tex` extension
location (string): directory to save in; the default is the current directory
Returns:
Boolean: True :- if done
False :- if something went wrong, e.g. with the folder name
Example:
```python
frequency_dic = q.generate_frequency_dictionary(114)
q.generate_latex_table(frequency_dic,'filename','../location')
# True means done: the file 'filename.tex' has been generated
>>> True
```
"""
if type(filename) != str:
raise TypeError('filename should be string')
if type(dictionary) != dict:
raise TypeError('dictionary should be dictionary')
head_code = """\\documentclass{article}
%In the preamble section include the arabtex and utf8 packages
\\usepackage{arabtex}
\\usepackage{utf8}
\\usepackage{longtable}
\\usepackage{color, colortbl}
\\usepackage{supertabular}
\\usepackage{multicol}
\\usepackage{geometry}
\\geometry{left=.1in, right=.1in, top=.1in, bottom=.1in}
\\begin{document}
\\begin{multicols}{6}
\\setcode{utf8}
\\begin{center}"""
tail_code = """\\end{center}
\\end{multicols}
\\end{document}"""
begin_table = """\\begin{tabular}{ P{2cm} P{1cm}}
\\textbf{words} & \\textbf{\\#} \\\\
\\hline
\\\\[0.01cm]"""
end_table= """\\end{tabular}"""
rows_num = 40
if location != '.':
filename = location +"/"+ filename
try:
file = open(filename+'.tex', 'w', encoding='utf8')
file.write(head_code+'\n')
n= int(len(dictionary)/rows_num)
words = [("\\<"+word+"> & "+str(frequancy)+' \\\\ \n') for word, frequancy in dictionary.items()]
start=0
end=rows_num
new_words = []
for i in range(n):
new_words = new_words+ [begin_table+'\n'] +words[start:end] +[end_table+" \n"]
start=end
end+=rows_num
remain_words = len(dictionary) - rows_num*n
if remain_words > 0:
new_words += [begin_table+" \n"]+ words[-1*remain_words:]+[end_table+" \n"]
for word in new_words:
file.write(word)
file.write(tail_code)
file.close()
return True
except:
return False
def shape(system):
"""Declares a new alphabetical system. The user passes a list of lists of letters,
where each inner list holds the letters to be counted as one shape, and gets back a dictionary
that gives the same value to the letters of each inner list and a different value to every other letter.
Args:
param1 ([[char]]): a list of lists of letters; each inner list holds the
letters that will be counted as one shape.
Returns:
dictionary: covers the whole alphabet, where each char (key) has a value;
the values are equal for letters that are counted as one shape.
"""
newSys=system
alphabetMap = OrderedDict()
indx = 0
newAlphabet = list(set(chain(*system)))
theRestOfAlphabets = list(set(alphabet) - set(newAlphabet))
for char in alphabet:
if char in theRestOfAlphabets:
alphabetMap.update({char: indx})
indx = indx + 1
elif char in newAlphabet:
#sublist that contain this char(give all chars the same indx)
#drop this sublist from the system
systemItem = shapeHelper.searcher(newSys, char)
for char in newSys[systemItem]:
alphabetMap.update({char: indx})
newSys=newSys[0:systemItem]+newSys[systemItem+1:]
newAlphabet = list(set(chain(*newSys)))
indx = indx + 1
'''
for setOfNewAlphabet in system:
for char in setOfNewAlphabet:
alphabetMap.update({char: indx})
indx = indx + 1
for char in theRestOfAlphabets:
alphabetMap.update({char: indx})
indx = indx + 1
'''
alphabetMap.update({" ": 70})
return alphabetMap
def count_rasm(text, system=None):
"""Counts the occurrences of each letter (as `system` defines) in a sura.
Args:
text: [str], a list of strings, where each string is an ayah.
system: Optional, [[char]], see [Alphabetical Systems](#alphabetical-systems);
if `system` is not passed, the normal alphabet is applied.
Returns:
(N * P) ndarray (Matrix A): N is the number of verses, P is the size of the alphabet (as defined in `system`).\n
`A[i][j]` is the number of occurrences of the letter `j` in the verse `i`.
Example:
```python
newSystem = [[q.beh, q.teh, q.theh], [q.jeem, q.hah, q.khah]]
q.count_rasm(q.quran.get_sura(110), newSystem)
>>>[[1 2 1 0 0 0 1 0 4 0 0 1 1 0 0 0 1 0 0 0 0 0 1 0 0 3 0 1 1 1 0 0]
[1 2 0 0 2 0 0 0 5 0 2 0 1 0 1 0 0 0 0 0 0 0 2 0 0 4 0 3 1 3 1 3]
[6 2 0 0 0 0 1 0 4 0 1 0 2 0 2 0 0 0 0 0 0 1 2 0 2 0 1 2 2 2 0 0]]
```
"""
#"there are a intersection between subsets"
if system is None:
alphabetMap = dict()
indx = 0
for char in alphabet:
alphabetMap.update({char: indx})
indx = indx + 1
alphabetMap.update({" ": 70})
p=len(alphabet)#+1 #the last one for space char
else:
for subSys in system:
if not isinstance(subSys, list):
raise ValueError("system must be a list of lists, not a flat list")
if shapeHelper.check_repetation(system):
raise ValueError("there is a repetition in your system")
p = len(alphabet) - len(list(set(chain(*system)))) + len(system)
alphabetMap = shape(system)
n=len(text)
A=numpy.zeros((n, p), dtype=int)  # numpy.int was removed in NumPy 1.24; the builtin int is equivalent
i=0
j=0
charCount =[]
for verse in text:
verse=shapeHelper.convert_text_to_numbers(verse, alphabetMap)
for k in range(0,p,1) :
charCount.insert(j, verse.count(k))
j+=1
A[i, :] =charCount
i+=1
charCount=[]
j=0
return A
def get_verse_count(surah):
"""
get_verse_count takes a surah and returns
the number of ayat in it.
What it does: counts the number of verses in a surah
Args:
param1 ([str]): the surah, as a list of verses
Returns:
int: the number of verses
"""
return len(surah)
def count_token(text):
"""
count_token get a text (surah or ayah) and count the
number of tokens that it has.
What it does: count the number of tokens in text
Args:
param1 (str or [str]): a string or list of strings
Returns:
int: the number of tokens
"""
count=0
if isinstance(text, list):
for ayah in text:
count=count+ayah.count(' ')+1
else:
count=text.count(' ')+1
return count
def grouping_letter_diacritics(sentance):
    """Group each letter with its diacritics.
    Args:
        sentance: str
    Returns:
        [str]: a list of _x_, where _x_ is a letter accompanied by its
               diacritics.
    Example:
    ```python
    q.grouping_letter_diacritics('إِنَّا أَعْطَيْنَكَ الْكَوْثَرَ')\n
    >>> ['إِ', 'نَّ', 'ا', ' ', 'أَ', 'عْ', 'طَ', 'يْ', 'نَ', 'كَ', ' ', 'ا', 'لْ', 'كَ', 'وْ', 'ثَ', 'رَ']
    ```
    """
    # Tatweel is decorative only, so drop it before grouping.
    sentance = strip_tatweel(sentance)
    # `x in (a or b or c)` only ever tests `a`, so every group is checked explicitly.
    # These names are assumed to be in scope, as in the original expression.
    letter_groups = (alphabet, alefat, hamzat)
    mark_groups = (tashkeel, harakat, shortharakat, tanwin)
    hroof_with_tashkeel = []
    for index, char in enumerate(sentance):
        if char == ' ' or any(char in group for group in letter_groups):
            harf_with_tashkeel = char
            k = index + 1
            # Attach every diacritic that directly follows the letter.
            while k < len(sentance) and any(sentance[k] in group for group in mark_groups):
                harf_with_tashkeel += sentance[k]
                k += 1
            hroof_with_tashkeel.append(harf_with_tashkeel)
    return hroof_with_tashkeel
def frequency_of_character(characters, verse=None, chapterNum=0, verseNum=0, with_tashkeel=False):
"""Counts the occurrences of the given characters in a specific verse, a sura, or the entire Quran.
Note:
If verse, chapterNum and verseNum are not passed, the entire Quran is targeted.
Args:
characters: [], list of the characters to count.
verse: str, the verse text to count in; default is None.
chapterNum: int, the number of the sura to count in; default is 0.
verseNum: int, verse number in the sura.
with_tashkeel: Bool, whether to search with tashkeel.
Returns:
{dic} : {str : int} a dictionary whose keys are the characters
and whose values are the count of each character.
Example:
```python
q.frequency_of_character(['أ',"ب","تُ"],verseNum=2,with_tashkeel=False)
# counts verse number **2** of every sura
>>> {'أ': 101, 'ب': 133, 'تُ': 0}
q.frequency_of_character(['أ',"ب","تُ"],chapterNum=1,verseNum=2,with_tashkeel=False)
# counts verse number **2** of chapter **1**
>>> {'أ': 0, 'ب': 1, 'تُ': 0}
q.frequency_of_character(['أ',"ب","تُ"],with_tashkeel=False)
# with no chapterNum and no verseNum, counts in **all Quran**
>>> {'أ': 8900, 'ب': 11491, 'تُ': 2149}
```
"""
if type(characters) != list:
raise TypeError('characters should be list of characters')
if type(chapterNum) != int:
raise TypeError('chapterNum should be integer')
if type(verseNum) != int:
raise TypeError('verseNum should be integer')
#dectionary that have frequency
frequency = dict()
#check if count specific verse
if verse!=None:
if type(verse) != str:
raise TypeError('verse should be string')
if not with_tashkeel:
verse = strip_tashkeel(verse)
#count frequency of chars
frequency = searchHelper.hellper_frequency_of_chars_in_verse(verse,characters)
#check if count specific chapter
elif chapterNum!=0:
if chapterNum <0 or chapterNum > arabic.swar_num:
raise ValueError('chapterNum should be integer number in range [1-114]')
#check if count specific verse in this chapter
if verseNum!=0:
#check if verseNum out of range
if(verseNum<0):
raise ValueError('verseNum should be positive integer ')
verse = quran.get_sura(chapterNum,with_tashkeel=with_tashkeel)[verseNum-1]
#count frequency of chars
frequency = searchHelper.hellper_frequency_of_chars_in_verse(verse,characters)
else:
#count on all chapter
chapter = " ".join(quran.get_sura(chapterNum,with_tashkeel=with_tashkeel))
#count frequency of chars
frequency = searchHelper.hellper_frequency_of_chars_in_verse(chapter,characters)
else:
if verseNum!=0:
if(verseNum<0):
raise ValueError('verseNum should be positive integer ')
#count for specific verse in all Quran
Quran = ""
for i in range(swar_num):
Quran = Quran +" "+quran.get_verse(i+1,verseNum,with_tashkeel=with_tashkeel)+" "
#count frequency of chars
frequency = searchHelper.hellper_frequency_of_chars_in_verse(Quran,characters)
else:
#count for all Quran
Quran = ""
for i in range(swar_num):
Quran = Quran +" "+ " ".join(quran.get_sura(i+1,with_tashkeel=with_tashkeel))+" "
#count frequency of chars
frequency = searchHelper.hellper_frequency_of_chars_in_verse(Quran,characters)
return frequency
def get_token(tokenNum,verseNum,chapterNum,with_tashkeel=False):
"""
get a token from a specific verse of a specific chapter
Args:
tokenNum (int) : position of the token in the verse (1-based)
verseNum (int): number of verse
chapterNum (int): number of chapter
with_tashkeel (bool) : whether to return the token with tashkeel
Returns:
str : the token, or an empty string if it does not exist
Example:
```python
q.get_token(tokenNum=4,verseNum=1,chapterNum=1,with_tashkeel=True)
>>> 'الرَّحِيمِ'
```
"""
if type(tokenNum) != int:
raise TypeError('tokenNum should be integer')
if type(chapterNum) != int:
raise TypeError('chapterNum should be integer')
if type(verseNum) != int:
raise TypeError('verseNum should be integer')
if chapterNum < 0 or chapterNum > arabic.swar_num:
raise ValueError('chapterNum should be integer number in range [1-114]')
if tokenNum <= 0:
raise ValueError('tokenNum should be positive integer numbers and > 0')
if(verseNum<0):
raise ValueError('verseNum should be positive integer ')
try:
tokens = quran.get_sura(chapterNum,with_tashkeel)[verseNum-1].split()
if tokenNum > len(tokens):
return ""
else:
return tokens[tokenNum-1]
except:
return ""
def search_sequence(sequancesList,verse=None,chapterNum=0,verseNum=0,mode=3):
"""Takes a list of sequences and returns the matches of each one. It searches in a verse,
a chapter, or the whole Quran.
For every match it returns:
1 - the matched sequence
2 - the token position if the sequence is a word, and 0 if it is a sentence
3 - the verse number of the occurrence
4 - the chapter number of the occurrence
Note :
- if verse != None, the search runs on it.
- if there is no verse but chapterNum and verseNum are passed,
the search runs on that verse.
- if there is no verse and no verseNum but chapterNum is passed,
the search runs on that chapter.
- if verse, chapterNum and verseNum are all missing,
the search runs on the whole Quran.
It has three modes:
- mode 1: search with a decorated sequence (with tashkeel),
and return the matched sequence decorated (with tashkeel).
- mode 2: search with an undecorated sequence (without tashkeel),
and return the matched sequence undecorated (without tashkeel).
- mode 3: search with an undecorated sequence (without tashkeel),
and return the matched sequence decorated (with tashkeel).
Args:
sequancesList: [], a list of the sequences that you want to match.
verse: str, optional text to search in; default is None.
chapterNum: int, number of the chapter to search in.
verseNum: int, number of the verse to search in.
mode: int, the search mode; default is 3.
Returns:
dict: each key is a sequence and its value is a list of matched sequences and their positions.
Example:
```python
# search in chapter = 1 only using mode 3 (default)
q.search_sequence(sequancesList=['ملك يوم الدين'],chapterNum=1)
#it will return
#{'sequance-1' : [ (matched_sequance , position , vers_num , chapter_num) , (....) ],
# 'sequance-2' : [ (matched_sequance , position , vers_num , chapter_num) , (....) ] }
# Note : position == 0 if sequance is a sentence and == word position if sequance is a word
>>> {'ملك يوم الدين': [('مَلِكِ يَوْمِ الدِّينِ', 0, 4, 1)]}
# search in all Quran using mode 3 (default)
q.search_sequence(sequancesList=['ملك يوم'])
>>> {'ملك يوم': [('مَلِكِ يَوْمِ', 0, 4, 1), ('الْمُلْكُ يَوْمَ', 0, 73, 6), ('الْمُلْكُ يَوْمَئِذٍ', 0, 56, 22), ('الْمُلْكُ يَوْمَئِذٍ', 0, 26, 25)]}
```
"""
if type(sequancesList) != list:
raise TypeError('sequancesList should be a list of strings')
if type(verse) != str and verse != None:
raise TypeError('verse should be a string')
if type(chapterNum) != int:
raise ValueError('chapterNum should be integer')
if type(verseNum) != int:
raise ValueError('verseNum should be integer')
if chapterNum < 0 or chapterNum > arabic.swar_num:
raise ValueError('chapterNum should be integer number in range [1-114]')
if(verseNum<0):
raise ValueError('verseNum should be a non-negative integer')
if mode <= 0 or mode > 3:
raise ValueError('mode should be positive integer numbers 1,2 or 3 only')
final_dict = dict()
#loop on all sequances
for sequance in sequancesList:
# check mode 1 (tashkeel to tashkeel)
if mode==1:
final_dict[sequance] = searchHelper.hellper_pre_search_sequance(
sequance=sequance,
verse=verse,
chapterNum=chapterNum,
verseNum=verseNum,
with_tashkeel=True)
# check mode 2 (without tashkeel to without tashkeel)
elif mode==2:
final_dict[sequance] = searchHelper.hellper_pre_search_sequance(
sequance=sequance,
verse=verse,
chapterNum=chapterNum,
verseNum=verseNum,
with_tashkeel=False)
# check mode 3 (without tashkeel to with tashkeel)
elif mode==3:
sequance = strip_tashkeel(sequance)
final_dict[sequance] = searchHelper.hellper_pre_search_sequance(
sequance=sequance,
verse=verse,
chapterNum=chapterNum,
verseNum=verseNum,
with_tashkeel=True,
mode3=True)
return final_dict
def search_string_with_tashkeel(string, key):
    """Search the tashkeel of `string` for the tashkeel pattern `key`.
    Args:
        string: str, sentence to search by key.
        key: str, tashkeel pattern.
    Assumption:
        Searches tashkeel that is explicitly included in string.
    Returns:
        found: list of pairs (x, y), where x and y are the start and end index of a match in the tashkeel-only form of `string`.
        not found: []
    Example:
    ```python
    string = 'صِفْ ذَاْ ثَنَاْ كَمْ جَاْدَ شَخْصٌ'
    q.search_string_with_tashkeel(string, 'َْ')
    >>> [(3, 5), (7, 9), (10, 12), (13, 15), (17, 19)]
    ```
    """
    error.is_string(string, 'You must pass a string.')
    # Keep only the tashkeel of the string.
    string_tashkeel_only = searchHelper.get_string_taskeel(string)
    # Search the tashkeel-only string; `re` treats `key` as a regular expression.
    # The unused begin/end offset computation was dropped: the returned pairs were
    # always the raw match positions.
    return [(m.start(), m.end()) for m in re.finditer(key, string_tashkeel_only)]
def buckwalter_transliteration(string, reverse=False):
"""Back and forth Arabic-Buckwalter transliteration.
See [Buckwalter](https://en.wikipedia.org/wiki/Buckwalter_transliteration)
Args:
string: to be transliterated.
reverse: Optional boolean. `False` transliterates from Arabic to
Buckwalter, `True` transliterates from Buckwalter to Arabic.
Returns:
str: transliterated string.
Example:
```python
q.buckwalter_transliteration('إِنَّا أَعْطَيْنَكَ الْكَوْثَرَ')\n
>>> aEoTayonaka Alokawovara
```
"""
for key, value in buckwalter.buck2uni.items():
if not reverse:
string = string.replace(value, key)
else:
string = string.replace(key, value)
return string
def get_tashkeel_binary(ayah):
'''
get_tashkeel_binary takes an ayah or a token (str) and converts its tashkeel to zeros and ones
What it does:
maps each char to zero if it has sukoon or no diacritics,
and to one if it has harakat, shadda, or tanwin
Args:
param1 (str): an ayah or a part of an ayah
Returns:
(str, list): the pattern of zeros and ones, with tokens separated by spaces, and the list of marks
'''
marksDictionary = {'ْ': 0, '': 0, 'ُ': 1, 'َ': 1, 'ِ': 1, 'ّ': 1, 'ٌ': 1, 'ً': 1, 'ٍ': 1}
charWithOutTashkeelOrSukun = ''
tashkeelPatternList = [] # list of zeros and ones
marksList = []
# convert the List o to string without spaces
ayahModified = ''.join(ayah.strip())
tashkeelPatternStringWithSpace = ''
# check is there a tatweel in ayah or not
if(tatweel in ayahModified):
ayahModified = strip_tatweel(ayahModified)
# check whether exist alef_mad in ayah if exist unpack the alef mad
if (alef_mad in ayahModified):
ayahModified = unpack_alef_mad(ayahModified)
# separate tashkeel from the ayah
ayahOrAyatWithoutTashkeel, marks = separate(ayahModified)
for mark in marks:
#the pyarabic returns the char of marks without tashkeel with 'ـ' so if check about this mark if not exist
#append in list harakat and zero or ones in tashkeel pattern list if yes append the marks and patterns
if (mark != 'ـ'):
marksList.append(mark)
tashkeelPatternList.append(marksDictionary[mark])
else:
marksList.append(charWithOutTashkeelOrSukun)
tashkeelPatternList.append(marksDictionary[charWithOutTashkeelOrSukun])
# convert list of Tashkeel pattern to String for each token in ayah separate with another token with spce
for posOfCharInAyah in range(0, len(ayahOrAyatWithoutTashkeel)):
if ayahOrAyatWithoutTashkeel[posOfCharInAyah] == ' ' and tashkeelPatternList[posOfCharInAyah] == 0:
tashkeelPatternStringWithSpace += ' '
else:
tashkeelPatternStringWithSpace += str(tashkeelPatternList[posOfCharInAyah])
return tashkeelPatternStringWithSpace, marksList
def factor_alef_mad(sentance):
'''Returns the `sentance` with every alef_mad factored into two alef_hamza letters, the first with fatha and the second with sukun.
Args:
sentance: str, a string.
Returns:
str: sentance having the alef_mad factored
Example:
```python
q.factor_alef_mad('آ')\n
>>> 'أَأْ'
```
'''
ayahWithUnpackAlefMad = ''
for charOfAyah in sentance:
if charOfAyah != 'آ':
ayahWithUnpackAlefMad += charOfAyah
else:
ayahWithUnpackAlefMad += 'أَ'
ayahWithUnpackAlefMad += 'أْ'
return ayahWithUnpackAlefMad
def check_system(system, index=None):
''' Returns the alphabet including treated-as-one letters. If you pass the index as the second optional argument, it returns only the letters at that index, not the whole alphabet.
Args:
system: [[char]], a list of lists of letters, where the letters to be treated as
one letter share a sub-list, see [Alphabetical Systems](#alphabetical-systems).
index: Optional integer, the index of a letter in the new system.
Returns:
list: full sorted system or a specific index.
Example:
```python
q.check_system([['alef', 'beh']])\n
>>> [['ء'],
['آ'],
['أ', 'ب'],
['ؤ'],
['إ'],
['ئ'],
['ا'],
['ة'],
['ت'],
['ث'],
['ج'],
['ح'],
['خ'],
['د'],
['ذ'],
['ر'],
['ز'],
['س'],
['ش'],
['ص'],
['ض'],
['ط'],
['ظ'],
['ع'],
['غ'],
['ف'],
['ق'],
['ك'],
['ل'],
['م'],
['ن'],
['ه'],
['و'],
['ى'],
['ي']]
```
The previous example prints each letter as one element in a new alphabet list,
as you can see the two letters alef and beh are considered one letter.
'''
if shapeHelper.check_repetation(system) == True:
raise ValueError ("there is a repetition in your system")
p = len(alphabet) - len(list(set(chain(*system)))) + len(system)
systemDict = shape(system)
fullSys = [[key for key, value in systemDict.items() if value == i] for i
in range(p)]
if index==None:
return fullSys
else:
return fullSys[index]
def search_with_pattern(pattern,sentence=None,verseNum=None,chapterNum=None,threshold=1):
'''
Searches a sentence's pattern of 0's and 1's and
returns the words whose pattern matches,
according to the similarity threshold.
Args:
pattern (str): the pattern of 0's and 1's to search for.
sentence (str): Arabic string with tashkeel in which
the function will search.
verseNum (int): number of the specific verse in which
to search.
chapterNum (int): number of the specific chapter
in which to search.
threshold (float): similarity threshold, 0 <= threshold <= 1; 1 returns
exact matches only, and lower values return
approximate matches as well.
Cases:
1- if **sentence** is passed, alone or with other args,
it searches in the sentence only.
2- if **sentence** is not passed but **verseNum** and **chapterNum** are,
it searches in that verse of that chapter only.
3- if only **chapterNum** is passed,
it searches in that chapter only.
4- if neither **sentence** nor **chapterNum** is passed, it raises an error; searching **all Quran** is not supported.
Returns:
[list] : a list of the matched words or
matched sentences, or an empty list if nothing matches.
Note : the running time depends on your threshold and on the size of the chapter,
so searching the whole Quran is not supported because
it takes more than 11 minutes.
Example:
```python
# it will search in chapter **1** only
q.search_with_pattern("011101",chapterNum=1)
>>> ['لِلَّهِ رَبِّ', 'الْعَلَمِينَ', 'أَنْعَمْتَ عَلَيْهِمْ', 'الْمَغْضُوبِ عَلَيْهِمْ']
```
'''
if type(pattern) != str or len(pattern)!= (pattern.count('0')+pattern.count('1')):
raise TypeError('pattern should be a string of 0\'s and 1\'s like \'011011010\'')
if type(sentence) != str and sentence != None:
raise TypeError('sentence should be a string')
if type(chapterNum) != int and chapterNum != None:
raise TypeError('chapterNum should be integer')
if type(verseNum) != int and verseNum != None:
raise TypeError('verseNum should be integer')
if chapterNum != None and (chapterNum < 0 or chapterNum > arabic.swar_num):
raise ValueError('chapterNum should be integer number in range [1-114]')
if(verseNum!=None and verseNum<0):
raise ValueError('verseNum should be a non-negative integer')
if threshold > 1 or threshold < 0:
raise ValueError('Threshold should be 0 <= Threshold <= 1')
pattern = pattern.replace(' ','')
if len(pattern)<=0:
raise ValueError('pattern was not passed')
#check if sentece exist
if sentence != None:
#convert sentence to 0/1
sentence_pattern,taskieel = get_tashkeel_binary(sentence)
return searchHelper.hellper_search_with_pattern(pattern=pattern,
sentence_pattern=sentence_pattern,
sentence=sentence,
ratio=threshold)
else:
#check if search in specific chapter
if chapterNum != None:
#check if search in specific verese
if verseNum != None:
sentence = quran.get_verse(chapterNum=chapterNum,
verseNum=verseNum,
with_tashkeel=True)
#search in all chapter
else:
sentence = " ".join(quran.get_sura(chapterNum,True))
#search in all Quran
else:
raise ValueError('please pass a sentence, or a chapterNum with an optional verseNum, to search.')
#convert sentence to 0/1
sentence_pattern,taskieel = get_tashkeel_binary(sentence)
sentence_pattern_without_spaces = sentence_pattern.replace(" ","")
#check if no pattern exist
if pattern not in sentence_pattern_without_spaces:
return []
else:
return searchHelper.hellper_search_with_pattern(pattern=pattern,
sentence_pattern=sentence_pattern,
sentence=sentence,
ratio=threshold)
def frequency_sura_level(suraNumber):
"""Computes the frequency dictionary for a sura
Args:
suraNumber: 1 <= Int <= 114.
Returns:
[aya_frequency_dictionary]: the key of `aya_frequency_dictionary` is a
unique word in aya and the corresponding value is its frequency.
A list of frequency dictionaries for each verse of Sura.
Note:
* frequency dictionary is a python dict, which carries word frequencies
for an aya.
* Its key is (str) word, its value is (int) word frequency
Example:
```python
q.frequency_sura_level(suraNumber=1)
>>> [{'بسم': 1, 'الله': 1, 'الرحمن': 1, 'الرحيم': 1},
{'الحمد': 1, 'لله': 1, 'رب': 1, 'العلمين': 1},
{'الرحمن': 1, 'الرحيم': 1},
{'ملك': 1, 'يوم': 1, 'الدين': 1},
{'إياك': 1, 'نعبد': 1, 'وإياك': 1, 'نستعين': 1},
{'اهدنا': 1, 'الصرط': 1, 'المستقيم': 1},
{'عليهم': 2,
'صرط': 1,
'الذين': 1,
'أنعمت': 1,
'غير': 1,
'المغضوب': 1,
'ولا': 1,
'الضالين': 1}]
```
"""
# A list of frequency dictionaries
frequency_ayat_list = []
for aya in quran.get_sura(suraNumber):
frequency_ayat_list.append(get_frequency(aya))
return frequency_ayat_list
def get_unique_words():
"""returns a set of all unique words in Quran
TODO:
need to support suras as well.
"""
# Unique words
words_set = set()
for i in range(1, 114+1):
sura = quran.get_sura(i)
for aya in sura:
wordsList = aya.split(' ')
for word in wordsList:
words_set.add(word)
return words_set
def get_words():
"""returns a list of all words in Quran
TODO:
need to support suras as well.
"""
# words
words_list = list()
for i in range(1, 114+1):
sura = quran.get_sura(i)
for aya in sura:
wordsList = aya.split(' ')
for word in wordsList:
words_list.append(word)
return words_list
def frequency_quran_level():
"""Computes the word frequencies of the Quran.
Returns:
[sura_level_frequency_dict]: one entry per sura; see the output of frequency_sura_level.
"""
# * A list of sura level frequencies.
# * Each element is a list of ayat el-sura frequencies.
quranWordsFrequences = []
for suraNumber in range(1, 114 +1):
suraWordsFrequeces = frequency_sura_level(suraNumber)
quranWordsFrequences.append(suraWordsFrequeces)
return quranWordsFrequences
def prettify(elem):
"""Return a pretty-printed XML string for the Element.
"""
rough_string = etree.tostring(elem, 'utf-8')
reparsed = minidom.parseString(rough_string)
return reparsed.toprettyxml(indent=" ")
def quran_words_frequences_data(fileName):
"""Generates the word frequencies of the entire Quran into an XML file
ToDo:
Should support JSON as well.
"""
# Computing unique words
unique_words = get_unique_words()
comma_separated_unique_words = ''
for word in unique_words:
comma_separated_unique_words += word + ','
# Removing the extra commas
comma_separated_unique_words = comma_separated_unique_words.strip(',')
# * Creating quran_words_frequences_data -- the root tag
root = Element('quran_words_frequences')
root.set('unique_words', comma_separated_unique_words)
# * Add root to the tree
tree = ElementTree(root)
for suraNumber in range(1, 114 +1):
sura = quran.get_sura(suraNumber)
# * Creating sura Tag
suraTag = Element('sura')
# * set number attribute
suraTag.set('number', str(suraNumber))
# * set sura unique words
# ??? update get_unique_words
# suraTag.set('sura_unique_words', suraUniquewords)
ayaCounter = 1
for aya in sura:
# Create aya Tag
ayaTag = Element('aya')
ayaTag.set('number', str(ayaCounter))
# * Computes the words frequency for aya
ayaWordsDict = get_frequency(aya)
words_comma_separated = ''
occurrence_comma_separated = ''
for word in ayaWordsDict:
words_comma_separated += word + ','
occurrence_comma_separated += str(ayaWordsDict[word]) + ','
# * The same order
words_comma_separated = words_comma_separated.strip(',')
occurrence_comma_separated = occurrence_comma_separated.strip(',')
# * Add words & frequencies attributes
ayaTag.set('unique_words', words_comma_separated)
ayaTag.set('unique_words_frequencies', occurrence_comma_separated)
# * Add aya tag to sura tag
suraTag.append(ayaTag)
ayaCounter += 1
# * add suraTag to the root
root.append(suraTag)
# print(prettify(root))
# Write as UTF-8 explicitly: the data is Arabic text.
with open(fileName, 'w', encoding='utf-8') as file:
    file.write(prettify(root))
================================================
FILE: documentation/TODO
================================================
1. Add letter definitions (a letter is either one char or several chars
considered as one letter)
2. Make the arabic and analysis tools importable as q.arabic and q.analysis.
3. Use a struct of tashkeel instead of writing them out.
================================================
FILE: documentation/__init__.py
================================================
================================================
FILE: documentation/auto_gen_docs.py
================================================
#!/usr/local/bin/python3
from sys import path
import os
# The current path of the current module.
path_current_module = os.path.dirname(os.path.abspath(__file__))
tools_modules = '../tools/'
tools_path = os.path.join(path_current_module, tools_modules)
core_modules = '../core/'
core_path = os.path.join(path_current_module,core_modules)
path.append(tools_path)
# Adding another searching path
path.append(core_path)
import pyquran
import quran
import arabic
import re
import inspect
import shutil
import sys
# {{autogenerated}}
'''NOTES
1 * All files MUST have {{autogenerated}} flag.
2 * foo(arg1:type) not supported yet; use foo(arg1)!
'''
PAGES = [
{
'page': 'quran_tools.md',
'functions': [
quran.get_sura,#
quran.get_verse,#
quran.get_sura_number,#
quran.get_sura_name,#
]
},
{
'page': 'arabic_tools.md',
'functions': [
pyquran.check_system, #
pyquran.factor_alef_mad, #
pyquran.grouping_letter_diacritics,#
arabic.alphabet_excluding,#
arabic.strip_tashkeel,#
pyquran.buckwalter_transliteration,#
]},
{
'page': 'analysis_tools.md',
'functions': [
pyquran.count_rasm,#
pyquran.search_string_with_tashkeel,#
pyquran.frequency_of_character,#
pyquran.frequency_sura_level,#
pyquran.frequency_quran_level,#
pyquran.sort_dictionary_by_similarity,#
pyquran.check_sura_with_frequency,
pyquran.search_sequence,
]},
]
ROOT = 'https://github.com/TahaMagdy/PyQuran'
def get_earliest_class_that_defined_member(member, cls):
ancestors = get_classes_ancestors([cls])
result = None
for ancestor in ancestors:
if member in dir(ancestor):
result = ancestor
if not result:
return cls
return result
def get_classes_ancestors(classes):
ancestors = []
for cls in classes:
ancestors += cls.__bases__
filtered_ancestors = []
for ancestor in ancestors:
if ancestor.__name__ in ['object']:
continue
filtered_ancestors.append(ancestor)
if filtered_ancestors:
return filtered_ancestors + get_classes_ancestors(filtered_ancestors)
else:
return filtered_ancestors
def get_function_signature(function, method=True):
signature = inspect.getfullargspec(function)  # getargspec was removed in Python 3.11
defaults = signature.defaults
if method:
args = signature.args[1:]
else:
args = signature.args
if defaults:
kwargs = zip(args[-len(defaults):], defaults)
args = args[:-len(defaults)]
else:
kwargs = []
st = '%s.%s(' % (function.__module__, function.__name__)
for a in args:
st += str(a) + ', '
for a, v in kwargs:
if isinstance(v, str):
v = '\'' + v + '\''
st += str(a) + '=' + str(v) + ', '
if kwargs or args:
return st[:-2] + ')'
else:
return st + ')'
def get_class_signature(cls):
try:
class_signature = get_function_signature(cls.__init__)
class_signature = class_signature.replace('__init__', cls.__name__)
except:
# in case the class inherits from object and does not
# define __init__
class_signature = cls.__module__ + '.' + cls.__name__ + '()'
return class_signature
def class_to_docs_link(cls):
module_name = cls.__module__
assert module_name[:6] == 'keras.'
module_name = module_name[6:]
link = ROOT + module_name.replace('.', '/') + '#' + cls.__name__.lower()
return link
def class_to_source_link(cls):
module_name = cls.__module__
assert module_name[:6] == 'core/pyquran.'
path = module_name.replace('.', '/')
path += '.py'
line = inspect.getsourcelines(cls)[-1]
link = 'https://github.com/TahaMagdy/PyQuran' + path + '#L' + str(line)
return '[[source]](' + link + ')'
def code_snippet(snippet):
result = '```python\n'
result += snippet + '\n'
result += '```\n'
return result
def process_class_docstring(docstring):
docstring = re.sub(r'\n # (.*)\n',
r'\n __\1__\n\n',
docstring)
docstring = re.sub(r' ([^\s\\]+):(.*)\n',
r' - __\1__:\2\n',
docstring)
docstring = docstring.replace(' ' * 5, '\t\t')
docstring = docstring.replace(' ' * 3, '\t')
docstring = docstring.replace(' ', '')
return docstring
def process_function_docstring(docstring):
docstring = re.sub(r' # (.*)\n',
r'\n __\1__\n\n',
docstring)
docstring = re.sub(r'What it does\n',
r'__What it does__\n\n',
docstring)
docstring = re.sub(r'Args:\n',
r'\n__Args__\n\n',
docstring)
docstring = re.sub(r'Note:\n',
r'\n__Note__\n\n',
docstring)
docstring = re.sub(r'Returns:\n',
r'\n__Returns__\n\n',
docstring)
docstring = re.sub(r'Assumption:\n',
r'\n__Assumption__\n\n',
docstring)
docstring = re.sub(r'Cases:\n',
r'\n__Cases__\n\n',
docstring)
docstring = re.sub(r'Issue:\n',
r'\n__Issue__\n\n',
docstring)
docstring = re.sub(r'Example:\n',
r'\n__Example__\n\n',
docstring)
docstring = re.sub(r' ([^\s\\]+):(.*)\n',
r'\n - __\1__:\2\n',
docstring)
docstring = docstring.replace(' ' * 6, '\t\t')
docstring = docstring.replace(' ' * 4, '\t')
docstring = docstring.replace(' ', '')
return docstring
print('Cleaning up existing sources directory.')
if os.path.exists('sources'):
shutil.rmtree('sources')
print('Populating sources directory with templates.')
for subdir, dirs, fnames in os.walk('templates'):
for fname in fnames:
new_subdir = subdir.replace('templates', 'sources')
if not os.path.exists(new_subdir):
os.makedirs(new_subdir)
if fname[-3:] == '.md':
fpath = os.path.join(subdir, fname)
new_fpath = fpath.replace('templates', 'sources')
shutil.copy(fpath, new_fpath)
# Take care of index page.
#readme = open('README.md').read()
#index = open('index.md').read()
#index = index.replace('{{autogenerated}}', readme[readme.find('##'):])
#f = open('index.md', 'w')
#f.write(index)
#f.close()
print('Starting autogeneration.')
'''
for page_data in PAGES:
blocks = []
classes = page_data.get('classes', [])
for module in page_data.get('all_module_classes', []):
module_classes = []
for name in dir(module):
if name[0] == '_' or name in EXCLUDE:
continue
module_member = getattr(module, name)
if inspect.isclass(module_member):
cls = module_member
if cls.__module__ == module.__name__:
if cls not in module_classes:
module_classes.append(cls)
module_classes.sort(key=lambda x: id(x))
classes += module_classes
for cls in classes:
subblocks = []
signature = get_class_signature(cls)
subblocks.append('' + class_to_source_link(cls) + '')
subblocks.append('### ' + cls.__name__ + '\n')
subblocks.append(code_snippet(signature))
docstring = cls.__doc__
if docstring:
subblocks.append(process_class_docstring(docstring))
blocks.append('\n'.join(subblocks))
'''
for page_data in PAGES:
blocks = []
functions = page_data.get('functions', [])
for module in page_data.get('all_module_functions', []):
module_functions = []
for name in dir(module):
if name[0] == '_' or name in EXCLUDE:
continue
module_member = getattr(module, name)
if inspect.isfunction(module_member):
function = module_member
if module.__name__ in function.__module__:
if function not in module_functions:
module_functions.append(function)
module_functions.sort(key=lambda x: id(x))
functions += module_functions
for function in functions:
subblocks = []
# TEST
print(function)
signature = get_function_signature(function, method=False)
signature = signature.replace(function.__module__ + '.', '')
subblocks.append('### ' + function.__name__ + '\n')
subblocks.append(code_snippet(signature))
docstring = function.__doc__
if docstring:
subblocks.append(process_function_docstring(docstring))
blocks.append('\n\n'.join(subblocks))
if not blocks:
raise RuntimeError('Found no content for page ' +
page_data['page'])
mkdown = '\n----\n\n'.join(blocks)
# save module page.
# Either insert content into existing page,
# or create page otherwise
page_name = page_data['page']
path = os.path.join('docs/', page_name)
# '''
if os.path.exists(path):
template = open(path).read()
assert '{{autogenerated}}' in template, ('Template found for ' + path +
' but missing {{autogenerated}} tag.')
mkdown = template.replace('{{autogenerated}}', mkdown)
print('...inserting autogenerated content into template:', path)
else:
print('...creating new page with autogenerated content:', path)
# '''
print('...creating new page with autogenerated content:', path)
subdir = os.path.dirname(path)
'''
if not os.path.exists(subdir):
os.makedirs(subdir)
'''
open(path, 'w').write(mkdown)
================================================
FILE: documentation/docs/Alphabetical-Systems.md
================================================
What do we mean by Alphabetical Systems?!
================================================
FILE: documentation/docs/CONTRIBUTING.md
================================================
Contributing to PyQuran
=======================
We use GitHub issues for reporting bugs and for feature requests.
If you want to give us a hand, you may pick one of the open issues and fix a bug, implement a feature request,
or suggest a missing feature.
## Reporting issues
When reporting a bug, open a GitHub issue with the **Bug label** and include as
much detail as possible about:
- your operating system.
- your python version.
- a self-contained code sample that reproduces and demonstrates the bug.
**The issue will be closed if the bug cannot be reproduced.**
## Feature Request
Whenever you think PyQuran is missing a feature, create a GitHub issue with the **Feature Request label**,
define precisely what you want, and include enough examples to cover all aspects of the new feature.
If you would like to implement it yourself, please read the [Code Contribution](#code-contribution) section.
## Code Contribution
Your code must meet [these standards](code_conventions.md).
## Contributing Flow
First, fork the project on [GitHub](https://github.com/TahaMagdy/PyQuran/),
then create a *feature branch* and start writing your changes.
We **DO NOT** accept changes to the *master branch*.
Once you are done, push the changes to *your feature branch*, then create a *pull request*
with a descriptive title and description.
## Commit Messages
**It is important to commit properly**: we expect one commit per logical change.
A commit message should describe what has been changed, why, and reference the issues fixed (if
any).
**Commit Message Properties**:
1. The first line is the commit title; it should be at most 50 characters long and must be descriptive.
2. Keep the second line blank.
3. Wrap all other lines in the message body at 80 columns.
4. Include `Fixes #N`, where _N_ is the issue number the commit
fixes, if any.
Commits should look like the following:
```text
explain commit in one line
Body of commit message is a few lines of text, explaining things
in more detail, possibly giving some background about the issue
being fixed, etc.
The body of the commit message **can be several paragraphs**, and
please do proper word-wrap and keep columns shorter than about
80 characters.
Fixes #101
```
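The properties above can be checked mechanically. The following is a minimal sketch, not part of PyQuran; the function name `check_commit_message` is hypothetical, and it only covers the title length, the blank second line, and the 80-column wrap.

```python
def check_commit_message(message):
    """Return a list of problems found in a commit message (empty if none)."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ['missing title line']
    problems = []
    # Property 1: the title is at most 50 characters long.
    if len(lines[0]) > 50:
        problems.append('title is longer than 50 characters')
    # Property 2: the second line is blank.
    if len(lines) > 1 and lines[1].strip():
        problems.append('second line is not blank')
    # Property 3: body lines are wrapped at 80 columns.
    for lineNumber, line in enumerate(lines[2:], start=3):
        if len(line) > 80:
            problems.append('line %d is longer than 80 columns' % lineNumber)
    return problems
```

A message shaped like the example above yields an empty list; each violated property adds one entry.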
## Unit Tests
We write a test module for every PyQuran module under `PyQuran/testing`.
**Naming**
If the module is called *X*, then its testing module is called *test_X*.
*test_X* must have thorough unit tests for every single function.
**Note**: you must run all testing modules before you make any pull
request. Pull requests will not be accepted if any test fails in the testing
modules. So, please run them all first.
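As an illustration of the naming rule, here is a minimal sketch of what a testing module could look like. The function `drop_spaces` is a hypothetical stand-in for a function of module *X*; a real test module would import the function from the package instead of defining it.

```python
import unittest


def drop_spaces(text):
    """Hypothetical stand-in for a function defined in module X."""
    return text.replace(' ', '')


class TestDropSpaces(unittest.TestCase):
    """Unit tests for drop_spaces; one TestCase per tested function."""

    def test_removes_all_spaces(self):
        self.assertEqual(drop_spaces('a b  c'), 'abc')

    def test_empty_string(self):
        self.assertEqual(drop_spaces(''), '')

    def test_string_without_spaces_is_unchanged(self):
        self.assertEqual(drop_spaces('abc'), 'abc')
```

Saved as `test_X.py` under the testing directory, such a module can be run with `python -m unittest test_X`.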
================================================
FILE: documentation/docs/FAQ.md
================================================
Hello!
================================================
FILE: documentation/docs/Filtering-Special-Recitation-Symbols.md
================================================
# Quran Corpus
We use the *Uthmani Text* of Quran from [tanzil](http://tanzil.net/docs/download).
Its MD5 hash is ```MD5 (quran-uthmani.xml) = 6aae945d556a1b28cfe682c0ea5ab518```.
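A downloaded copy of the corpus can be compared against the hash quoted above. This is a minimal sketch using the standard `hashlib` module; the helper name `md5_of_file` is hypothetical and not part of PyQuran.

```python
import hashlib

# The MD5 value quoted in this document for quran-uthmani.xml.
EXPECTED_MD5 = '6aae945d556a1b28cfe682c0ea5ab518'


def md5_of_file(path, chunkSize=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, 'rb') as corpusFile:
        # Read in chunks so that large files are not loaded at once.
        for chunk in iter(lambda: corpusFile.read(chunkSize), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `md5_of_file('QuranCorpus/quran-uthmani.xml') == EXPECTED_MD5` should hold for an unmodified copy, assuming the path is relative to the repository root.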
# Recitation Symbols
The Quran is written in the Arabic alphabet, but *Quran scholars* have added
marks that help reciters and readers with pronunciation and give them guidance, such as the kind of
some letters and pause marks.
The following are the unique characters in the corpus.
((Table Unicode | Symbol | Kind {letter/mark}))
# Filtering Recitation Symbols
While fetching from the corpus, we run the following method to remove all
the recitation marks, because **they are NOT letters**.
The only thing we replace is the alef wasl (ٱ), for which we substitute a plain alef (ا), because alef wasl and alef are the same
letter in Arabic; the alef wasl only carries a mark above it to indicate that it is not pronounced
as a glottal stop when reading continues from the preceding word, [Read more about Alef Wasl](https://en.wikipedia.org/wiki/Hamza#Hamzat_wa%E1%B9%A3l).
This filtering is done at run time. We **do not** change the corpus at all.
**[source](https://github.com/hci-lab/PyQuran-Private/blob/master/tools/filtering.py#L107:#L134)**
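The idea can be sketched as follows. This is not the code in `tools/filtering.py`; the function names are hypothetical, and the sketch assumes that the recitation marks are the Quranic annotation signs in the Unicode range U+06D6 to U+06ED, which may differ from the set the library actually removes.

```python
ALEF = '\u0627'       # plain alef
ALEF_WASL = '\u0671'  # alef wasl


def is_recitation_mark(char):
    """Assumption: recitation marks are the code points U+06D6..U+06ED."""
    return '\u06d6' <= char <= '\u06ed'


def filter_recitation_symbols(text):
    """Return `text` with alef wasl replaced by alef and the marks dropped."""
    kept = []
    for char in text:
        if char == ALEF_WASL:
            kept.append(ALEF)
        elif not is_recitation_mark(char):
            kept.append(char)
    return ''.join(kept)
```

The function returns a new string, so the text read from the corpus file is left untouched.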
> Also feel free to report any bugs or linguistic errors, you are most welcome, just
> open an [issue](https://github.com/hci-lab/PyQuran/issues).
================================================
FILE: documentation/docs/Home.md
================================================
* [FAQ](https://github.com/TahaMagdy/PyQuran/wiki/FAQ) — answers to frequently asked questions
## Documentation
This section is for *PyQuran* users.
## Development
This section is for *PyQuran* maintainers.
- ## Project Structure
*PyQuran* is organized as follows:
- **core**: contains main functions/modules.
- **tools**: contains helper functions/modules.
- **testing**: contains unit tests for each module.
- **QuranCorpus**: contains Quran corpus and corpus hashes.
```
.
│ README.md
│ setup.py
| __init__.py
| ...
|
└───core
│ │ pyquran.py
│ │ ...
|
└───tools
| │ filtering.py
| | ...
│
└───testing
| │ test_filtering.py
| | ...
│
└───QuranCorpus
│ quran-uthmani.xml
| ...
```
================================================
FILE: documentation/docs/PyQuran-Founders.md
================================================
# Graduation Project
# Contacts
Waleed A. Yousef, Ph.D. [wyousef at fci dot Helwan dot edu dot eg]()
Taha Magdy: tahamagdy@fci.helwan.edu.eg
Umar Mohammed: umar.ibrahime@fci.helwan.edu.eg
================================================
FILE: documentation/docs/Wiki-Home.md
================================================
### Package Structure
*PyQuran* is organized as follows:
- **core**: contains main functions/modules.
- **tools**: contains helper functions/modules.
- **testing**: contains unit tests for each module.
- **QuranCorpus**: contains Quran corpus and corpus hashes.
```
.
│ README.md
│ setup.py
| __init__.py
| ...
|
└───core
│ │ pyquran.py
│ │ ...
|
└───tools
| │ filtering.py
| | ...
│
└───testing
| │ test_filtering.py
| | ...
│
└───QuranCorpus
│ quran-uthmani.xml
| ...
```
================================================
FILE: documentation/docs/analysis_tools.md
================================================
### count_rasm
```python
count_rasm(text, system=None)
```
Counts the occurrences of each letter (as `system` defines) in a sura.
__Args__
- __text__: [str], a list of strings, where each string is an ayah.
- __system__: Optional, [[char]], see [Alphabetical Systems](#alphabetical-systems),
if `system` is not passed, the normal alphabet is applied.
__Returns__
(N * P) ndarray (Matrix A): N is the number of verses, P is the size of the alphabet (as defined in `system`).
`A[i][j]` is the number of the letter `j` in the verse `i`.
__Example__
```python
newSystem = [[q.beh, q.teh, q.theh], [q.jeem, q.hah, q.khah]]
q.count_rasm(q.quran.get_sura(110), newSystem)
>>>[[1 2 1 0 0 0 1 0 4 0 0 1 1 0 0 0 1 0 0 0 0 0 1 0 0 3 0 1 1 1 0 0]
[1 2 0 0 2 0 0 0 5 0 2 0 1 0 1 0 0 0 0 0 0 0 2 0 0 4 0 3 1 3 1 3]
[6 2 0 0 0 0 1 0 4 0 1 0 2 0 2 0 0 0 0 0 0 1 2 0 2 0 1 2 2 2 0 0]]
```
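The shape of the returned matrix can be illustrated with a simplified sketch. This is not the library's implementation: the function name `count_groups` is hypothetical, it returns plain nested lists instead of an ndarray, and it only has one column per group passed in `system`, whereas `count_rasm` covers the whole alphabet.

```python
def count_groups(verses, system):
    """Return a matrix A where A[i][j] counts letters of group j in verse i."""
    matrix = []
    for verse in verses:
        # One row per verse, one column per group of letters.
        row = [sum(verse.count(letter) for letter in group)
               for group in system]
        matrix.append(row)
    return matrix


# beh, teh, theh form one group; jeem, hah, khah form another.
exampleSystem = [['\u0628', '\u062a', '\u062b'],
                 ['\u062c', '\u062d', '\u062e']]
exampleVerses = ['\u0628\u062a\u0628', '\u062c']
exampleMatrix = count_groups(exampleVerses, exampleSystem)
```

Here `exampleMatrix` has two rows (one per verse) and two columns (one per group).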
----
### search_string_with_tashkeel
```python
search_string_with_tashkeel(string, key)
```
__Args__
- __string__: str, sentence to search by key.
- __key__: str, tashkeel pattern.
__Assumption__
Searches for tashkeel that is explicitly included in the string.
__Returns__
- __find__: list of pairs (x, y), where x and y are the start and end indices of a match.
- __not-found__: []
__Example__
```python
string = 'صِفْ ذَاْ ثَنَاْ كَمْ جَاْدَ شَخْصٌ'
q.search_string_with_tashkeel(string, 'َْ')
>>> [(3, 5), (7, 9), (10, 12), (13, 15), (17, 19)]
```
----
### frequency_of_character
```python
frequency_of_character(characters, verse=None, chapterNum=0, verseNum=0, with_tashkeel=False)
```
Counts the given characters in a specific verse, a sura, or even the entire Quran.
__Note__
If verse and chapterNum are not passed, the entire Quran is targeted.
__Args__
- __verse__: str, the verse to count in; default is None.
- __chapterNum__: int, the number of the sura to count in; default is 0.
- __verseNum__: int, verse number in sura.
- __characters__: [], list of characters that you want to count.
- __with_tashkeel__: Bool, whether to search with tashkeel.
__Returns__
{dic} : {str : int}, a dictionary whose keys are the characters
and whose values are the counts of each character.
__Example__
```python
q.frequency_of_character(['أ',"ب","تُ"],verseNum=2,with_tashkeel=False)
# counts in verse number **2** of all suras
>>> {'أ': 101, 'ب': 133, 'تُ': 0}
q.frequency_of_character(['أ',"ب","تُ"],chapterNum=1,verseNum=2,with_tashkeel=False)
# counts in verse number **2** of chapter **1**
>>> {'أ': 0, 'ب': 1, 'تُ': 0}
q.frequency_of_character(['أ',"ب","تُ"],chapterNum=1,verseNum=2,with_tashkeel=False)
#that will count in **all Quran**
>>> {'أ': 8900, 'ب': 11491, 'تُ': 2149}
```
----
### frequency_sura_level
```python
frequency_sura_level(suraNumber)
```
Computes the frequency dictionary for a sura.
__Args__
- __suraNumber__: 1 <= Int <= 114.
__Returns__
- __[aya_frequency_dictionary]__: a list of frequency dictionaries, one for each verse of the sura.
The key of `aya_frequency_dictionary` is a
unique word in the aya and the corresponding value is its frequency.
__Note__
* A frequency dictionary is a Python dict that carries the word frequencies
for an aya.
* Its key is a (str) word; its value is the (int) word frequency.
__Example__
```python
q.frequency_sura_level(suraNumber=1)
>>> [{'بسم': 1, 'الله': 1, 'الرحمن': 1, 'الرحيم': 1},
{'الحمد': 1, 'لله': 1, 'رب': 1, 'العلمين': 1},
{'الرحمن': 1, 'الرحيم': 1},
{'ملك': 1, 'يوم': 1, 'الدين': 1},
{'إياك': 1, 'نعبد': 1, 'وإياك': 1, 'نستعين': 1},
{'اهدنا': 1, 'الصرط': 1, 'المستقيم': 1},
{'عليهم': 2,
'صرط': 1,
'الذين': 1,
'أنعمت': 1,
'غير': 1,
'المغضوب': 1,
'ولا': 1,
'الضالين': 1}]
```
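The structure of this return value can be reproduced with a short sketch. It is not the library's implementation: the function name `word_frequencies_per_verse` is hypothetical, and it assumes that the verses are already available as strings without tashkeel and that words are separated by whitespace.

```python
from collections import Counter


def word_frequencies_per_verse(verses):
    """Return one {word: frequency} dict per verse, in verse order."""
    return [dict(Counter(verse.split())) for verse in verses]


exampleFrequencies = word_frequencies_per_verse(['a b a', 'c'])
```

Each dictionary in the list corresponds to one verse, so the list has as many entries as the sura has verses.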
----
### frequency_quran_level
```python
frequency_quran_level()
```
Computes the word frequencies of the Quran.
__Returns__
- __[sura_level_frequency_dict]__: see the output of frequency_sura_level.
----
### sort_dictionary_by_similarity
```python
sort_dictionary_by_similarity(frequency_dictionary, threshold=0.8)
```
Clusters words by similarity, sorts each cluster by most common word,
and at the same time sorts the clusters in descending order.
__Args__
- __frequency_dictionary__: dict, frequency dictionary to be sorted.
- __threshold__: Optional float, similarity threshold; default is 0.8.
__Returns__
dict : {str: int}, the sorted dictionary.
__Example__
```python
frequency_dic = q.generate_frequency_dictionary(114)
q.sort_dictionary_by_similarity(frequency_dic)
# this dictionary is sorted using similarity 0.8
>>> {'أعوذ': 1, 'إذا': 2, 'العقد': 1, 'الفلق': 1, 'النفثت': 1, 'برب': 1, 'حاسد': 1, 'حسد': 1, 'خلق': 1, 'شر': 4, 'غاسق': 1, 'فى': 1, 'قل': 1, 'ما': 1, 'من': 1, 'وقب': 1, 'ومن': 3}
```
----
### check_sura_with_frequency
```python
check_sura_with_frequency(sura_num, freq_dec)
```
Checks whether the frequency dictionary of a specific sura is
compatible with the original sura in shapes count.
__Args__
- __sura_num__: int, sura number.
- __freq_dec__: dict, the frequency dictionary to check.
__Returns__
- __Boolean__: True if compatible,
False if not.
__Example__
```python
frequency_dic = q.generate_frequency_dictionary(114)
q.check_sura_with_frequency(114, frequency_dic)
>>> True
```
----
### search_sequence
```python
search_sequence(sequancesList, verse=None, chapterNum=0, verseNum=0, mode=3)
```
Takes a list of sequences and returns the matched sequences. It searches in a verse, a
chapter, or the entire Quran.
For every match it returns:
1 - the matched sequence
2 - the chapter number of the occurrence
3 - the token number if the sequence is a word, and 0 if it is a sentence
Note :
- if `verse` is not None, it is used for the search.
- if there is no `verse` but `chapterNum` and `verseNum` are given,
that verse is used for the search.
- if there is no `verse` and no `verseNum` but `chapterNum` is given,
the search runs over the chapter.
- if there is no `verse`, no `chapterNum`, and no `verseNum`,
the search runs over the entire Quran.
It has several modes:
- search with a decorated sequence (with tashkeel),
and return the matched sequence with decorations (with tashkeel).
- search with an undecorated sequence (without tashkeel),
and return the matched sequence without decorations (without tashkeel).
- search with an undecorated sequence (without tashkeel),
and return the matched sequence with decorations (with tashkeel).
__Args__
- __chapterNum__: int, number of the chapter to search in.
- __verseNum__: int, number of the verse to search in.
- __sequancesList__: [], a list of sequences that you want to match.
- __mode__: int, the mode to use; the default mode is 3.
__Returns__
- __dict__: the keys are the sequences and each value is a list of matched sequences and their positions.
__Example__
```python
# search in chapter = 1 only using mode 3 (default)
q.search_sequence(sequancesList=['ملك يوم الدين'],chapterNum=1)
#it will return
#{'sequance-1' : [ (matched_sequance , position , vers_num , chapter_num) , (....) ],
# 'sequance-2' : [ (matched_sequance , position , vers_num , chapter_num) , (....) ] }
# Note : position == 0 if sequance is a sentence and == word position if sequance is a word
>>> {'ملك يوم الدين': [('مَلِكِ يَوْمِ الدِّينِ', 0, 4, 1)]}
# search in all Quran using mode 3 (default)
q.search_sequence(sequancesList=['ملك يوم'])
>>> {'ملك يوم': [('مَلِكِ يَوْمِ', 0, 4, 1), ('الْمُلْكُ يَوْمَ', 0, 73, 6), ('الْمُلْكُ يَوْمَئِذٍ', 0, 56, 22), ('الْمُلْكُ يَوْمَئِذٍ', 0, 26, 25)]}
```
================================================
FILE: documentation/docs/arabic_tools.md
================================================
## Alphabets
We use [PyArabic](https://pypi.python.org/pypi/PyArabic/0.6.2) constants, which
represent letters, instead of writing Arabic in the code.
```python
hamza = u'\u0621'
alef_mad = u'\u0622'
alef_hamza_above = u'\u0623'
waw_hamza = u'\u0624'
alef_hamza_below = u'\u0625'
yeh_hamza = u'\u0626'
alef = u'\u0627'
beh = u'\u0628'
teh_marbuta = u'\u0629'
teh = u'\u062a'
theh = u'\u062b'
jeem = u'\u062c'
hah = u'\u062d'
khah = u'\u062e'
dal = u'\u062f'
thal = u'\u0630'
reh = u'\u0631'
zain = u'\u0632'
seen = u'\u0633'
sheen = u'\u0634'
sad = u'\u0635'
dad = u'\u0636'
tah = u'\u0637'
zah = u'\u0638'
ain = u'\u0639'
ghain = u'\u063a'
feh = u'\u0641'
qaf = u'\u0642'
kaf = u'\u0643'
lam = u'\u0644'
meem = u'\u0645'
noon = u'\u0646'
heh = u'\u0647'
waw = u'\u0648'
alef_maksura = u'\u0649'
yeh = u'\u064a'
madda_above = u'\u0653'
hamza_above = u'\u0654'
hamza_below = u'\u0655'
alef_wasl = u'\u0671'
```
## Alphabetical Systems (Definitions)
[**Rasm**](https://en.wikipedia.org/wiki/Rasm): is any set of letters which are
written in the same form; namely, they are indistinguishable in writing but
are distinguished by the context. For example, the letters ت ث ن ى
can all be written with only one rasm ىـ, without dots.
**Alphabetical System**: is a set of rasm, dynamically constructed by
specifying the letters that you want to treat as one rasm. The
default Arabic alphabet is a special case of the **Alphabetical System** where
each letter is its own rasm.
**Predefined systems** are stored in the `systems` object.
1. **Default**: each letter is treated as a unique rasm.
2. **Without Dots**: by removing the dots some letters will be
indistinguishable; those letters are treated as one rasm.
The following example shows the (Without Dots) system as a list of lists,
where each sublist contains the letters which share the same rasm.
3. **Hamazat**: considers any letter accompanied by a hamza ء as one rasm.
**NOTE**: You may go further and construct your own system by specifying which
letters you want to treat as one rasm; then you can do some statistical
analysis such as count, variance, average, ...
Example:
```python
q.systems.withoutDots
Out:
[['ب', 'ت', 'ث', 'ن'], # Rasm 1
['ح', 'خ', 'ج'], # Rasm 2
['د', 'ذ'], # Rasm 3
['ر', 'ز'], # Rasm 4
['س', 'ش'], # Rasm 5
['ص', 'ض'], # Rasm 6
['ط', 'ظ'], # Rasm 7
['ع', 'غ'], # Rasm 8
['ف', 'ق']] # Rasm 9
```
### Constructing a user-defined system:
```python
system = [[alef_hamza_above, alef],
[beh, teh]]
```
The previous piece of code means "Treat *alef_hamza_above* and *alef*
as the same letter, and treat *beh* and *teh* as one letter as well".
The rest of the letters can be dynamically constructed using `check_system()`.
Then a system can be applied to some text analysis functions like counting,
filtering, etc.
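How a partial system could be completed into a full one can be sketched as follows. This is not the implementation of `check_system()`: the function name `complete_system` is hypothetical, and the alphabet is passed in explicitly (a four-letter alphabet is used here to keep the example short).

```python
def complete_system(partialSystem, alphabet):
    """Return a full system: user groups plus one singleton per other letter."""
    groupOf = {letter: group
               for group in partialSystem
               for letter in group}
    seen = set()
    fullSystem = []
    for letter in alphabet:
        if letter in seen:
            continue  # already emitted as part of an earlier group
        group = list(groupOf.get(letter, [letter]))
        fullSystem.append(group)
        seen.update(group)
    return fullSystem


# alef, beh, teh, theh; beh and teh are to be treated as one letter.
shortAlphabet = ['\u0627', '\u0628', '\u062a', '\u062b']
fullSystem = complete_system([['\u0628', '\u062a']], shortAlphabet)
```

Each group appears once, at the position of its first letter in the alphabet, and every remaining letter forms a group of its own.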
### check_system
```python
check_system(system, index=None)
```
Returns the alphabet including treated-as-one letters. If you pass the index as the second, optional argument, it returns only the letter at that index, not the whole alphabet.
__Args__
- __system__: [[char]], a list of letters, where the letters to be treated as
one letter are in one sub-list, see [Alphabetical Systems](#alphabetical-systems).
- __index__: Optional integer, the index of a letter in the new system.
__Returns__
- __list__: full sorted system or a specific index.
__Example__
```python
q.check_system([['alef', 'beh']])
>>> [['ء'],
['آ'],
['أ', 'ب'],
['ؤ'],
['إ'],
['ئ'],
['ا'],
['ة'],
['ت'],
['ث'],
['ج'],
['ح'],
['خ'],
['د'],
['ذ'],
['ر'],
['ز'],
['س'],
['ش'],
['ص'],
['ض'],
['ط'],
['ظ'],
['ع'],
['غ'],
['ف'],
['ق'],
['ك'],
['ل'],
['م'],
['ن'],
['ه'],
['و'],
['ى'],
['ي']]
```
The previous example prints each letter as one element in a new alphabet list,
as you can see the two letters alef and beh are considered one letter.
----
### factor_alef_mad
```python
factor_alef_mad(sentance)
```
It returns the `sentance` having alef_mad factored into alef_hamza and alef_wasel.
__Args__
- __sentance__: str, a string or list.
__Returns__
- __str__: sentance having the alef_mad factored
__Example__
```python
q.factor_alef_mad('آ')
>>> 'أْأَ'
```
----
### grouping_letter_diacritics
```python
grouping_letter_diacritics(sentance)
```
Grouping each letter with its diacritics.
__Args__
- __sentance__: str
__Returns__
- __[str]__: a list of _x_, where _x_ is the letter accompanied with its
diacritics.
__Example__
```python
q.grouping_letter_diacritics('إِنَّا أَعْطَيْنَكَ الْكَوْثَرَ')
>>> ['إِ', 'نَّ', 'ا', ' ', 'أَ', 'عْ', 'طَ', 'يْ', 'نَ', 'كَ', ' ', 'ا', 'لْ', 'كَ', 'وْ', 'ثَ', 'رَ']
```
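The grouping can be approximated with the standard library. This is a minimal sketch, not the library's implementation; the function name is hypothetical, and it assumes that a diacritic is any character with a non-zero Unicode combining class.

```python
import unicodedata


def group_letters_with_diacritics(text):
    """Return a list where each item is a character plus its diacritics."""
    groups = []
    for char in text:
        if groups and unicodedata.combining(char):
            # A combining mark belongs to the preceding base character.
            groups[-1] += char
        else:
            groups.append(char)
    return groups


# alef-hamza-below + kasra, noon + shadda + fatha, alef
exampleGroups = group_letters_with_diacritics(
    '\u0625\u0650\u0646\u0651\u064e\u0627')
```

`exampleGroups` contains three items, one per base letter, matching the first three items of the example output above.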
----
### alphabet_excluding
```python
alphabet_excluding(excludedLetters)
```
Returns the alphabet excluding `excludedLetters`.
__Args__
- __excludedLetters__: list[Char], letters to be excluded from the alphabet.
__Returns__
- __list__: alphabet excluding `excludedLetters`.
__Example__
```python
q.alphabet_excluding([q.alef, q.beh, q.qaf, q.teh, q.dal, q.yeh, q.alef_mad])
>>>
['ء',
'ٔ',
'أ',
'ؤ',
'إ',
'ئ',
'ة',
'ث',
'ج',
'ح',
'خ',
'ذ',
'ر',
'ز',
'س',
'ش',
'ص',
'ض',
'ط',
'ظ',
'ع',
'غ',
'ف',
'ك',
'ل',
'م',
'ن',
'ه',
'و',
'ى']
```
----
### strip_tashkeel
```python
strip_tashkeel(string)
```
Removes the tashkeel (diacritics) from the given string.
__Args__
- __string__: str, to drop tashkeel from.
__Example__
```python
x = q.quran.get_verse(12, 2, with_tashkeel=True)
x
>>> 'إِنَّا أَنزَلْنَهُ قُرْءَنًا عَرَبِيًّا لَّعَلَّكُمْ تَعْقِلُونَ'
q.strip_tashkeel(x)
>>> 'إنا أنزلنه قرءنا عربيا لعلكم تعقلون'
```
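Dropping tashkeel amounts to removing a fixed set of code points. This is a minimal sketch, not the library's implementation; the function name is hypothetical, and it assumes that tashkeel means the marks in the Unicode range U+064B to U+0652 (tanween, fatha, damma, kasra, shadda, sukun).

```python
def strip_diacritics(text):
    """Return `text` without the code points U+064B..U+0652."""
    return ''.join(char for char in text
                   if not ('\u064b' <= char <= '\u0652'))


# alef-hamza-below + kasra, noon + shadda + fatha, alef
strippedWord = strip_diacritics('\u0625\u0650\u0646\u0651\u064e\u0627')
```

`strippedWord` keeps only the three base letters, as in the first word of the example above.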
----
### buckwalter_transliteration
```python
buckwalter_transliteration(string, reverse=False)
```
Back and forth Arabic-Buckwalter transliteration.
See [Buckwalter](https://en.wikipedia.org/wiki/Buckwalter_transliteration)
__Args__
- __string__: to be transliterated.
- __reverse__: Optional boolean. `False` transliterates from Arabic to
Buckwalter, `True` transliterates from Buckwalter to Arabic.
__Returns__
- __str__: transliterated string.
__Example__
```python
q.buckwalter_transliteration('إِنَّا أَعْطَيْنَكَ الْكَوْثَرَ')
>>> aEoTayonaka Alokawovara
```
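The transliteration is a character-by-character table lookup. This is a minimal sketch with only a small part of the Buckwalter table, not the library's implementation; the names `ARABIC_TO_BUCKWALTER` and `transliterate` are hypothetical, and characters missing from the table are passed through unchanged.

```python
# Partial Buckwalter table; the complete scheme covers the whole alphabet.
ARABIC_TO_BUCKWALTER = {
    '\u0627': 'A',  # alef
    '\u0628': 'b',  # beh
    '\u062a': 't',  # teh
    '\u062b': 'v',  # theh
    '\u0631': 'r',  # reh
    '\u0643': 'k',  # kaf
    '\u0644': 'l',  # lam
    '\u0648': 'w',  # waw
    '\u064e': 'a',  # fatha
    '\u064f': 'u',  # damma
    '\u0650': 'i',  # kasra
    '\u0652': 'o',  # sukun
}
BUCKWALTER_TO_ARABIC = {latin: arabic
                        for arabic, latin in ARABIC_TO_BUCKWALTER.items()}


def transliterate(text, reverse=False):
    """Map each character through the table; unknown characters are kept."""
    table = BUCKWALTER_TO_ARABIC if reverse else ARABIC_TO_BUCKWALTER
    return ''.join(table.get(char, char) for char in text)


# The last word of the example above, written with escapes.
kawthar = '\u0627\u0644\u0652\u0643\u064e\u0648\u0652\u062b\u064e\u0631\u064e'
latinForm = transliterate(kawthar)
```

Because the table is one-to-one, calling `transliterate(latinForm, reverse=True)` gives back the original Arabic string.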
================================================
FILE: documentation/docs/authors.md
================================================
Authors
=======
- [Dr. Waleed A. Yousef](https://github.com/DrWaleedAYousef), Ph.D., [Human
Computer Interaction Laboratory (HCI Lab.)](http://www.hciegypt.com/main/),
wyousef@fci.helwan.edu.eg.
- [Taha M. Madbouly](https://github.com/TahaMagdy), B.Sc., tahamagdy@fci.helwan.edu.eg.
- [Omar M. Ibrahime](https://github.com/moroclash), B.Sc., umar.ibrahime@fci.helwan.edu.eg
- [Ali H. El-Kassas](https://github.com/Ali-Abdelmonim), B.Sc., alihassan2@fci.helwan.edu.eg
- [Ali O. Hassan](https://github.com/AliOsamaHassan), B.Sc., ali.osama@fci.helwan.edu.eg
- [Abdallah R. Albohy](https://github.com/abdo96), B.Sc. abdoengineer2015@gmail.com
================================================
FILE: documentation/docs/code_conventions.md
================================================
Code Conventions
================
This helps everyone to read and maintain the code, even when they maintain
someone else's code.
**Please stick to the rules.**
## Rules
* A line **must not** exceed *80 characters* in length.
* Use **Spaces** not **Tabs**.
* Always refer to the `example_google.py` file.
* We disagree with `example_google.py` on variable naming ONLY,
and **we agree with it on everything else**.
## Naming
* **Class Name**: [PascalCase](https://en.wikipedia.org/wiki/PascalCase): initial letter is **upper case**
* *Examples*: `Class, NewClass, ...`
* **Function**: [snake_case](https://en.wikipedia.org/wiki/Snake_case): Lowercase underscore-separated names.
* *Examples*: `foo, foo_name, ...`
* **Variables**: [lowerCamelCase](https://en.wikipedia.org/wiki/Camel_case): initial letter is **lower case** and the rest is PascalCase.
* *Examples*: `variable, variableName, ...`
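The three naming rules can be seen together in one short sketch. The class and function below are hypothetical and exist only to illustrate the conventions.

```python
class VerseCounter:
    """Class name in PascalCase."""

    def count_words(self, verseText):
        """Function name in snake_case; variables in lowerCamelCase."""
        wordList = verseText.split()
        return len(wordList)


wordCount = VerseCounter().count_words('one two three')
```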
## Function prototype
* Functions should have a description followed by sections as in the following example.
* You don't need to include all sections, but include what makes the function as clear as possible.
* **Function prototypes are also used for proposed functions**.
```python
def function_with_types_in_docstring(param1, param2):
"""Here you write a rigorous description of the function
Args:
param1 (int): The first parameter.
param2 (str): The second parameter.
Returns:
bool: The return value. True for success, False otherwise.
Note:
Do not include the `self` parameter in the ``Args`` section.
"""
# Empty Line
pass # in case it is just a prototype (not implemented yet)
```
# Google Standards Example
```python
# -*- coding: utf-8 -*-
"""Example Google style docstrings.
This module demonstrates documentation as specified by the `Google Python
Style Guide`_. Docstrings may extend over multiple lines. Sections are created
with a section header and a colon followed by a block of indented text.
Example:
Examples can be given using either the ``Example`` or ``Examples``
sections. Sections support any reStructuredText formatting, including
literal blocks::
$ python example_google.py
Section breaks are created by resuming unindented text. Section breaks
are also implicitly created anytime a new section starts.
Attributes:
module_level_variable1 (int): Module level variables may be documented in
either the ``Attributes`` section of the module docstring, or in an
inline docstring immediately following the variable.
Either form is acceptable, but the two should not be mixed. Choose
one convention to document module level variables and be consistent
with it.
Todo:
* For module TODOs
* You have to also use ``sphinx.ext.todo`` extension
.. _Google Python Style Guide:
http://google.github.io/styleguide/pyguide.html
"""
module_level_variable1 = 12345
module_level_variable2 = 98765
"""int: Module level variable documented inline.
The docstring may span multiple lines. The type may optionally be specified
on the first line, separated by a colon.
"""
def function_with_types_in_docstring(param1, param2):
"""Example function with types documented in the docstring.
`PEP 484`_ type annotations are supported. If attribute, parameter, and
return types are annotated according to `PEP 484`_, they do not need to be
included in the docstring:
Args:
param1 (int): The first parameter.
param2 (str): The second parameter.
Returns:
bool: The return value. True for success, False otherwise.
.. _PEP 484:
https://www.python.org/dev/peps/pep-0484/
"""
def function_with_pep484_type_annotations(param1: int, param2: str) -> bool:
"""Example function with PEP 484 type annotations.
Args:
param1: The first parameter.
param2: The second parameter.
Returns:
The return value. True for success, False otherwise.
"""
def module_level_function(param1, param2=None, *args, **kwargs):
"""This is an example of a module level function.
Function parameters should be documented in the ``Args`` section. The name
of each parameter is required. The type and description of each parameter
is optional, but should be included if not obvious.
If \*args or \*\*kwargs are accepted,
they should be listed as ``*args`` and ``**kwargs``.
The format for a parameter is::
name (type): description
The description may span multiple lines. Following
lines should be indented. The "(type)" is optional.
Multiple paragraphs are supported in parameter
descriptions.
Args:
param1 (int): The first parameter.
param2 (:obj:`str`, optional): The second parameter. Defaults to None.
Second line of description should be indented.
*args: Variable length argument list.
**kwargs: Arbitrary keyword arguments.
Returns:
bool: True if successful, False otherwise.
The return type is optional and may be specified at the beginning of
the ``Returns`` section followed by a colon.
The ``Returns`` section may span multiple lines and paragraphs.
Following lines should be indented to match the first line.
The ``Returns`` section supports any reStructuredText formatting,
including literal blocks::
{
'param1': param1,
'param2': param2
}
Raises:
AttributeError: The ``Raises`` section is a list of all exceptions
that are relevant to the interface.
ValueError: If `param2` is equal to `param1`.
"""
if param1 == param2:
raise ValueError('param1 may not be equal to param2')
return True
def example_generator(n):
"""Generators have a ``Yields`` section instead of a ``Returns`` section.
Args:
n (int): The upper limit of the range to generate, from 0 to `n` - 1.
Yields:
int: The next number in the range of 0 to `n` - 1.
Examples:
Examples should be written in doctest format, and should illustrate how
to use the function.
>>> print([i for i in example_generator(4)])
[0, 1, 2, 3]
"""
for i in range(n):
yield i
class ExampleError(Exception):
"""Exceptions are documented in the same way as classes.
The __init__ method may be documented in either the class level
docstring, or as a docstring on the __init__ method itself.
Either form is acceptable, but the two should not be mixed. Choose one
convention to document the __init__ method and be consistent with it.
Note:
Do not include the `self` parameter in the ``Args`` section.
Args:
msg (str): Human readable string describing the exception.
code (:obj:`int`, optional): Error code.
Attributes:
msg (str): Human readable string describing the exception.
code (int): Exception error code.
"""
def __init__(self, msg, code):
self.msg = msg
self.code = code
class ExampleClass(object):
"""The summary line for a class docstring should fit on one line.
If the class has public attributes, they may be documented here
in an ``Attributes`` section and follow the same formatting as a
function's ``Args`` section. Alternatively, attributes may be documented
inline with the attribute's declaration (see __init__ method below).
Properties created with the ``@property`` decorator should be documented
in the property's getter method.
Attributes:
attr1 (str): Description of `attr1`.
attr2 (:obj:`int`, optional): Description of `attr2`.
"""
def __init__(self, param1, param2, param3):
"""Example of docstring on the __init__ method.
The __init__ method may be documented in either the class level
docstring, or as a docstring on the __init__ method itself.
Either form is acceptable, but the two should not be mixed. Choose one
convention to document the __init__ method and be consistent with it.
Note:
Do not include the `self` parameter in the ``Args`` section.
Args:
param1 (str): Description of `param1`.
param2 (:obj:`int`, optional): Description of `param2`. Multiple
lines are supported.
param3 (:obj:`list` of :obj:`str`): Description of `param3`.
"""
self.attr1 = param1
self.attr2 = param2
self.attr3 = param3 #: Doc comment *inline* with attribute
#: list of str: Doc comment *before* attribute, with type specified
self.attr4 = ['attr4']
self.attr5 = None
"""str: Docstring *after* attribute, with type specified."""
@property
def readonly_property(self):
"""str: Properties should be documented in their getter method."""
return 'readonly_property'
@property
def readwrite_property(self):
""":obj:`list` of :obj:`str`: Properties with both a getter and setter
should only be documented in their getter method.
If the setter method contains notable behavior, it should be
mentioned here.
"""
return ['readwrite_property']
@readwrite_property.setter
def readwrite_property(self, value):
value
def example_method(self, param1, param2):
"""Class methods are similar to regular functions.
Note:
Do not include the `self` parameter in the ``Args`` section.
Args:
param1: The first parameter.
param2: The second parameter.
Returns:
True if successful, False otherwise.
"""
return True
def __special__(self):
"""By default special members with docstrings are not included.
Special members are any methods or attributes that start with and
end with a double underscore. Any special member with a docstring
will be included in the output, if
``napoleon_include_special_with_doc`` is set to True.
This behavior can be enabled by changing the following setting in
Sphinx's conf.py::
napoleon_include_special_with_doc = True
"""
pass
def __special_without_docstring__(self):
pass
def _private(self):
"""By default private members are not included.
Private members are any methods or attributes that start with an
underscore and are *not* special. By default they are not included
in the output.
This behavior can be changed such that private members *are* included
by changing the following setting in Sphinx's conf.py::
napoleon_include_private_with_doc = True
"""
pass
def _private_without_docstring(self):
pass
```
================================================
FILE: documentation/docs/dictFrec.md
================================================
Coming soon.
================================================
FILE: documentation/docs/example_google.md
================================================
```python
# -*- coding: utf-8 -*-
"""Example Google style docstrings.
This module demonstrates documentation as specified by the `Google Python
Style Guide`_. Docstrings may extend over multiple lines. Sections are created
with a section header and a colon followed by a block of indented text.
Example:
Examples can be given using either the ``Example`` or ``Examples``
sections. Sections support any reStructuredText formatting, including
literal blocks::
$ python example_google.py
Section breaks are created by resuming unindented text. Section breaks
are also implicitly created anytime a new section starts.
Attributes:
module_level_variable1 (int): Module level variables may be documented in
either the ``Attributes`` section of the module docstring, or in an
inline docstring immediately following the variable.
Either form is acceptable, but the two should not be mixed. Choose
one convention to document module level variables and be consistent
with it.
Todo:
* For module TODOs
* You have to also use ``sphinx.ext.todo`` extension
.. _Google Python Style Guide:
http://google.github.io/styleguide/pyguide.html
"""
module_level_variable1 = 12345
module_level_variable2 = 98765
"""int: Module level variable documented inline.
The docstring may span multiple lines. The type may optionally be specified
on the first line, separated by a colon.
"""
def function_with_types_in_docstring(param1, param2):
"""Example function with types documented in the docstring.
`PEP 484`_ type annotations are supported. If attribute, parameter, and
return types are annotated according to `PEP 484`_, they do not need to be
included in the docstring:
Args:
param1 (int): The first parameter.
param2 (str): The second parameter.
Returns:
bool: The return value. True for success, False otherwise.
.. _PEP 484:
https://www.python.org/dev/peps/pep-0484/
"""
def function_with_pep484_type_annotations(param1: int, param2: str) -> bool:
"""Example function with PEP 484 type annotations.
Args:
param1: The first parameter.
param2: The second parameter.
Returns:
The return value. True for success, False otherwise.
"""
def module_level_function(param1, param2=None, *args, **kwargs):
"""This is an example of a module level function.
Function parameters should be documented in the ``Args`` section. The name
of each parameter is required. The type and description of each parameter
is optional, but should be included if not obvious.
If \*args or \*\*kwargs are accepted,
they should be listed as ``*args`` and ``**kwargs``.
The format for a parameter is::
name (type): description
The description may span multiple lines. Following
lines should be indented. The "(type)" is optional.
Multiple paragraphs are supported in parameter
descriptions.
Args:
param1 (int): The first parameter.
param2 (:obj:`str`, optional): The second parameter. Defaults to None.
Second line of description should be indented.
*args: Variable length argument list.
**kwargs: Arbitrary keyword arguments.
Returns:
bool: True if successful, False otherwise.
The return type is optional and may be specified at the beginning of
the ``Returns`` section followed by a colon.
The ``Returns`` section may span multiple lines and paragraphs.
Following lines should be indented to match the first line.
The ``Returns`` section supports any reStructuredText formatting,
including literal blocks::
{
'param1': param1,
'param2': param2
}
Raises:
AttributeError: The ``Raises`` section is a list of all exceptions
that are relevant to the interface.
ValueError: If `param2` is equal to `param1`.
"""
if param1 == param2:
raise ValueError('param1 may not be equal to param2')
return True
def example_generator(n):
"""Generators have a ``Yields`` section instead of a ``Returns`` section.
Args:
n (int): The upper limit of the range to generate, from 0 to `n` - 1.
Yields:
int: The next number in the range of 0 to `n` - 1.
Examples:
Examples should be written in doctest format, and should illustrate how
to use the function.
>>> print([i for i in example_generator(4)])
[0, 1, 2, 3]
"""
for i in range(n):
yield i
class ExampleError(Exception):
"""Exceptions are documented in the same way as classes.
The __init__ method may be documented in either the class level
docstring, or as a docstring on the __init__ method itself.
Either form is acceptable, but the two should not be mixed. Choose one
convention to document the __init__ method and be consistent with it.
Note:
Do not include the `self` parameter in the ``Args`` section.
Args:
msg (str): Human readable string describing the exception.
code (:obj:`int`, optional): Error code.
Attributes:
msg (str): Human readable string describing the exception.
code (int): Exception error code.
"""
def __init__(self, msg, code):
self.msg = msg
self.code = code
class ExampleClass(object):
"""The summary line for a class docstring should fit on one line.
If the class has public attributes, they may be documented here
in an ``Attributes`` section and follow the same formatting as a
function's ``Args`` section. Alternatively, attributes may be documented
inline with the attribute's declaration (see __init__ method below).
Properties created with the ``@property`` decorator should be documented
in the property's getter method.
Attributes:
attr1 (str): Description of `attr1`.
attr2 (:obj:`int`, optional): Description of `attr2`.
"""
def __init__(self, param1, param2, param3):
"""Example of docstring on the __init__ method.
The __init__ method may be documented in either the class level
docstring, or as a docstring on the __init__ method itself.
Either form is acceptable, but the two should not be mixed. Choose one
convention to document the __init__ method and be consistent with it.
Note:
Do not include the `self` parameter in the ``Args`` section.
Args:
param1 (str): Description of `param1`.
param2 (:obj:`int`, optional): Description of `param2`. Multiple
lines are supported.
param3 (:obj:`list` of :obj:`str`): Description of `param3`.
"""
self.attr1 = param1
self.attr2 = param2
self.attr3 = param3 #: Doc comment *inline* with attribute
#: list of str: Doc comment *before* attribute, with type specified
self.attr4 = ['attr4']
self.attr5 = None
"""str: Docstring *after* attribute, with type specified."""
@property
def readonly_property(self):
"""str: Properties should be documented in their getter method."""
return 'readonly_property'
@property
def readwrite_property(self):
""":obj:`list` of :obj:`str`: Properties with both a getter and setter
should only be documented in their getter method.
If the setter method contains notable behavior, it should be
mentioned here.
"""
return ['readwrite_property']
@readwrite_property.setter
def readwrite_property(self, value):
value
def example_method(self, param1, param2):
"""Class methods are similar to regular functions.
Note:
Do not include the `self` parameter in the ``Args`` section.
Args:
param1: The first parameter.
param2: The second parameter.
Returns:
True if successful, False otherwise.
"""
return True
def __special__(self):
"""By default special members with docstrings are not included.
Special members are any methods or attributes that start with and
end with a double underscore. Any special member with a docstring
will be included in the output, if
``napoleon_include_special_with_doc`` is set to True.
This behavior can be enabled by changing the following setting in
Sphinx's conf.py::
napoleon_include_special_with_doc = True
"""
pass
def __special_without_docstring__(self):
pass
def _private(self):
"""By default private members are not included.
Private members are any methods or attributes that start with an
underscore and are *not* special. By default they are not included
in the output.
This behavior can be changed such that private members *are* included
by changing the following setting in Sphinx's conf.py::
napoleon_include_private_with_doc = True
"""
pass
def _private_without_docstring(self):
pass
```
================================================
FILE: documentation/docs/index.md
================================================
# PyQuran: The Python package for Quranic Analysis
PyQuran is a package which provides tools for Quranic Analysis and Arabic texts.
It is still a small package which needs a lot of your effort. We believe that it
is a seed of a fundamental general package for
computations on Quran with Python, even at the most basic level which is simply
retrieving Quran text.
*Before Islam*, Arabic letters were written without dots
([*rasm*](https://en.wikipedia.org/wiki/Rasm)), which resulted in ambiguity: two or three
letters shared the same rasm, or form.
Muslims later removed this ambiguity by adding
dots above or below the letters that share the same rasm, so that each letter now has a unique form.
The Quran was originally written using letters without dots.
To enable researchers to use the modern alphabet, the old rasm, or any other grouping, we introduce *alphabetical systems*:
a dynamic construction of letter groups.
## Quran Corpus
We use the [tanzil](http://tanzil.net/docs/download) Quran Corpus (*Uthmani Text*), which is `UTF-8` encoded. You
can find all unique characters of the Uthmani Corpus
[here](https://hci-lab.github.io/PyQuran-Private/Filtering-Special-Recitation-Symbols/#recitation-symbols).
The *Uthmani Text* contains *special recitation symbols* مصطلحات الضبط, which guide the reciter
to the right positions to pause and to the rules of tajweed.
We provide an interface to filter those symbols *on the fly while fetching from the corpus*;
we **NEVER** change the corpus itself.
[For the full details about filtering *special recitation symbols* مصطلحات
الضبط.](https://hci-lab.github.io/PyQuran-Private/Filtering-Special-Recitation-Symbols/#recitation-symbols)
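The idea of filtering on the fly can be illustrated with a minimal sketch. It assumes that the symbols to drop are the Quranic annotation signs in the Unicode range U+06D6 to U+06ED; both that range and the name `filter_recitation_symbols` are assumptions made for illustration, not PyQuran's actual filter list or API.

```python
# Hypothetical symbol set: Unicode Quranic annotation signs (U+06D6..U+06ED).
# PyQuran's real list of recitation symbols may differ.
RECITATION_SYMBOLS = {chr(code) for code in range(0x06D6, 0x06EE)}


def filter_recitation_symbols(verse):
    """Return a filtered copy of `verse`; the input string is left untouched."""
    return ''.join(ch for ch in verse if ch not in RECITATION_SYMBOLS)


stored = 'قل\u06d6 هو'                 # stand-in for a verse held in the corpus
fetched = filter_recitation_symbols(stored)
print(fetched == 'قل هو')               # True
print('\u06d6' in stored)               # True: the stored text is unchanged
```

Because Python strings are immutable, the filter can only build a new string, which fits the rule that the corpus itself is never modified.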
## Current Features
- [Quran Retrieving.](https://hci-lab.github.io/PyQuran-Private/quran_tools/)
- Advanced Searching, by
[Text](https://hci-lab.github.io/PyQuran-Private/analysis_tools/#search_sequence)
and [Diacritics](https://hci-lab.github.io/PyQuran-Private/analysis_tools/#search_string_with_tashkeel) Patterns.
- [Buckwalter Transliteration](https://hci-lab.github.io/PyQuran-Private/arabic_tools/#buckwalter_transliteration), back and forth.
- Multiple [Alphabetical Systems](https://hci-lab.github.io/PyQuran-Private/arabic_tools/#alphabetical-systems).
- Word Frequency Table المعجم الترددى للألفاظ.
## PyQuran needs and Upcoming Features.
- Word Frequency Table filtered according to word meaning.
- Morphological analysis of words to their roots.
- Arabic tools for representing Arabic text for AI algorithms and neural
networks, for more serious Arabic text processing and understanding. Those
tools should take meaning, diacritics, roots and other morphological aspects into
account.
- Some PyQuran in-house tools and architecture enhancements will be posted on GitHub
Issues for contributors, to make PyQuran professional and easy to use.
## Contributing
To contribute to and maintain PyQuran, please read the [CONTRIBUTING](https://hci-lab.github.io/PyQuran-Private/CONTRIBUTING) section.
## Dependencies
- [numpy](http://www.numpy.org/)
- [pyarabic](https://github.com/linuxscout/pyarabic)
## Install
- From PyPI: `$ pip3 install pyquran`
## Citing
```
@MISC {PyQuran2018,
author = "Waleed A. Yousef and
Taha M. Madbouly and
Omar M. Ibrahime and
Ali H. El-Kassas and
Ali O. Hassan and
Abdallah R. Albohy",
title = "PyQuran: The Python package for Quranic Analysis",
howpublished = "https://hci-lab.github.io/PyQuran-Private",
year = "2018"}
```
## Communication
[Author Page](https://hci-lab.github.io/PyQuran-Private/authors)
================================================
FILE: documentation/docs/maintainers.md
================================================
================================================
FILE: documentation/docs/methods guide.md
================================================
```python
X
```
X
**Arguments**
- **X**: X
**Example**
```python
X
```
--------
-------------------
# Thbeed
* [Features](#features)
* [Imporatan information](#imporatan-information)
* [Usage](#usage)
* [Functions](#functions)
* [Access functions](#access-functions)
[x] DONE
* [Manipulate functions](#manipulate-functions)
[x] DONE
* [Analysis functions](#analysis-functions)
* [count_shape](#count_shape)
* [count_token](#count_token)
* [frequency_of_character](#frequency_of_character)
* [generate_frequancy_dictionary](#generate_frequancy_dictionary)
* [sort_dictionary_by_similarity](#sort_dictionary_by_similarity)
* [check_sura_with_frequency](#check_sura_with_frequency)
* [generate_latex_table](#generate_latex_table)
* [Search functions](#search-functions)
* [search_sequence](#search_sequence)
* [search_string_with_tashkeel](#search_string_with_tashkeel)
* [search_with_pattern](#search_with_pattern)
# Features
* Access the Holy Quran:
- get a **Chapter** with/without diacritics.
- get a **Verse** with/without diacritics.
- get a **Token** (word).
- get a **Chapter name** or **Chapter number**.
- get the **number of verses** in a chapter.
* Manipulate the Holy Quran:
- Separate text into **letters** with/without diacritics.
- Apply your own alphabetical **System** to the Quran.
- get the **Binary representation** of the Holy Quran as 0's and 1's.
- Extract **Tashkeel** from a sentence.
- Deal with linguistic rules such as:
- Transform Alef-mad **"آ"** to "أَأْ"
- Convert **Unicode Arabic** text to **Buckwalter encoding** and vice versa
- Convert the Quran to its **Buckwalter representation** and vice versa.
* Analyze the Holy Quran:
- get the **Frequency Matrix** of letters according to the applied _alphabet system_.
- get the **Frequency dictionary** of tokens.
- sort the **Frequency dictionary** using a similarity threshold.
* Search the Holy Quran using:
- **Text**, with a variety of options.
- a **diacritics pattern**.
- a **binary representation pattern** with a threshold.
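The Buckwalter conversion listed among the features above maps each Arabic character to one ASCII character and back. The following is a minimal sketch with only a small subset of the Buckwalter table; the names `to_buckwalter` and `from_buckwalter` are hypothetical and are not PyQuran's own functions.

```python
# Partial Buckwalter table (a subset chosen for this example only).
ARABIC_TO_BUCKWALTER = {
    '\u0627': 'A',  # alef
    '\u0628': 'b',  # beh
    '\u0631': 'r',  # reh
    '\u0642': 'q',  # qaf
    '\u0644': 'l',  # lam
    '\u0646': 'n',  # noon
    '\u064e': 'a',  # fatha
    '\u064f': 'u',  # damma
    '\u0650': 'i',  # kasra
    '\u0651': '~',  # shadda
    '\u0652': 'o',  # sukun
}
BUCKWALTER_TO_ARABIC = {latin: arabic for arabic, latin in ARABIC_TO_BUCKWALTER.items()}


def to_buckwalter(text):
    """Transliterate Arabic to Buckwalter; unmapped characters pass through."""
    return ''.join(ARABIC_TO_BUCKWALTER.get(ch, ch) for ch in text)


def from_buckwalter(text):
    """Transliterate Buckwalter back to Arabic; unmapped characters pass through."""
    return ''.join(BUCKWALTER_TO_ARABIC.get(ch, ch) for ch in text)


word = '\u0642\u064f\u0644\u0652'   # qaf, damma, lam, sukun
print(to_buckwalter(word))           # qulo
print(from_buckwalter(to_buckwalter(word)) == word)   # True
```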
# Functions
## Manipulate functions:
## Analysis functions:
#### count_shape
**count_shape(text, system=None)**
- takes **text** (chapter/verse) and an optional **system** that groups characters by shape, for example [[beh, jeem]], and returns an **n*p matrix**, where **n** is the number of verses and **p** is the number of groups in the system. If no system is passed, the default one is applied.
```python
newSystem=[[beh, teh, theh], [jeem, hah, khah]]
alphabetAsOneShape =pq.count_shape(get_sura(110), newSystem)
print(alphabetAsOneShape)
>>> [[1 2 1 0 0 0 1 0 4 0 0 1 1 0 0 0 1 0 0 0 0 0 1 0 0 3 0 1 1 1 0 0]
[1 2 0 0 2 0 0 0 5 0 2 0 1 0 1 0 0 0 0 0 0 0 2 0 0 4 0 3 1 3 1 3]
[6 2 0 0 0 0 1 0 4 0 1 0 2 0 2 0 0 0 0 0 0 1 2 0 2 0 1 2 2 2 0 0]]
```
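How such a matrix is built can be sketched as follows. This is an illustration only, not the implementation of `count_shape`: the helper name `count_by_group` is hypothetical, and unlike the output above it produces columns only for the groups that are given, without adding the remaining letters.

```python
def count_by_group(verses, system):
    """Return one row per verse and one column per letter group in `system`."""
    matrix = []
    for verse in verses:
        # A group's count is the sum of the counts of the letters it contains.
        row = [sum(verse.count(letter) for letter in group) for group in system]
        matrix.append(row)
    return matrix


system = [['ب', 'ت', 'ث'], ['ج', 'ح', 'خ']]
verses = ['تبت', 'حج بيت']            # two short sample strings
print(count_by_group(verses, system))  # [[3, 0], [2, 2]]
```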
#### count_token
**count_token(text)**
- takes **text** (chapter/verse) and returns the number of tokens.
###### ***note***: the letter (harf) 'و' on its own is not counted as a token
```python
numberOfToken=pq.count_token(tools.get_sura(110))
print(numberOfToken)
>>> 19
```
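The counting rule can be sketched in a few lines. The verses below are typed by hand without diacritics, so their spelling may differ from the corpus, and `count_tokens` is a hypothetical name used only for this illustration.

```python
def count_tokens(verses):
    """Count whitespace-separated words, skipping a standalone 'و'."""
    return sum(1 for verse in verses for word in verse.split() if word != 'و')


sura_110 = [
    'إذا جاء نصر الله والفتح',
    'ورأيت الناس يدخلون فى دين الله أفواجا',
    'فسبح بحمد ربك واستغفره إنه كان توابا',
]
print(count_tokens(sura_110))   # 19
```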
#### frequency_of_character
**frequency_of_character(characters,verse=None,chapterNum=0,verseNum=0,with_tashkeel=False)**
- takes the **characters** that you need to count and returns a dictionary with the number of occurrences of each character in a verse, a chapter, or the whole Quran. Each key is a character and each value is its number of occurrences.
- optional options:
- **verse** (str): if passed, counting is applied to this string only.
- **chapterNum** (int): if passed alone, counting is applied to this chapter only.
- **verseNum** (int):
- if passed alone, counting is applied to **verseNum** in **all chapters**.
- if passed with **chapterNum**, counting is applied to verseNum in **chapterNum**.
- **with_tashkeel** (bool):
- if **True**, counting is applied to the Quran **with** tashkeel.
- if **False**, counting is applied to the Quran **without** tashkeel.
- Note: if you do not pass any of the **optional options**, counting is applied to the whole **Quran**.
```python
frequencyOfChar =tools.frequency_of_character(['أ','ب'],'قل أعوذ برب الناس',114,1)
print(frequencyOfChar)
>>> {أ:1,ب:2}
```
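For the simplest case, where a verse string is passed, the result can be reproduced with plain string counting. This is a minimal sketch with a hypothetical helper name; it does not cover the chapter and verse number options.

```python
def count_characters(characters, text):
    """Map each requested character to its number of occurrences in `text`."""
    return {ch: text.count(ch) for ch in characters}


print(count_characters(['أ', 'ب'], 'قل أعوذ برب الناس'))   # {'أ': 1, 'ب': 2}
```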
#### generate_frequancy_dictionary
**generate_frequency_dictionary(suraNumber=None)**
- takes **suraNumber (optional)**, the number of the chapter, and returns a dictionary with each **word** as key and its **frequency** as value. If **suraNumber** is not passed, it is applied to the **whole Quran**.
```python
dictionaryFrequency = pq.generate_frequency_dictionary(114)
print(dictionaryFrequency)
>>> {'الناس': 4, 'من': 2, 'قل': 1, 'أعوذ': 1, 'برب': 1, 'ملك': 1, 'إله': 1, 'شر': 1, 'الوسواس': 1, 'الخناس': 1, 'الذى': 1, 'يوسوس': 1, 'فى': 1, 'صدور': 1, 'الجنة': 1, 'والناس': 1}
```
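A word frequency dictionary of this kind can be sketched with `collections.Counter`. The verses are typed by hand without diacritics, and `word_frequencies` is a hypothetical name; the real function reads the text from the corpus instead.

```python
from collections import Counter


def word_frequencies(verses):
    """Count how many times each whitespace-separated word occurs."""
    return dict(Counter(word for verse in verses for word in verse.split()))


sura_114 = [
    'قل أعوذ برب الناس',
    'ملك الناس',
    'إله الناس',
    'من شر الوسواس الخناس',
    'الذى يوسوس فى صدور الناس',
    'من الجنة والناس',
]
frequencies = word_frequencies(sura_114)
print(frequencies['الناس'], frequencies['من'])   # 4 2
```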
#### sort_dictionary_by_similarity
**sort_dictionary_by_similarity(frequency_dictionary,threshold=0.8)**
- used to **cluster words by similarity**: it sorts the words inside every cluster from most to least common and, at the same time, sorts the clusters in descending order. It takes the frequency dictionary generated by the [generate_frequency_dictionary](#generate_frequency_dictionary) function and a **threshold (optional)** that specifies **the degree of similarity**.
```python
sortedDictionary = pq.sort_dictionary_by_similarity(dictionaryFrequency)
print(sortedDictionary)
>>> {'الناس': 4, 'الخناس': 1, 'والناس': 1, 'من': 2, 'قل': 1, 'أعوذ': 1, 'برب': 1, 'ملك': 1, 'إله': 1, 'شر': 1, 'الوسواس': 1, 'الذى': 1, 'يوسوس': 1, 'فى': 1, 'صدور': 1, 'الجنة': 1}
```
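One possible way to obtain such an ordering is a greedy clustering pass. The sketch below assumes `difflib.SequenceMatcher` as the similarity measure, which is a stand-in chosen for illustration; the measure that PyQuran actually uses is not stated here, and `sort_by_similarity` is a hypothetical name.

```python
from difflib import SequenceMatcher


def sort_by_similarity(frequencies, threshold=0.8):
    """Group similar words, most frequent first, and flatten the groups."""
    clusters = []
    # Visit the words from most to least frequent (the sort is stable).
    for word in sorted(frequencies, key=frequencies.get, reverse=True):
        for cluster in clusters:
            # Compare against the head (most frequent word) of each cluster.
            if SequenceMatcher(None, cluster[0], word).ratio() >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return {word: frequencies[word] for cluster in clusters for word in cluster}


sample = {'قل': 1, 'الناس': 4, 'من': 2, 'الخناس': 1, 'والناس': 1}
print(list(sort_by_similarity(sample)))
# ['الناس', 'الخناس', 'والناس', 'من', 'قل']
```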
#### check_sura_with_frequency
**check_sura_with_frequency(sura_num,freq_dec)**
- checks whether the frequency dictionary of a **specific chapter** is compatible with the **original chapter** in the Quran. It takes **sura_num** (chapter number) and **freq_dec** (frequency dictionary) and returns **True** if they are compatible and **False** if not.
```python
dictionaryFrequency = pq.generate_frequency_dictionary(111)
matched = pq.check_sura_with_frequency(110,dictionaryFrequency)
print(matched)
>>> False
```
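The check can be sketched under the assumption that "compatible" means that the word counts of the chapter equal the dictionary exactly; that reading, the hand-typed verses and the name `matches_frequency` are all assumptions of this example.

```python
from collections import Counter


def matches_frequency(verses, frequencies):
    """Return True when the word counts of `verses` equal `frequencies`."""
    actual = Counter(word for verse in verses for word in verse.split())
    return dict(actual) == dict(frequencies)


verses = ['ملك الناس', 'إله الناس']
print(matches_frequency(verses, {'ملك': 1, 'إله': 1, 'الناس': 2}))   # True
print(matches_frequency(verses, {'ملك': 1, 'إله': 1, 'الناس': 3}))   # False
```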
#### generate_latex_table
**generate_latex_table(dictionary,filename,location=".")**
- generates the LaTeX code of a frequency table. It takes **dictionary** (the frequency dictionary), **filename** and **location** (where to save the file); the default location is the current directory, denoted by '.'. It returns **True** if the generation completed successfully and **False** if something went wrong.
```python
latexTable = pq.generate_latex_table(dictionaryFrequency,'any_file_name')
print(latexTable)
>>> True
```
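A table generator with the same True/False contract can be sketched as follows. The table layout, the `.tex` extension and the name `write_latex_table` are assumptions of this example and may not match the file that PyQuran produces.

```python
import os
import tempfile


def write_latex_table(frequencies, filename, location='.'):
    """Write a two-column LaTeX tabular; return True on success, else False."""
    rows = [r'\begin{tabular}{|r|r|}', r'\hline', r'Word & Frequency \\ \hline']
    for word, count in frequencies.items():
        rows.append(r'%s & %d \\ \hline' % (word, count))
    rows.append(r'\end{tabular}')
    try:
        path = os.path.join(location, filename + '.tex')
        with open(path, 'w', encoding='utf-8') as out:
            out.write('\n'.join(rows) + '\n')
    except OSError:
        return False
    return True


# Write into a temporary folder so that the example leaves no file behind.
with tempfile.TemporaryDirectory() as folder:
    print(write_latex_table({'الناس': 4, 'من': 2}, 'frequencies', folder))   # True
```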
## Search functions
#### search_sequence
**search_sequence(sequancesList,verse=None,chapterNum=0,verseNum=0,mode=3)**
- takes a list of sequences and returns the matched sequences; it searches in a verse, a chapter, or the whole Quran.
- it returns for every match:
- the matched sequence
- the chapter number of the occurrence
- the token number if it is a word, and 0 if it is a sentence
- Note:
- if verse != None, the search is done in this verse.
- if there is no verse and both chapterNum and verseNum are given, the search is done in that verse.
- if there is no verse and no verseNum but chapterNum is given, the search is done in that chapter.
- if there is no verse, no chapterNum and no verseNum, the search is done in the whole Quran.
- it has several modes:
1. search with a decorated sequence (with tashkeel), and return the matched sequence decorated (with tashkeel).
2. search with an undecorated sequence (without tashkeel), and return the matched sequence undecorated (without tashkeel).
3. search with an undecorated sequence (without tashkeel), and return the matched sequence decorated (with tashkeel).
- optional options:
- **verse** (str): if passed, the search is applied to this string only.
- **chapterNum** (int): if passed alone, the search is applied to this chapter only.
- **verseNum** (int):
- if passed alone, the search is applied to **verseNum** in **all chapters**.
- if passed with **chapterNum**, the search is applied to verseNum in **chapterNum**.
- **with_tashkeel** (bool):
- if **True**, the search is applied to the Quran **with** tashkeel.
- if **False**, the search is applied to the Quran **without** tashkeel.
- mode (int): the mode that you want to use; the default is mode 3.
- Note: if you do not pass any of the **optional options**, the search is applied to the whole **Quran**.
- Returns: dict(): each key is a sequence and each value is a list of its matched sequences and their positions.
```python
matchedKeyword = pq.search_sequence(['قل أعوذ برب'])
print(matchedKeyword)
>>> {'قل أعوذ برب': [('قُلْ أَعُوذُ بِرَبِّ', 0, 1, 113), ('قُلْ أَعُوذُ بِرَبِّ', 0, 1, 114)]}
```
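Mode 3 needs a way to match an undecorated query and still return the decorated text. One way to do this, sketched below, is to remember where every base letter sits in the decorated string. The sketch assumes that tashkeel means the Unicode marks U+064B to U+0652, and the helper names are hypothetical rather than PyQuran's own.

```python
# Assumed tashkeel set: fathatan (U+064B) through sukun (U+0652).
TASHKEEL = {chr(code) for code in range(0x064B, 0x0653)}


def strip_tashkeel(text):
    """Remove the diacritics listed in TASHKEEL."""
    return ''.join(ch for ch in text if ch not in TASHKEEL)


def find_decorated(query, decorated):
    """Find an undecorated `query`; return the decorated match or None."""
    if not query:
        return None
    # positions[i] is the index in `decorated` of the i-th base character.
    positions = [i for i, ch in enumerate(decorated) if ch not in TASHKEEL]
    plain = ''.join(decorated[i] for i in positions)
    start = plain.find(query)
    if start == -1:
        return None
    stop = positions[start + len(query) - 1] + 1
    # Keep the diacritics that belong to the last matched letter.
    while stop < len(decorated) and decorated[stop] in TASHKEEL:
        stop += 1
    return decorated[positions[start]:stop]


verse = 'قُلْ أَعُوذُ بِرَبِّ النَّاسِ'
match = find_decorated('قل أعوذ برب', verse)
print(strip_tashkeel(match) == 'قل أعوذ برب')   # True
```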
#### search_string_with_tashkeel
**search_string_with_tashkeel(sentence,tashkeel_pattern)**
- takes a **sentence** and a **tashkeel_pattern** (composed of 0's and 1's) and returns the locations that match the pattern of diacritics, as pairs of start index (**inclusive**) and end index (**exclusive**). It returns an empty list if nothing is found.
```python
sentence = 'صِفْ ذَاْ ثَنَاْ كَمْ جَاْدَ شَخْصٌ'
tashkeel_pattern = ar.fatha + ar.sukun
results = pq.search_string_with_tashkeel(sentence,tashkeel_pattern)
print(results)
>>> [(3, 5), (7, 9), (10, 12), (13, 15), (17, 19)]
```
#### search_with_pattern
**search_with_pattern(pattern,sentence=None,verseNum=None,chapterNum=None,threshold=1)**
- searches with a pattern of 0's and 1's and returns the words of the sentence whose pattern matches, depending on the threshold. It takes the **pattern** that you are looking for, a **sentence (optional)** to search in, **chapterNum (optional)** and **verseNum (optional)**, and returns a list of matched words and sentences.
- Cases:
1. if the sentence is passed, alone or with other arguments,
the search is done in the sentence only.
2. if the sentence is not passed and both verseNum and chapterNum are passed,
the search is done only in verse verseNum of chapter chapterNum.
3. if the sentence and verseNum are not passed and only chapterNum is passed,
the search is done in this specific chapter only.
* Note: the running time depends on your threshold and on the size of the chapter, so searching the whole Quran is not supported because it takes a very long time (more than 11 minutes).
```python
result = pq.search_with_pattern(pattern="01111",chapterNum=1,threshold=0.9)
print(result)
>>>['الرَّحِيمِ مَلِكِ', 'نَعْبُدُ وَإِيَّاكَ', 'الْمُسْتَقِيمَ صِرَطَ']
```
================================================
FILE: documentation/docs/quran_tools.md
================================================
## Importing PyQuran
Note that PyQuran is imported by a **lowercase name**.
```python
import pyquran as q
```
- Quran retrieving tools are in `q.quran`.
### get_sura
```python
get_sura(sura_number, with_tashkeel=False, basmalah=False)
```
returns a sura as a list of verses.
__Args__
- __sura_number__: 1 <= Integer <= 114, the ordered number of sura in Mushaf.
- __with_tashkeel__: Boolean, if true return sura with tashkeel else return
without.
- __basmalah__: Boolean, adding basmalah as aya.
__Returns__
- __[str]__: a list of sura's ayat.
__Note__
The index starts at zero.
So if the ordinal number of an aya is x, then it is at index (x-1) in the returned
list.
__Example__
```python
q.quran.get_sura(108, with_tashkeel=True)
>>> ['إِنَّا أَعْطَيْنَكَ الْكَوْثَرَ', 'فَصَلِّ لِرَبِّكَ وَانْحَرْ', 'إِنَّ شَانِئَكَ هُوَ الْأَبْتَرُ']
```
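The zero-based indexing can be wrapped in a small helper. The sketch below reuses the list shown above as sample data; `aya_at` is a hypothetical helper and not part of `q.quran`.

```python
def aya_at(verses, number):
    """Return aya `number` (counted from 1) from a zero-indexed list."""
    if not 1 <= number <= len(verses):
        raise IndexError('aya number out of range')
    return verses[number - 1]


sura_108 = ['إِنَّا أَعْطَيْنَكَ الْكَوْثَرَ', 'فَصَلِّ لِرَبِّكَ وَانْحَرْ', 'إِنَّ شَانِئَكَ هُوَ الْأَبْتَرُ']
print(aya_at(sura_108, 3) == sura_108[2])   # True
```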
----
### get_verse
```python
get_verse(sura_number, verse_number, with_tashkeel=False)
```
Gets a specific verse from a specific chapter.
__Args__
- __sura_number__: 1 <= Integer <= 114, the ordered number of sura in Mushaf.
- __verse_number__: Integer > 0, number of verse.
- __with_tashkeel__: Boolean, if true return the verse with tashkeel, else return it
without.
__Returns__
- __str__: a verse.
__Example__
```python
q.quran.get_verse(sura_number=1, verse_number=2)
>>> 'الحمد لله رب العلمين'
```
----
### get_sura_number
```python
get_sura_number(sura_name)
```
__Args__
sura_name (str) : string represents the sura name.
__Returns__
- __int__: the sura number which name is sura_name.
__Note__
Do not forget that the index of the returned list starts at zero.
So if the order Sura number is x, then it's at (x-1) in the list.
__Example__
```python
q.quran.get_sura_number('الملك')
>>> 67
```
----
### get_sura_name
```python
get_sura_name(sura_number=None)
```
Returns the name of `sura_number`. If `sura_number=None`, a list of all
sura names is returned.
__Args__
- __sura_number__: Optional, 1 <= Integer <= 114, the ordered number of sura in Mushaf.
__Returns__
- __str__: the sura name which number is sura_number.
- __[str]__: list of all suras' names (if the sura_number parameter is None).
__Example__
```python
q.quran.get_sura_name(2)
>>> 'البقرة'
```
================================================
FILE: documentation/docs/quran_tools_template.md
================================================
## Importing PyQuran
```python
import pyquran as q
```
{{autogenerated}}
================================================
FILE: documentation/generate.sh
================================================
#!/bin/bash
# Overwrite files_template.md > files.md
cat templates/analysis_tools_template.md > docs/analysis_tools.md
cat templates/arabic_tools_template.md > docs/arabic_tools.md
cat templates/quran_tools_template.md > docs/quran_tools.md
cat docs/index.md > ../README.md # For the repo; Readme
cat docs/index.md > ../../README.md # for the PyPI Readme
cat docs/CONTRIBUTING.md > ../CONTRIBUTING.md
# Generate docs
./auto_gen_docs.py
================================================
FILE: documentation/git-adding.sh
================================================
git add docs/*
git add ../CONTRIBUTING.md
git add ../README.md
================================================
FILE: documentation/mkdocs.yml
================================================
site_name: PyQuran
theme: readthedocs
docs_dir: docs
repo_url: https://github.com/hci-lab/pyquran-private
# Documentation Layout
pages:
- Home: 'index.md'
- Users Documentation:
# Generated: run ./generate to update those files; according to the docs
# changed inside the code
- Quran Retrieving tools: 'quran_tools.md'
- Arabic tools: 'arabic_tools.md'
- Analysis tools: 'analysis_tools.md'
- Maintainers:
- Getting Started: 'CONTRIBUTING.md'
- Package Structure: 'Wiki-Home.md'
- Quran Corpus: 'Filtering-Special-Recitation-Symbols.md'
- Code Conventions: 'code_conventions.md'
- Related Projects:
- Dictionary of Quran Words Frequency: 'dictFrec.md'
- Authors: 'authors.md'
================================================
FILE: documentation/sources/analysis_tools_template.md
================================================
{{autogenerated}}
================================================
FILE: documentation/sources/arabic_tools_template.md
================================================
## Alphabets
We use [PyArabic](https://pypi.python.org/pypi/PyArabic/0.6.2) constants, which
represent letters, instead of writing Arabic in the code.
```python
hamza = u'\u0621'
alef_mad = u'\u0622'
alef_hamza_above = u'\u0623'
waw_hamza = u'\u0624'
alef_hamza_below = u'\u0625'
yeh_hamza = u'\u0626'
alef = u'\u0627'
beh = u'\u0628'
teh_marbuta = u'\u0629'
teh = u'\u062a'
theh = u'\u062b'
jeem = u'\u062c'
hah = u'\u062d'
khah = u'\u062e'
dal = u'\u062f'
thal = u'\u0630'
reh = u'\u0631'
zain = u'\u0632'
seen = u'\u0633'
sheen = u'\u0634'
sad = u'\u0635'
dad = u'\u0636'
tah = u'\u0637'
zah = u'\u0638'
ain = u'\u0639'
ghain = u'\u063a'
feh = u'\u0641'
qaf = u'\u0642'
kaf = u'\u0643'
lam = u'\u0644'
meem = u'\u0645'
noon = u'\u0646'
heh = u'\u0647'
waw = u'\u0648'
alef_maksura = u'\u0649'
yeh = u'\u064a'
madda_above = u'\u0653'
hamza_above = u'\u0654'
hamza_below = u'\u0655'
alef_wasl = u'\u0671'
```
## Alphabetical Systems (Definitions)
[**Rasm**](https://en.wikipedia.org/wiki/Rasm): any set of letters which are
written in the same form; they are indistinguishable in writing, but
they are distinguished from the context. For example, the letters ت ث ن ى
can all be written with a single rasm ىـ, without dots.
**Alphabetical System**: a set of rasm, dynamically constructed by
specifying the letters that you want to treat as one rasm. The
default Arabic alphabet is a special case of the **Alphabetical System** where
each letter is its own rasm.
**Predefined systems** are stored in the `systems` object.
1. **Default**: each letter is treated as a unique rasm.
2. **Without Dots**: by removing the dots some letters become
indistinguishable; those letters are treated as one rasm.
The following example shows the (Without Dots) system as a list of lists,
where each sublist contains the letters which share the same rasm.
3. **Hamazat**: considers any letter accompanied by a hamza ء as one rasm.
**NOTE**: You may go further and construct your own system by specifying which
letters you want to treat as one rasm; then you can do some statistical
analysis such as count, variance, average, ...
Example:
```python
q.systems.withoutDots
Out:
[['ب', 'ت', 'ث', 'ن'], # Rasm 1
['ح', 'خ', 'ج'], # Rasm 2
['د', 'ذ'], # Rasm 3
['ر', 'ز'], # Rasm 4
['س', 'ش'], # Rasm 5
['ص', 'ض'], # Rasm 6
['ط', 'ظ'], # Rasm 7
['ع', 'غ'], # Rasm 8
['ف', 'ق']] # Rasm 9
```
### Constructing a user-defined system:
```python
system = [[alef_hamza_above, alef],
[beh, teh]]
```
The previous piece of code means "Treat *alef_hamza_above* and *alef*
as one and the same letter, and also treat *beh* and *teh* as one letter".
The rest of the letters can be dynamically constructed using `check_system()`.
A system can then be applied to text analysis functions such as counting,
filtering, etc.
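What completing a system with the rest of the letters could look like is sketched below. This is an illustration under assumptions and not the implementation of `check_system()`: the helper name `complete_system`, the 36-letter alphabet (hamza to yeh, as in the constants above) and the ordering of the result are all hypothetical.

```python
# The 36 letters from hamza (U+0621) to yeh (U+064A), skipping U+063B..U+0640.
ARABIC_LETTERS = [chr(code) for code in range(0x0621, 0x063B)] + \
                 [chr(code) for code in range(0x0641, 0x064B)]


def complete_system(groups, alphabet=ARABIC_LETTERS):
    """Append every letter that is not in `groups` as a group of its own."""
    grouped = {letter for group in groups for letter in group}
    rest = [[letter] for letter in alphabet if letter not in grouped]
    return [list(group) for group in groups] + rest


alef_hamza_above, alef, beh, teh = '\u0623', '\u0627', '\u0628', '\u062a'
system = complete_system([[alef_hamza_above, alef], [beh, teh]])
print(len(system))   # 34: the 2 given groups plus 32 single-letter groups
```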
{{autogenerated}}
================================================
FILE: documentation/sources/quran_tools_template.md
================================================
## Importing PyQuran
```python
import PyQuran as q
```
- Quran retrieving tools are in `q.quran`.
{{autogenerated}}
================================================
FILE: documentation/templates/analysis_tools_template.md
================================================
{{autogenerated}}
================================================
FILE: documentation/templates/arabic_tools_template.md
================================================
## Alphabets
We use [PyArabic](https://pypi.python.org/pypi/PyArabic/0.6.2) constants, which
represent letters, instead of writing Arabic in the code.
```python
hamza = u'\u0621'
alef_mad = u'\u0622'
alef_hamza_above = u'\u0623'
waw_hamza = u'\u0624'
alef_hamza_below = u'\u0625'
yeh_hamza = u'\u0626'
alef = u'\u0627'
beh = u'\u0628'
teh_marbuta = u'\u0629'
teh = u'\u062a'
theh = u'\u062b'
jeem = u'\u062c'
hah = u'\u062d'
khah = u'\u062e'
dal = u'\u062f'
thal = u'\u0630'
reh = u'\u0631'
zain = u'\u0632'
seen = u'\u0633'
sheen = u'\u0634'
sad = u'\u0635'
dad = u'\u0636'
tah = u'\u0637'
zah = u'\u0638'
ain = u'\u0639'
ghain = u'\u063a'
feh = u'\u0641'
qaf = u'\u0642'
kaf = u'\u0643'
lam = u'\u0644'
meem = u'\u0645'
noon = u'\u0646'
heh = u'\u0647'
waw = u'\u0648'
alef_maksura = u'\u0649'
yeh = u'\u064a'
madda_above = u'\u0653'
hamza_above = u'\u0654'
hamza_below = u'\u0655'
alef_wasl = u'\u0671'
```
## Alphabetical Systems (Definitions)
[**Rasm**](https://en.wikipedia.org/wiki/Rasm): any set of letters which are
written in the same form; they are indistinguishable in writing, but
they are distinguished from the context. For example, the letters ت ث ن ى
can all be written with a single rasm ىـ, without dots.
**Alphabetical System**: a set of rasm, dynamically constructed by
specifying the letters that you want to treat as one rasm. The
default Arabic alphabet is a special case of the **Alphabetical System** where
each letter is its own rasm.
**Predefined systems** are stored in the `systems` object.
1. **Default**: each letter is treated as a unique rasm.
2. **Without Dots**: by removing the dots some letters become
indistinguishable; those letters are treated as one rasm.
The following example shows the (Without Dots) system as a list of lists,
where each sublist contains the letters which share the same rasm.
3. **Hamazat**: considers any letter accompanied by a hamza ء as one rasm.
**NOTE**: You may go further and construct your own system by specifying which
letters you want to treat as one rasm; then you can do some statistical
analysis such as count, variance, average, ...
Example:
```python
q.systems.withoutDots
Out:
[['ب', 'ت', 'ث', 'ن'], # Rasm 1
['ح', 'خ', 'ج'], # Rasm 2
['د', 'ذ'], # Rasm 3
['ر', 'ز'], # Rasm 4
['س', 'ش'], # Rasm 5
['ص', 'ض'], # Rasm 6
['ط', 'ظ'], # Rasm 7
['ع', 'غ'], # Rasm 8
['ف', 'ق']] # Rasm 9
```
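As a rough illustration of how a system feeds a count, the sketch below folds every letter of a group onto the group's first letter and tallies the result. `count_by_rasm` is a hypothetical helper written for this page, not part of PyQuran, and it only uses an excerpt of the (Without Dots) system.

```python
from collections import Counter

# Letter constants, using the same code points as the PyQuran letter names.
beh, teh, theh, noon = '\u0628', '\u062a', '\u062b', '\u0646'
jeem, hah, khah = '\u062c', '\u062d', '\u062e'

# Excerpt of the "without dots" system: two rasm groups only.
system = [[beh, teh, theh, noon], [hah, khah, jeem]]


def count_by_rasm(text, system):
    """Count letters in `text`, folding each group onto its first letter."""
    # Map every grouped letter to the representative of its group.
    representative = {letter: group[0] for group in system for letter in group}
    # Letters outside every group count as themselves; whitespace is skipped.
    return Counter(representative.get(ch, ch) for ch in text if not ch.isspace())


counts = count_by_rasm(beh + teh + ' ' + noon + jeem, system)
```

Here `counts[beh]` covers beh, teh and noon together, and jeem is tallied under hah because hah is listed first in its group.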
### Constructing a user-defined system:
```python
system = [[alef_hamza_above, alef],
[beh, teh]]
```
The previous piece of code means "Treat *alef_hamza_above* and *alef*
as the same letter, and treat *beh* and *teh* as one letter as well".
The rest of the letters can be filled in dynamically using `check_system()`.
A system can then be applied to text analysis functions such as counting and
filtering.
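To make the "fill in the rest" step concrete, here is a minimal sketch of one way a partial system could be completed. `complete_system` is a hypothetical name, and the choice to emit ungrouped letters as single-letter lists is an assumption; the library's `check_system()` may differ in both respects.

```python
def complete_system(system, alphabet):
    """Return `alphabet` as a list of rasm groups, merged according to `system`."""
    group_of = {}
    for group in system:
        for letter in group:
            if letter in group_of:
                # A letter can belong to one rasm only.
                raise ValueError('letter appears more than once in the system')
            group_of[letter] = group

    completed = []
    emitted = set()
    for letter in alphabet:
        group = group_of.get(letter)
        if group is None:
            # Ungrouped letter: it is its own rasm (assumed representation).
            completed.append([letter])
        elif id(group) not in emitted:
            # First member of a group met in alphabet order: emit the group once.
            emitted.add(id(group))
            completed.append(list(group))
    return completed


result = complete_system([['b', 'c'], ['e', 'f']], list('abcdefg'))
```

With a seven-letter stand-in alphabet and two groups of two, the result has five entries, each group sitting where its first member appears in the alphabet.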
{{autogenerated}}
================================================
FILE: documentation/templates/quran_tools_template.md
================================================
## Importing PyQuran
```python
import PyQuran as q
```
- Quran retrieving tools are in `q.quran`.
{{autogenerated}}
================================================
FILE: testing/run_test.sh
================================================
#!/bin/bash
# a shell script to test `PyQuran` comprehensively
#
# Usage:
# $ ./run_test.sh
#
# ToDo:
# * Array of file names
# * loop to run them
# * add command-line arguments to test a single module.
python3 -B test_quran.py
python3 -B test_searchHelper.py
python3 -B test_pyquran.py
================================================
FILE: testing/test_pyquran.py
================================================
"""unittest module for pyquran.py
"""
import unittest
import numpy as np
# Adding another searching path
from sys import path
import os
# The current path of the current module.
path_current_module = os.path.dirname(os.path.abspath(__file__))
tools_modules = '../tools/'
core_modules = '../core/'
tools_path = os.path.join(path_current_module, tools_modules)
core_path = os.path.join(path_current_module, core_modules)
path.append(tools_path)
path.append(core_path)
from arabic import *
import quran
import pyquran
class Testing_pyquran(unittest.TestCase):
def test_search_string_with_tashkeel(self):
sentence = 'ﺺِﻓْ ﺫَﺍْ ﺚَﻧَﺍْ ﻚَﻣْ ﺝَﺍْﺩَ ﺶَﺨْﺻٌ'
x = pyquran.search_string_with_tashkeel(sentence, fatha + sukun)
y = [(3, 5), (7, 9), (10, 12), (13, 15), (17, 19)]
self.assertEqual(x, y)
def test_get_tashkeel_binary(self):
binaryPatternY = '0010101'
subAyah = 'الْأَحْيَاءُ'
binaryPatternX = pyquran.get_tashkeel_binary(subAyah)[0]
self.assertEqual(binaryPatternX,binaryPatternY)
binaryPatternY = '1010 101011 001011'
subAyah = 'إِنَّا أَعْطَيْنَكَ الْكَوْثَرَ'
binaryPatternX = pyquran.get_tashkeel_binary(subAyah)[0]
self.assertEqual(binaryPatternX,binaryPatternY)
binaryPatternY = '101 00011 0001011 0001101'
subAyah = 'بِسْمِ اللَّهِ الرَّحْمَنِ الرَّحِيمِ'
binaryPatternX = pyquran.get_tashkeel_binary(subAyah)[0]
self.assertEqual(binaryPatternX,binaryPatternY)
binaryPatternY = '11011 1011 10 10 00011101 110 10 00101 00111 0010101 001101 001101'
subAyah = ' يُسَبِّحُ لِلَّهِ مَا فِي السَّمَوَاتِ وَمَا فِي الْأَرْضِ الْمَلِكِ الْقُدُّوسِ الْعَزِيزِ الْحَكِيمِ'
binaryPatternX = pyquran.get_tashkeel_binary(subAyah)[0]
self.assertEqual(binaryPatternX,binaryPatternY)
def test_get_frequency(self):
ver_w_taskeel = quran.get_verse(1,1,with_tashkeel=True)
fre_dec = {'الرَّحِيمِ': 1, 'الرَّحْمَنِ': 1, 'اللَّهِ': 1, 'بِسْمِ': 1}
self.assertEqual(pyquran.get_frequency(ver_w_taskeel),fre_dec)
fre_dec={'أُنزِلَ': 2,
'إِلَيْكَ': 1,
'بِمَا': 1,
'قَبْلِكَ': 1,
'مِن': 1,
'هُمْ': 1,
'وَالَّذِينَ': 1,
'وَبِالْءَاخِرَةِ': 1,
'وَمَا': 1,
'يُؤْمِنُونَ': 1,
'يُوقِنُونَ': 1}
freq = pyquran.get_frequency(quran.get_verse(2,4,with_tashkeel=True))
self.assertEqual(freq,fre_dec)
def test_generate_frequency_dictionary(self):
fre_dec = {'أحد': 2,
'الصمد': 1,
'الله': 2,
'قل': 1,
'كفوا': 1,
'لم': 1,
'له': 1,
'هو': 1,
'ولم': 2,
'يكن': 1,
'يلد': 1,
'يولد': 1}
sura = pyquran.generate_frequency_dictionary(suraNumber=112)
self.assertEqual(sura,fre_dec)
def test_check_sura_with_frequency(self):
freq = pyquran.generate_frequency_dictionary(suraNumber=2)
self.assertEqual(pyquran.check_sura_with_frequency(2,freq),True)
freq = pyquran.generate_frequency_dictionary(suraNumber=95)
self.assertEqual(pyquran.check_sura_with_frequency(95,freq),True)
def test_sort_dictionary_by_similarity(self):
freq = pyquran.generate_frequency_dictionary(suraNumber=113)
fre_dec = {'أعوذ': 1,
'إذا': 2,
'العقد': 1,
'الفلق': 1,
'النفثت': 1,
'برب': 1,
'حاسد': 1,
'حسد': 1,
'خلق': 1,
'شر': 4,
'غاسق': 1,
'فى': 1,
'قل': 1,
'ما': 1,
'من': 1,
'وقب': 1,
'ومن': 3}
self.assertEqual(pyquran.sort_dictionary_by_similarity(freq),fre_dec)
freq = pyquran.generate_frequency_dictionary(suraNumber=112)
fre_dec={'الله': 2, 'ولم': 2, 'قل': 1, 'هو': 1, 'الصمد': 1, 'لم': 1, 'يلد': 1, 'يولد': 1, 'له': 1, 'كفوا': 1, 'أحد': 2, 'يكن': 1}
self.assertEqual(pyquran.sort_dictionary_by_similarity(freq,threshold=0.2),fre_dec)
fre_dec={'ولم': 2, 'الصمد': 1, 'لم': 1, 'يولد': 1, 'الله': 2, 'له': 1, 'أحد': 2, 'قل': 1, 'هو': 1, 'يلد': 1, 'يكن': 1, 'كفوا': 1}
self.assertEqual(pyquran.sort_dictionary_by_similarity(freq,threshold=0.45),fre_dec)
def test_frequency_of_character(self):
ver_w_taskeel = quran.get_verse(1,1,with_tashkeel=True)
self.assertEqual(pyquran.frequency_of_character(['ا','ض',"بً"],with_tashkeel=False),{'ا': 38667, 'ض': 1686, 'بً': 0})
self.assertEqual(pyquran.frequency_of_character(['ا','ض',"بً"],with_tashkeel=True),{'ا': 38667, 'ض': 1686, 'بً': 218})
self.assertEqual(pyquran.frequency_of_character(['ا','ض',"بً"],verseNum=1,with_tashkeel=True),{'ا': 426, 'ض': 18, 'بً': 2})
self.assertEqual(pyquran.frequency_of_character(['ا','ض',"بً"],verseNum=4,chapterNum=12,with_tashkeel=True),{'ا': 4, 'ض': 0, 'بً': 1})
self.assertEqual(pyquran.frequency_of_character(['ا','ض',"بً"],verse=ver_w_taskeel),{'ا': 3, 'ض': 0, 'بً': 0})
def test_get_token(self):
self.assertEqual(pyquran.get_token(4,1,1),'الرحيم')
self.assertEqual(pyquran.get_token(5,1,1),'')
self.assertEqual(pyquran.get_token(20,0,5),'')
with self.assertRaises(ValueError):
pyquran.get_token(20,0,-5)
self.assertEqual(pyquran.get_token(95,1,5),'')
self.assertEqual(pyquran.get_token(4,1,1,with_tashkeel=True),'الرَّحِيمِ')
def test_search_sequence(self):
result=pyquran.search_sequence(['بِسْمِ اللَّهِ','الرحمن'],verseNum=1,chapterNum=1)
real={'الرحمن': [('الرَّحْمَنِ', 3, 1, 1)],
'بسم الله': [('بِسْمِ اللَّهِ', 0, 1, 1)]}
self.assertEqual(result,real)
result=pyquran.search_sequence(['بِسْمِ اللَّهِ','الرحمن'],verseNum=1,chapterNum=1,mode=1)
real={'الرحمن': [], 'بِسْمِ اللَّهِ': [('بِسْمِ اللَّهِ', 0, 1, 1)]}
self.assertEqual(result,real)
def test_search_with_pattern(self):
result = pyquran.search_with_pattern(pattern="01101011000101",chapterNum=2)
real=['ءَامِنُوا كَمَا ءَامَنَ النَّاسُ', 'وَلَتَجِدَنَّهُمْ أَحْرَصَ النَّاسِ', 'بِالْمَعْرُوفِ حَقًّا عَلَى الْمُتَّقِينَ', 'بِالْمَعْرُوفِ حَقًّا عَلَى الْمُحْسِنِينَ', 'لِلتَّقْوَى وَلَا تَنسَوُا الْفَضْلَ']
self.assertEqual(result,real)
result=pyquran.search_with_pattern(pattern="0110101100111010101",chapterNum=2)
self.assertEqual(result,[])
result = pyquran.search_with_pattern(pattern="01111",chapterNum=1)
real = ['الرَّحِيمِ مَلِكِ', 'نَعْبُدُ وَإِيَّاكَ', 'الْمُسْتَقِيمَ صِرَطَ']
self.assertEqual(result,real)
try:
pyquran.search_with_pattern(pattern="01111")
result=True
except:
result=False
self.assertEqual(result,False)
result=pyquran.search_with_pattern(pattern="01111",chapterNum=1,threshold=0.9)
real=['الرَّحِيمِ مَلِكِ', 'نَعْبُدُ وَإِيَّاكَ', 'الْمُسْتَقِيمَ صِرَطَ']
self.assertEqual(result,real)
def test_count_rasm(self):
# test case 1: small surah with system
system = [[beh, teh, theh],
[jeem, hah, khah]]
returnedNParray = pyquran.count_rasm(quran.get_sura(110), system)
expectedFROW = [1, 0, 0, 0, 0, 1, 0, 4, 1, 0, 2, 0, 1, 1,
0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,
0, 3, 0, 1, 1, 1, 0, 0]
self.assertEqual(returnedNParray.shape, (3, 33))
self.assertEqual(list(returnedNParray[0]), expectedFROW)
# Shuffle a subsystem "same result expected"
system = [[theh, beh, teh],
[jeem, hah, khah]]
returnedNParray = pyquran.count_rasm(quran.get_sura(110), system)
self.assertEqual(returnedNParray.shape, (3, 33))
self.assertEqual(list(returnedNParray[0]), expectedFROW)
#Shuffle system "same result expected"
system = [[jeem, hah, khah],
[theh, beh, teh]]
returnedNParray = pyquran.count_rasm(quran.get_sura(110), system)
self.assertEqual(returnedNParray.shape, (3, 33))
self.assertEqual(list(returnedNParray[0]), expectedFROW)
system = [[hah, jeem, khah],
[theh, teh, beh]]
returnedNParray = pyquran.count_rasm(quran.get_sura(110), system)
self.assertEqual(returnedNParray.shape, (3, 33))
self.assertEqual(list(returnedNParray[0]), expectedFROW)
#build a very strange system :"D
system = [[jeem, alef_hamza_above, waw, ghain],
[meem, sheen, teh_marbuta, zah],
[lam, alef_maksura, dal]]
returnedNParray = pyquran.count_rasm(quran.get_sura(110), system)
expectedFROW = [1, 0, 0, 2, 0, 1, 0, 4, 0, 0, 1, 0, 1,
0, 3, 1, 1, 0, 0, 1, 0, 0, 0, 1,
0, 0, 1, 1, 0]
self.assertEqual(returnedNParray.shape, (3, 29))
self.assertEqual(list(returnedNParray[0]), expectedFROW)
# test case 2: big surah with system
system = [[beh, teh, theh], [jeem, hah, khah]]
returnedNParray = pyquran.count_rasm(quran.get_sura(2), system)
self.assertEqual(returnedNParray.shape, (286, 33))
# test case 3: without system
returnedNParray = pyquran.count_rasm(quran.get_sura(110))
expectedFROW = [1, 0, 0, 0, 0, 1, 0, 4, 0, 0, 1, 0, 1, 1,
0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
1, 0, 0, 3, 0, 1, 1, 1, 0, 0]
self.assertEqual(returnedNParray.shape, (3, 37))
self.assertEqual(list(returnedNParray[0]), expectedFROW)
# Test case 4: repeat a char in two subsystems
system = [[beh, teh, theh], [jeem, hah, khah, beh]]
self.assertRaises(ValueError, pyquran.count_rasm, quran.get_sura(110), system)
# Test case 5: pass a system (as a list not list of lists)
self.assertRaises(ValueError, pyquran.count_rasm, quran.get_sura(110), [beh, teh, theh])
self.assertRaises(ValueError, pyquran.count_rasm, quran.get_sura(110), [[beh, teh, theh], hah])
def test_check_system(self):
system = [[beh, teh, theh], [jeem, hah, khah]]
actualList = pyquran.check_system(system)
self.assertEqual(len(actualList), 33)
indx = list(alphabet).index(beh)
self.assertEqual(actualList[indx], [beh, teh, theh])
indx = list(alphabet).index(jeem)
# subtract 2 because teh and theh count as beh(all of them equal 7)
self.assertEqual(actualList[indx-2], [jeem, hah, khah])
def test_buckwalter_transliteration(self):
# test case 1:"from arabic without tashkeel to buckwalter "
self.assertEqual(pyquran.buckwalter_transliteration("مرحبا"), "mrHbA")
# test case 2:"from arabic with tashkeel to buckwalter "
arabicText = "يُولَدُ جَمِيعُ ٱلنّاسِ أَحْرَارًا مُتَسَاوِينَ فِي ٱلْكَرَامَةِ وَٱلْحُقُوقِ. وَقَدْ وُهِبُوا عَقْلًا وَضَمِيرًا وَعَلَيْهِمْ أَنْ يُعَامِلَ بَعْضُهُمْ بَعْضًا بِرُوحِ ٱلْإِخَاءِ"
expectedTransliteration = "yuwladu jamiyEu {ln~Asi >aHoraArFA mutasaAwiyna fiy {lokaraAmapi wa{loHuquwqi. waqado wuhibuwA EaqolFA waDamiyrFA waEalayohimo >ano yuEaAmila baEoDuhumo baEoDFA biruwHi {lo alphabet without taskell then alphabet with fatha, ...
'''
alphabet = [] + arabic.alphabet
alphabet += ' '
arabic_alphabet_tashkeel = [''] + alphabet + arabic_alphabet_tashkeel
return arabic_alphabet_tashkeel
def one_hot(string, padding_length=0):
'''
* Optimized for memory use.
* encodes each letter in string with a one-hot vector
* returns a matrix of one-hot vectors, one (1*182) row per letter
* letter -> 1*182 vector
'''
cleanedString = factor_shadda_tanwin(string)
charCleanedString = separate_token_with_dicrites(cleanedString)
# Initializing a Matrix
encodedString = np.zeros( (padding_length, len(lettersTashkeelCombination)) )
letter = 0
for char in charCleanedString:
one_index = lettersTashkeelCombination.index(char)
# * add 1 for the current letter in one_index
encodedString[letter][one_index] = 1
letter +=1
return encodedString
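# A dependency-free sketch of the same encoding idea, for illustration only.
# `one_hot_rows` and its `vocabulary` parameter are hypothetical names, not
# part of PyQuran. Unlike the function above, it sizes the matrix to at least
# len(symbols), so a padding_length of 0 still leaves room for every symbol.

```python
def one_hot_rows(symbols, vocabulary, padding_length=0):
    """Encode each symbol as a one-hot row; extra rows stay all-zero padding."""
    # Never allocate fewer rows than there are symbols to encode.
    n_rows = max(padding_length, len(symbols))
    rows = [[0] * len(vocabulary) for _ in range(n_rows)]
    for row, symbol in zip(rows, symbols):
        # Set the single 1 at the symbol's index in the vocabulary.
        row[vocabulary.index(symbol)] = 1
    return rows


encoded = one_hot_rows(['b', 'a'], ['a', 'b', 'c'], padding_length=3)
```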
================================================
FILE: tools/__init__.py
================================================
# Adding another searching path
from sys import path
import os
# The current path of the current module.
path_current_module = os.path.dirname(os.path.abspath(__file__))
path.append(path_current_module)
================================================
FILE: tools/arabic.py
================================================
"""This module contains Arabic tools for text analysis
"""
# Umar: move this to quran and correct the spelling to `suar_num`
swar_num = 114
# letters.
hamza = u'\u0621'
hamza_above = u'\u0654' #
alef_mad = u'\u0622'
alef_hamza_above = u'\u0623'
waw_hamza = u'\u0624'
alef_hamza_below = u'\u0625'
yeh_hamza = u'\u0626'
alef = u'\u0627'
beh = u'\u0628'
teh_marbuta = u'\u0629'
teh = u'\u062a'
theh = u'\u062b'
jeem = u'\u062c'
hah = u'\u062d'
khah = u'\u062e'
dal = u'\u062f'
thal = u'\u0630'
reh = u'\u0631'
zain = u'\u0632'
seen = u'\u0633'
sheen = u'\u0634'
sad = u'\u0635'
dad = u'\u0636'
tah = u'\u0637'
zah = u'\u0638'
ain = u'\u0639'
ghain = u'\u063a'
feh = u'\u0641'
qaf = u'\u0642'
kaf = u'\u0643'
lam = u'\u0644'
meem = u'\u0645'
noon = u'\u0646'
heh = u'\u0647'
waw = u'\u0648'
alef_maksura = u'\u0649'
yeh = u'\u064a'
madda_above = u'\u0653'
hamza_above = u'\u0654'
hamza_below = u'\u0655'
alef_wasl = u'\u0671'
tatweel = u'\u0640'
# diacritics
fathatan = u'\u064b'
dammatan = u'\u064c'
kasratan = u'\u064d'
fatha = u'\u064e'
damma = u'\u064f'
kasra = u'\u0650'
shadda = u'\u0651'
sukun = u'\u0652'
# small letters
small_alef = u"\u0670"
small_waw = u"\u06e5"
small_yeh = u"\u06e6"
#ligatures
lam_alef = u'\ufefb'
lam_alef_hamza_above = u'\ufef7'
lam_alef_hamza_below = u'\ufef9'
lam_alef_mad_above = u'\ufef5'
simple_lam_alef = u'\u0644\u0627'
simple_lam_alef_hamza_above = u'\u0644\u0623'
simple_lam_alef_hamza_below = u'\u0644\u0625'
simple_lam_alef_mad_above = u'\u0644\u0622'
# Lists
alphabet = [
hamza,
hamza_above,
alef_mad,
alef_hamza_above,
waw_hamza,
alef_hamza_below,
yeh_hamza,
alef,
beh,
teh_marbuta,
teh,
theh,
jeem,
hah,
khah,
dal,
thal,
reh,
zain,
seen,
sheen,
sad,
dad,
tah,
zah,
ain,
ghain,
feh,
qaf,
kaf,
lam,
meem,
noon,
heh,
waw,
alef_maksura,
yeh
]
tashkeel = [fathatan, dammatan, kasratan, fatha, damma, kasra, sukun, shadda]
harakat = [fathatan, dammatan, kasratan, fatha, damma, kasra, sukun]
shortharakat = [ fatha, damma, kasra, sukun]
shortharakatWithShadda = [ fatha, damma, kasra, sukun, shadda]
tanwin = [fathatan, dammatan, kasratan]
not_def_haraka = tatweel
lamAlefLike = [
lam_alef,
lam_alef_hamza_above,
lam_alef_hamza_below,
lam_alef_mad_above,
]
hamzat = [
hamza,
waw_hamza,
yeh_hamza,
hamza_above,
alef_hamza_below,
alef_hamza_above,
alef_mad
]
alefat = [
alef,
alef_mad,
alef_hamza_above,
alef_hamza_below,
alef_wasl,
alef_maksura,
small_alef,
]
# Without-dots groups
behLike = [beh, teh, theh, noon]
jeemLike = [hah, khah, jeem]
dalLike = [dal, thal]
rehLike = [reh, zain]
seenLike = [seen, sheen]
sadLike = [sad, dad]
tahLike = [tah, zah]
ainLike = [ain, ghain]
fehLike = [feh, qaf]
weak = [ alef, waw, yeh, alef_maksura]
yehlike = [ yeh, yeh_hamza, alef_maksura, small_yeh ]
wawLike = [ waw, waw_hamza, small_waw ]
tehLike = [ teh, teh_marbuta ]
small = [ small_alef, small_waw, small_yeh]
moon_letters = [hamza ,
alef_mad ,
alef_hamza_above ,
alef_hamza_below ,
alef ,
beh ,
jeem ,
hah ,
khah ,
ain ,
ghain ,
feh ,
qaf ,
kaf ,
meem ,
heh ,
waw ,
yeh
]
sun_letters = [
teh ,
theh ,
dal ,
thal ,
reh ,
zain ,
seen ,
sheen ,
sad ,
dad ,
tah ,
zah ,
lam ,
noon ,
]
# Systems
class Systems:
'''A container of systems.
'''
def __init__(self):
#
self.withoutDots = [behLike,
jeemLike,
dalLike,
rehLike,
seenLike,
sadLike,
tahLike,
ainLike,
fehLike]
#
self.hamazat = [hamzat]
#
self.default = alphabet
# END CLASS
# Exporting object
systems = Systems()
"""
* Some alphabet building tools
"""
def alphabet_excluding(excludedLetters):
"""returns the alphabet excluding `excludedLetters`.
Args:
excludedLetters: list[Char], letters to be excluded from the alphabet.
Returns:
list[str]: the alphabet letters that are not in `excludedLetters`.
Example:
```python
q.alphabet_excluding([q.alef, q.beh, q.qaf, q.teh, q.dal, q.yeh, q.alef_mad])
>>>
['ء',
'ٔ',
'أ',
'ؤ',
'إ',
'ئ',
'ة',
'ث',
'ج',
'ح',
'خ',
'ذ',
'ر',
'ز',
'س',
'ش',
'ص',
'ض',
'ط',
'ظ',
'ع',
'غ',
'ف',
'ك',
'ل',
'م',
'ن',
'ه',
'و',
'ى']
```
"""
return [x for x in alphabet if x not in excludedLetters]
def treat_as_the_same(listOfLetter, letter, text):
"""convert any letter in the `listOfLetter` to `letter` in the given text
Args:
listOfLetter (['chars'] or str)
letter (char)
text (str)
Returns:
str: a text after changing all the `listOfLetter` to that char `letter`
Example:
print(treat_as_the_same([alef_hamza_above], alef, line))
print(treat_as_the_same([ain], qaf, line))
"""
pass
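# The function above is still a stub. One possible way to fill it in is
# sketched below, assuming every entry of the list is a single character.
# `treat_as_the_same_sketch` is a hypothetical name used here so the stub
# itself is left untouched.

```python
def treat_as_the_same_sketch(list_of_letters, letter, text):
    """Replace every character of `list_of_letters` in `text` with `letter`."""
    # str.translate takes a mapping from code points to replacement strings.
    table = {ord(ch): letter for ch in list_of_letters}
    return text.translate(table)


alef_hamza_above, alef, beh = '\u0623', '\u0627', '\u0628'
merged = treat_as_the_same_sketch([alef_hamza_above], alef, alef_hamza_above + beh)
```

# Because iterating a str yields its characters, the same call also accepts
# the letters as one string instead of a list.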
def strip_tashkeel(string):
"""removes tashkeel (diacritics) from the given string
Args:
string: str, to drop tashkeel from.
Example:
```python
x = q.quran.get_verse(12, 2, with_tashkeel=True)
x
>>> 'إِنَّا أَنزَلْنَهُ قُرْءَنًا عَرَبِيًّا لَّعَلَّكُمْ تَعْقِلُونَ'
q.strip_tashkeel(x)
>>> 'إنا أنزلنه قرءنا عربيا لعلكم تعقلون'
```
"""
for char in string:
if char in tashkeel:
string = string.replace(char, '')
return string
def factor_shadda_tanwin(string):
'''
* factors shadda to letter with sukun and letter
* factors tanwin to the matching short haraka followed by noon with sukun
# Some redundancy is simpler. :"D
'''
factoredString = ''
charsList = separate_token_with_dicrites(string)
# print(charsList)
for char in charsList:
if len(char) < 2:
factoredString += char
if len(char) == 2:
if char[1] in arabic.shortharakat:
factoredString += char
elif char[1] == arabic.dammatan:
if char[0] == arabic.teh_marbuta:
factoredString += arabic.teh + arabic.damma + \
arabic.noon + arabic.sukun
else:
# the letter
factoredString += char[0] + arabic.damma + \
arabic.noon + arabic.sukun
elif char[1] == arabic.kasratan:
if char[0] == arabic.teh_marbuta:
# teh marbuta becomes teh, as in the dammatan and fathatan branches
factoredString += arabic.teh + \
arabic.kasra + arabic.noon + arabic.sukun
else:
# the letter
factoredString += char[0] + arabic.kasra \
+ arabic.noon + arabic.sukun
elif char[1] == arabic.fathatan:
if char[0] == arabic.alef:
factoredString += arabic.noon + arabic.sukun
elif char[0] == arabic.teh_marbuta:
factoredString += arabic.teh + arabic.fatha \
+ arabic.noon + arabic.sukun
elif char[1] == arabic.shadda:
factoredString += char[0] + arabic.sukun + char[0]
if len(char) == 3:
factoredString += char[0] + arabic.sukun + char[0] + char[2]
return factoredString
'''
print(factor_shadda_tanwin('بيتٌ'))
print(factor_shadda_tanwin('ولدٍ'))
print(factor_shadda_tanwin('ولدَاً'))
print(factor_shadda_tanwin('مدرسةً'))
print(factor_shadda_tanwin('مدرسةٍ'))
print(factor_shadda_tanwin('مدرسةٌ'))
print(factor_shadda_tanwin('شبّ'))
print(factor_shadda_tanwin('كبَّ'))
'''
'''
# Testing
for i in factor_shadda_tanwin('أَشَّدونٌ'):
print(i)
'''
================================================
FILE: tools/buckwalter.py
================================================
'''
Declare a dictionary with Buckwalter's ASCII symbols as the keys, and
their unicode equivalents as values.
'''
buck2uni = {
"'": u"\u0621", # hamza-on-the-line
"|": u"\u0622", # madda
">": u"\u0623", # hamza-on-'alif
"&": u"\u0624", # hamza-on-waaw
"<": u"\u0625", # hamza-under-'alif
"}": u"\u0626", # hamza-on-yaa'
"A": u"\u0627", # bare 'alif
"b": u"\u0628", # baa'
"p": u"\u0629", # taa' marbuuTa
"t": u"\u062A", # taa'
"v": u"\u062B", # thaa'
"j": u"\u062C", # jiim
"H": u"\u062D", # Haa'
"x": u"\u062E", # khaa'
"d": u"\u062F", # daal
"*": u"\u0630", # dhaal
"r": u"\u0631", # raa'
"z": u"\u0632", # zaay
"s": u"\u0633", # siin
"$": u"\u0634", # shiin
"S": u"\u0635", # Saad
"D": u"\u0636", # Daad
"T": u"\u0637", # Taa'
"Z": u"\u0638", # Zaa' (DHaa')
"E": u"\u0639", # cayn
"g": u"\u063A", # ghayn
"_": u"\u0640", # taTwiil
"f": u"\u0641", # faa'
"q": u"\u0642", # qaaf
"k": u"\u0643", # kaaf
"l": u"\u0644", # laam
"m": u"\u0645", # miim
"n": u"\u0646", # nuun
"h": u"\u0647", # haa'
"w": u"\u0648", # waaw
"Y": u"\u0649", # 'alif maqSuura
"y": u"\u064A", # yaa'
"F": u"\u064B", # fatHatayn
"N": u"\u064C", # Dammatayn
"K": u"\u064D", # kasratayn
"a": u"\u064E", # fatHa
"u": u"\u064F", # Damma
"i": u"\u0650", # kasra
"~": u"\u0651", # shaddah
"o": u"\u0652", # sukuun
"`": u"\u0670", # dagger 'alif
"{": u"\u0671", # waSla
}
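# A short sketch of how a table like this can drive transliteration in the
# Arabic-to-Buckwalter direction by inverting the dictionary. Only a
# five-letter excerpt is used, and `to_buckwalter` is a hypothetical helper,
# not the library's `buckwalter_transliteration`.

```python
# Excerpt of the table above (same keys and code points).
buck2uni_excerpt = {
    "A": "\u0627",  # bare 'alif
    "b": "\u0628",  # baa'
    "H": "\u062d",  # Haa'
    "r": "\u0631",  # raa'
    "m": "\u0645",  # miim
}

# Invert the mapping: Unicode character -> Buckwalter symbol.
uni2buck_excerpt = {uni: buck for buck, uni in buck2uni_excerpt.items()}


def to_buckwalter(text, table=uni2buck_excerpt):
    """Transliterate `text`; characters missing from `table` pass through."""
    return "".join(table.get(ch, ch) for ch in text)


word = "\u0645\u0631\u062d\u0628\u0627"  # meem, reh, hah, beh, alef
transliterated = to_buckwalter(word)
```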
================================================
FILE: tools/error.py
================================================
"""standard error module
"""
def is_int(number, message):
if type(number) is not int:
raise ValueError(message)
def is_bool(boolean, message):
if type(boolean) is not bool:
raise ValueError(message)
def is_string(string, message):
if type(string) is not str:
raise ValueError(message)
================================================
FILE: tools/filtering.py
================================================
'''Contains Uthmanic symbols and related functions.
reference: en.wikipedia.org/wiki/Arabic_script_in_Unicode
'''
import arabic
import error
import re
hamza_above = '\u0654' # u'\u0654'
small_high_meem = '\u06e2'
small_low_meem = '\u06ed'
small_high_seen = '\u06dc'
small_low_seen = '\u06e3'
small_alef = '\u0670'
small_waw = '\u06e5'
small_yeh = '\u06e6'
small_high_noon = '\u06e8'
mad_lazim_mark = '\u0653'
tatweel = '\u0640'
alef_wasl_with_saad_above = '\u0671'
empty_centre_high_stop = '\u06eb'
small_high_rounded_zero = '\u06df'
empty_center_low_stop = '\u06ea'
small_high_upright_rectangular_zero = '\u06e0'
rounded_high_stop_with_filled_centre = '\u06ec'
recitationSymbols = [
alef_wasl_with_saad_above, # Replace with alef
hamza_above, # Remain
small_high_meem, # Remove
small_low_meem, # Remove
small_high_seen, # Remove
small_low_seen, # Remove
small_alef, # Remove
small_waw, # Remove
small_yeh, # Remove
small_high_noon, # Remove
mad_lazim_mark, # Remove
tatweel, # Remove
empty_centre_high_stop, # Remove
small_high_rounded_zero, # Remove
empty_center_low_stop, # Remove
small_high_upright_rectangular_zero, # Remove
rounded_high_stop_with_filled_centre # Remove
]
'my_user_name'
'''
# Cannot find hamza_above
import tools
import arabic
x = tools.search_sequence([hamza_above])
print(x)
quran = open('QuranCorpus/quran-uthmani.txt', 'r')
quran = quran.read()
#print(quran)
print(len(quran))
print(hamza_above in quran)
import re
p = re.compile(quran)
print(p.search(hamza_above))
print(p.findall(hamza_above))
'''
"""
problems;
* 'ء' is removed from AlNsaa 92 u'\u0621'
* hamza_above = '\u0654' # u'\u0654'
* 1:126 الأخر what is this hamza?! is it أ or alef + hamza above?
In [1]: u'\u0621'
Out[1]: 'ء'
In [2]: '\u0654'
Out[2]: 'ٔ'
"""
def get_patterns():
patterns = []
for x in [small_yeh, small_waw] :
for y in arabic.shortharakat:
patterns.append(x + y)
return patterns + [small_yeh, small_waw]
patterns_list = get_patterns()
remove_no_tashkeel_after = [
small_high_meem, # Remove
small_low_meem, # Remove
small_high_seen, # Remove
small_low_seen, # Remove
small_alef, # Remove
small_high_noon, # Remove
mad_lazim_mark, # Remove
tatweel, # Remove
empty_centre_high_stop, # Remove
small_high_rounded_zero, # Remove
empty_center_low_stop, # Remove
small_high_upright_rectangular_zero, # Remove
rounded_high_stop_with_filled_centre # Remove
]
def recitation_symbols_filter(string, symbols=recitationSymbols):
'''Removes the Special Recitation Symbols from `string`
Args:
string (str): a string to be filtered
symbols ([char]): a list of recitation symbols
Issues:
* Some small letters carry diacritics; when the letters are removed,
their diacritics remain.
* pyarabic strip_tashkeel -> revise it.
'''
error.is_string(string, 'You must pass a string')
for symbol in symbols:
if symbol == alef_wasl_with_saad_above:
string = string.replace(alef_wasl_with_saad_above, arabic.alef)
# Do not remove hamza_above
elif symbol == hamza_above:
continue
elif symbol in remove_no_tashkeel_after:
string = string.replace(symbol, '')
else:
for pat in patterns_list:
string = re.sub( pat , '', string)
return string
'''
for x in recitationSymbols :
print("> " + x + '\n')
'''
================================================
FILE: tools/quran.py
================================================
"""This module contains functions to retrieve text from the Quran.
"""
from xml.etree import ElementTree
import arabic as ar
import filtering
import error
import os
# Relative path to this module's location in PyQuran.
corpus_xml_relative_path= '../QuranCorpus/quran-uthmani.xml'
# The current path of the current module.
current_path = os.path.dirname(os.path.abspath(__file__))
# Joining this module's path with the relative path of the corpus
corpus_path = os.path.join(current_path, corpus_xml_relative_path)
# Parsing xml
quran_tree = ElementTree.parse(corpus_path)
def get_sura(sura_number, with_tashkeel=False, basmalah=False):
"""returns a sura as a list of verses.
Args:
sura_number: 1 <= Integer <= 114, the ordered number of sura in Mushaf.
with_tashkeel: Boolean, if true return sura with tashkeel else return
without.
basmalah: Boolean, adding basmalah as aya.
Returns:
[str]: a list of sura's ayat.
Note:
Indexing starts at zero.
So if the order number of an aya is x, then it's at (x-1) in the returned
list.
Example:
```python
q.quran.get_sura(108, with_tashkeel=True)\n
>>> ['إِنَّا أَعْطَيْنَكَ الْكَوْثَرَ', 'فَصَلِّ لِرَبِّكَ وَانْحَرْ', 'إِنَّ شَانِئَكَ هُوَ الْأَبْتَرُ']
```
"""
message = "Sura number must be an integer between 1 to 114, inclusive."
error.is_int(sura_number, message)
message = "The second parameter must be a bool; it is optional and False by default."
error.is_bool(with_tashkeel, message)
sura_number -= 1
sura = []
suras_list = quran_tree.findall('sura')
ayat = suras_list[sura_number]
for aya in ayat:
sura.append(aya.attrib['text'])
if basmalah and sura_number != 1 -1 and sura_number != 9 -1:
#suras_list[0][0].attrib['text']
bismilah = [suras_list[0][0].attrib['text']]
sura = bismilah + sura
uthmanic_free_sura = []
for aya in sura:
uthmanic_free_sura.append(filtering.recitation_symbols_filter(aya))
if not with_tashkeel:
return list(map(ar.strip_tashkeel, uthmanic_free_sura))
else:
return uthmanic_free_sura
def fetch_aya(sura_number, aya_number):
"""
Args:
param1 (int): the ordered number of sura in The Mus'haf.
param2 (int): the ordered number of aya in The Mus'haf.
Returns:
str: an aya as a string
"""
message = "Sura number must be an integer between 1 to 114, inclusive."
error.is_int(sura_number, message)
message = "Aya number is a positive integer."
error.is_int(aya_number, message)
aya_number -= 1
sura = get_sura(sura_number)
if aya_number > len(sura) - 1:
raise ValueError('Aya number must not exceed the number of ayat in sura.')
return sura[aya_number]
def retrieve_qruan_as_one_string():
quran_string = ''
for i in range (1, 115):
for aya in get_sura(i, with_tashkeel=True):
quran_string += aya + ' '
return quran_string
def get_sura_number(sura_name):
"""
Args:
sura_name (str) : string represents the sura name.
Returns:
int: the number of the sura whose name is sura_name.
Note:
Sura numbers start at one. If no sura is named sura_name,
None is returned.
Example:
```python
q.quran.get_sura_number('الملك')\n
>>> 67
```
"""
suras_list = quran_tree.findall('sura')
suraNumber = None
for index in range (1, 115):
if suras_list[index-1].attrib['name'] == sura_name:
suraNumber = index
return suraNumber
def get_sura_name(sura_number=None):
"""Returns the name of `sura_number`. If `sura_number=None` a list of all
suras' names is returned.
Args:
sura_number: Optional, 1 <= Integer <= 114, the ordered number of sura in Mushaf.
Returns:
str: the name of the sura whose number is sura_number.
[str]: list of all suras' names (if the sura_number parameter is None).
Example:
```python
q.quran.get_sura_name(2)\n
>>> 'البقرة'
```
"""
# get all suras
suras_list = quran_tree.findall('sura')
if sura_number is None :
suraName = [(suras_list[i].attrib['name']) for i in range(0,114)]
else:
# get suraName
suraName = suras_list[sura_number-1].attrib['name']
# return suraName
return suraName
# Redundant:
#
def get_verse(sura_number, verse_number, with_tashkeel=False):
"""
gets a specific verse from a specific chapter
Args:
sura_number: 1 <= Integer <= 114, the ordered number of sura in Mushaf.
verse_number: Integer > 0, number of verse.
with_tashkeel: Boolean, if true return sura with tashkeel else return
without.
Returns:
str: a verse.
Example:
```python
q.quran.get_verse(sura_number=1, verse_number=2)\n
>>> 'الحمد لله رب العلمين'
```
"""
if(sura_number > ar.swar_num or verse_number<=0):
return ""
try:
return get_sura(sura_number,with_tashkeel)[verse_number-1]
except:
return ""
================================================
FILE: tools/searchHelper.py
================================================
"""searchHelper: contains helper functions for searching.
"""
from arabic import *
import re
from pyarabic.araby import strip_tashkeel, strip_tatweel
import quran
def count_spaces_before_index(string, index):
"""counts spaces before a char in string.
Args:
param1 (str): string
param2 (int): char index inside string
Returns:
int: number of spaces before string[index]
"""
count = 0
for i in range(index):
if string[i] == ' ':
count += 1
return count
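# For comparison, the same count can be written with the built-in
# str.count and its start/end arguments. `count_spaces_before` is a
# hypothetical name, and the sketch assumes 0 <= index <= len(string);
# outside that range it does not mirror the loop above.

```python
def count_spaces_before(string, index):
    """Count the spaces in string[0:index] without an explicit loop."""
    return string.count(' ', 0, index)


spaces = count_spaces_before('a b c', 4)
```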
def get_string_taskeel(string):
"""returns the tashkeel of a string without its letters (spaces are kept)
Args:
param1 (str): string
Returns:
str: the diacritics and spaces found in `string`, in their original order
"""
x = ''
for char in string:
if char in tashkeel or char == ' ':
x += char
return x
def hellper_get_sequance_positions(verse,sequance):
'''
this function takes a verse and a sequence and returns
the position of the matched word,
and if the sequence exists in the verse more than once, it
returns a list of the positions of the matched word.
'''
verse = strip_tashkeel(verse)
sequance = strip_tashkeel(sequance)
sequance = sequance.split()
verse = verse.split()
positions = []
for n,v in enumerate(verse):
if v not in sequance:
continue
for en,se in enumerate(sequance):
if se != verse[n]:
break
if en == len(sequance)-1:
positions.append(n)
n+=1
return positions
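# A compact sliding-window sketch of locating a word sequence inside a
# tokenized verse. `find_sequence_positions` is a hypothetical helper; it
# reports the start index of each match, which may differ from the position
# the helper above records, and it expects tashkeel to be stripped already.

```python
def find_sequence_positions(verse_tokens, sequence_tokens):
    """Return the start index of every occurrence of the token sequence."""
    width = len(sequence_tokens)
    if width == 0:
        return []
    # Compare each window of `width` tokens against the sequence.
    return [start
            for start in range(len(verse_tokens) - width + 1)
            if verse_tokens[start:start + width] == sequence_tokens]


positions = find_sequence_positions('a b c a b'.split(), 'a b'.split())
```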
def hellper_search_function(verse,sequance,verseNum,chapterNum,mode3):
#split verse to tokens
tokens = re.split(r' ',verse)
if mode3:
verse = strip_tashkeel(verse)
tashkeel_ = "|".join([fatha,fathatan,damma,dammatan
,kasra,kasratan,shadda,sukun])
pattern = r"((\w|["+tashkeel_+"]*)*"+str(sequance)+r"(\w|["+tashkeel_+"]*)*)"
#get match_sequance
matches = re.findall(pattern,verse)
matches = [j.strip() for i in matches for j in i if j !='']
#check if found or not
if len(matches)!=0:
try:
new_tokens = verse.split()
positions = dict()
#get position of occurrence
lst = []
if len(sequance.split())>1:
for tok in matches:
positions[tok] = (0,hellper_get_sequance_positions(
verse,tok))
else:
for tok in matches:
if verse.count(tok) > 1:
ls = [i for i,x in enumerate(new_tokens) if x == tok]
positions[tok] = (0,ls)
else:
positions[tok] = (0,[new_tokens.index(tok)])
if chapterNum!=0 and len(sequance.split())==1:
for token in matches:
loc,ls = positions[token]
index = int(ls[loc])
positions[token] = (loc+1,ls)
#check if exist the same token many time
lst.append((tokens[index],
index+1,
verseNum,
chapterNum))
#if matched sequance token
return lst
except:
pass
if len(sequance.split())==1:
#if matched sequance token
for token in matches:
loc,ls = positions[token]
index = int(ls[loc])
positions[token] = (loc+1,ls)
#check if exist the same token many time
lst.append((tokens[index],
index+1))
#if matched sequance token
return lst
else:
#check if mode3 False
if not mode3:
if chapterNum!=0:
#if match sequance sentence
return [(token,0,verseNum,chapterNum) for token in matches]
else:
#if match sequance sentence
return [(token,0) for token in matches]
else:
lst = []
#if match sequance sentence
for token in matches:
new_token = []
loc,ls = positions[token]
index = int(ls[loc])
positions[token] = (loc+1,ls)
new_token = " ".join([str(tokens[index-
len(sequance.split())+i*1+1])
for i in range(len(token.split()))])
if chapterNum!=0:
lst.append((new_token,0,verseNum,chapterNum))
else:
lst.append((new_token,0))
return lst
return []
def hellper_pre_search_sequance(sequance,verse=None,chapterNum=0,
verseNum=0,with_tashkeel=False,mode3=False):
"""
search for sequance in a verse, a chapter, or the whole Quran
and return the matched sequence and its position if sequance
is a token or sub-token, and 0 if sequance is a sentence.
-cases:
* if verse is given as a string, the search runs in that verse.
* if no chapterNum, no verseNum and no verse are given, the search
runs in the whole Quran.
* if chapterNum is given without verseNum or verse, the search
runs in that chapter.
* if chapterNum and verseNum are given without verse, the search
runs in that verse.
Args:
verse (str): the verse to search in
sequance (str): the sequence that you want to match
chapterNum (int) : number of chapter
verseNum (int) : number of verse
with_tashkeel (bool) : whether to search with tashkeel or not
mode3 (bool) : if true, mode 3 is used for the search
Returns:
list of tuple : (matched_sequance ,
its_position ,
verse number ,
chapter number )
Note: position is 0 if matched_sequance is part of a sentence,
and is a number if matched_sequance is a token or sub-token
"""
if verseNum<0 or chapterNum <0 :
return []
#remove extra spaces
sequance = re.sub(r" +"," ",sequance)
sequance = sequance.strip()
#strip tashkeel if with_tashkeel flage is false
if not with_tashkeel:
sequance = strip_tashkeel(sequance)
#search in verse that enterd
if verse != None:
return hellper_search_function(verse,sequance,verseNum,chapterNum,mode3)
else:
#chech if specific chapter
if chapterNum!=0:
#check if specific verse
if verseNum!=0:
verse = quran.get_verse(chapterNum,verseNum,with_tashkeel)
return hellper_search_function(verse,sequance,
verseNum,
chapterNum,
mode3)
else:
#search in Chapter
verses = quran.get_sura(chapterNum,with_tashkeel)
return sum([hellper_search_function(v,sequance,
num+1,chapterNum,
mode3)
for num,v in enumerate(verses)], [])
else:
#search in all Quran
final_list = []
for i in range(swar_num):
verses = quran.get_sura(i+1,with_tashkeel)
final_list += sum([hellper_search_function(v,sequance,
num+1,i+1,
mode3)
for num,v in enumerate(verses)], [])
return final_list
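# A minimal sketch of two idioms used in hellper_pre_search_sequance:
# collapsing repeated spaces in the query, and flattening the per-verse
# result lists with sum(..., []). The names normalise_query and search_verse
# are hypothetical stand-ins, not part of PyQuran, and search_verse only
# matches whole words.

```python
import re


def normalise_query(sequance):
    # Hypothetical helper: the same two clean-up steps as above.
    return re.sub(r" +", " ", sequance).strip()


def search_verse(verse, sequance, verse_num):
    # Hypothetical stand-in for hellper_search_function:
    # returns (token, 1-based word position, verse number) for whole-word hits.
    return [(tok, pos + 1, verse_num)
            for pos, tok in enumerate(verse.split())
            if tok == sequance]


verses = ["a b a", "c d", "a"]          # placeholder verses
query = normalise_query("  a  ")

# One list per verse, then flattened into a single list of tuples.
per_verse = [search_verse(v, query, num + 1) for num, v in enumerate(verses)]
flat = sum(per_verse, [])
print(flat)
```

# sum(..., []) copies the accumulated list on every addition, so for long
# inputs itertools.chain.from_iterable may be a cheaper way to flatten.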
def hellper_frequency_of_chars_in_verse(verse,charaters):
    """
    Count the number of occurrences of each character in a verse.

    Args:
        verse (str): the verse to count in.
        charaters (list): list of characters that you want to count.

    Returns:
        dict: a dictionary whose keys are the characters and whose
              values are the count of every character.
    """
    #dictionary that holds the frequencies
    frequency = dict()
    #count frequency of chars
    for char in charaters:
        frequency[char] = verse.count(char)
    return frequency
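# A small usage sketch for hellper_frequency_of_chars_in_verse. The function
# body is copied here so the example runs on its own; the sample text
# (without tashkeel) and the chosen characters are only an illustration.

```python
def hellper_frequency_of_chars_in_verse(verse, charaters):
    # Copy of the helper above: one str.count call per requested character.
    frequency = dict()
    for char in charaters:
        frequency[char] = verse.count(char)
    return frequency


verse = "بسم الله الرحمن الرحيم"
counts = hellper_frequency_of_chars_in_verse(verse, ["ل", "م", "ر"])
print(counts)
```

# Characters that never occur in the verse still get a key, with value 0.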
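def hamming_distance(s1, s2):
    '''
    Get the number of positions at which s1 and s2 have different
    characters. Only the first min(len(s1), len(s2)) positions are
    compared, because zip stops at the shorter string.
    '''
    return sum(el1 != el2 for el1, el2 in zip(s1, s2))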
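# A short sketch of how hamming_distance behaves, including the case of
# unequal lengths. The function is copied here so the example is
# self-contained; the input strings are arbitrary.

```python
def hamming_distance(s1, s2):
    # Count positions where the paired characters differ.
    return sum(el1 != el2 for el1, el2 in zip(s1, s2))


same_length = hamming_distance("1010", "1001")  # last two positions differ
truncated = hamming_distance("1010", "10")      # only two positions compared
print(same_length, truncated)
```

# Extra characters in the longer string are ignored rather than counted as
# differences, so callers that need a strict distance should compare lengths
# first.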
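def get_word_num(char_num,sentece):
    '''
    Take the position of a letter and return the position
    of the word that contains this letter.

    char_num counts letters only (spaces are skipped) starting from 1,
    and the returned word position starts from 0. Returns None if
    char_num is larger than the number of letters in sentece.
    '''
    lis = [len(i) for i in sentece.split()]
    coun = 0
    for i,l in enumerate(lis):
        coun +=l
        if char_num <= coun:
            return i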
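# A worked sketch of get_word_num on a 0/1 sentence pattern. The function is
# copied here so the example runs alone; the pattern "101 01 1101" is made up
# for illustration.

```python
def get_word_num(char_num, sentece):
    # Running total of word lengths; the first word whose running total
    # reaches char_num is the word that holds that letter.
    lis = [len(i) for i in sentece.split()]
    coun = 0
    for i, l in enumerate(lis):
        coun += l
        if char_num <= coun:
            return i


pattern = "101 01 1101"   # word lengths 3, 2 and 4
first = get_word_num(3, pattern)     # last letter of the first word
second = get_word_num(4, pattern)    # first letter of the second word
third = get_word_num(9, pattern)     # last letter of the third word
missing = get_word_num(10, pattern)  # past the end of the pattern
print(first, second, third, missing)
```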
def hellper_search_with_pattern(pattern,sentence_pattern,sentence,ratio=1):
    '''
    Take a pattern of 0's and 1's and return the matched words from
    sentence, using ratio as the similarity threshold.

    Args:
        pattern (str): pattern of 0's and 1's that you need to search for.
        sentence_pattern (str): pattern of 0's and 1's of the sentence
                                to search inside.
        sentence (str): the real sentence in text format.
        ratio (float): threshold of similarity; if 1, only exact matches
                       are returned, otherwise a window matches when its
                       share of equal positions is at least ratio.

    Returns:
        list: the matched words or sentences as strings, or an empty
              list if nothing is found.
    '''
    sentence_pattern_sequance = sentence_pattern.replace(" ","")
    pattern_len = len(pattern)
    if pattern_len >len(sentence):
        return []
    lis = []
    s=0
    e=pattern_len
    i=0
    while i <= len(sentence_pattern_sequance)-pattern_len:
        sen = sentence_pattern_sequance[s:e]
        dif = hamming_distance(sen,pattern)/pattern_len
        if 1-dif >= ratio:
            #map the window's first and last letters back to word positions
            words = sentence.split()
            matched = words[get_word_num(s+1,sentence_pattern):
                            get_word_num(e,sentence_pattern)+1]
            matched = " ".join(matched)
            if matched not in lis:
                lis.append(matched)
        s +=1
        e +=1
        i+=1
    return lis
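# A worked sketch of the sliding-window search in hellper_search_with_pattern.
# The three helpers are condensed copies so the example runs alone; the
# sentence "abc de fgh" and its pattern "101 01 110" are placeholders, not
# real data.

```python
def hamming_distance(s1, s2):
    return sum(el1 != el2 for el1, el2 in zip(s1, s2))


def get_word_num(char_num, sentece):
    coun = 0
    for i, l in enumerate(len(w) for w in sentece.split()):
        coun += l
        if char_num <= coun:
            return i


def hellper_search_with_pattern(pattern, sentence_pattern, sentence, ratio=1):
    flat = sentence_pattern.replace(" ", "")
    pattern_len = len(pattern)
    if pattern_len > len(sentence):
        return []
    lis = []
    # Slide a window of pattern_len letters over the flattened pattern.
    for s in range(len(flat) - pattern_len + 1):
        e = s + pattern_len
        dif = hamming_distance(flat[s:e], pattern) / pattern_len
        if 1 - dif >= ratio:
            words = sentence.split()
            matched = " ".join(words[get_word_num(s + 1, sentence_pattern):
                                     get_word_num(e, sentence_pattern) + 1])
            if matched not in lis:
                lis.append(matched)
    return lis


exact = hellper_search_with_pattern("011", "101 01 110", "abc de fgh")
fuzzy = hellper_search_with_pattern("011", "101 01 110", "abc de fgh",
                                    ratio=0.6)
print(exact)
print(fuzzy)
```

# With ratio=1 only the window "011" (letters 4 to 6) matches, and it spans
# the second and third words. With ratio=0.6 a window may differ in one of
# its three positions, so the window over letters 2 to 4 also matches.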
================================================
FILE: tools/shapeHelper.py
================================================
'''shapeHelper: contains helper functions for shape systems.
'''
from arabic import *
from itertools import chain
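def searcher(system, ch):
    '''
    Return the index of the first group in system that contains ch,
    or None if no group contains it.
    '''
    for i in range(0, len(system), 1):
        if ch in system[i]:
            return i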
def convert_text_to_numbers(text,alphabetMap):
    """
    Convert a text (surah or ayah) to a list of numbers according to
    the alphabetMap dictionary.

    What it does:
        it converts each letter to the number that the dictionary
        given as argument assigns to it.

    Args:
        text (str or list of str): the characters to convert.
        alphabetMap (dict): a dictionary that maps each letter of the
                            alphabet to its corresponding number.

    Returns:
        list: list of numbers, where each char in the text is converted
              to its number.

    Raises:
        KeyError: if a char in text is not a key of alphabetMap.
    """
    textToNumber=[]
    for char in text:
        textToNumber.append(alphabetMap[char])
    return textToNumber
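# A small usage sketch for convert_text_to_numbers. The function is copied
# in condensed form so the example runs alone, and the three-letter
# alphabetMap is a hypothetical mapping chosen only for illustration.

```python
def convert_text_to_numbers(text, alphabetMap):
    # Look up every character of the text in the mapping, in order.
    return [alphabetMap[char] for char in text]


alphabet_map = {"ا": 1, "ل": 2, "ه": 3}   # hypothetical numbering
numbers = convert_text_to_numbers("الله", alphabet_map)
print(numbers)

# A character missing from the mapping (here the space) raises KeyError.
try:
    convert_text_to_numbers("ا ل", alphabet_map)
    unmapped_raises = False
except KeyError:
    unmapped_raises = True
```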
def check_repetation(system):
    '''
    Return True if any element appears more than once across the
    groups of system, and False otherwise.
    '''
    elements = list(chain(*system))
    return len(elements) > len(set(elements))
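# A short sketch of searcher and check_repetation on a small system. Both
# functions are copied so the example runs alone; the two letter groups are
# an illustrative grouping, not one of PyQuran's predefined systems.

```python
from itertools import chain


def searcher(system, ch):
    # Index of the first group that contains ch, or None.
    for i in range(len(system)):
        if ch in system[i]:
            return i


def check_repetation(system):
    # True when flattening the groups yields any duplicate element.
    elements = list(chain(*system))
    return len(elements) > len(set(elements))


system = [["ب", "ت", "ث"], ["ج", "ح", "خ"]]
group_of_hah = searcher(system, "ح")
group_of_meem = searcher(system, "م")          # not in any group
clean = check_repetation(system)
with_duplicate = check_repetation(system + [["ب"]])
print(group_of_hah, group_of_meem, clean, with_duplicate)
```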