Full Code of lenarsaitov/cianparser for AI

Repository: lenarsaitov/cianparser
Branch: main
Commit: 236352a200b0
Files: 22
Total size: 103.0 KB

Directory structure:
gitextract_5z1o6h1o/

├── .github/
│   └── FUNDING.yml
├── .gitignore
├── LICENSE
├── README.md
├── cianparser/
│   ├── __init__.py
│   ├── base_list.py
│   ├── cianparser.py
│   ├── constants.py
│   ├── definers/
│   │   ├── __init__.py
│   │   ├── definer_cities_id.py
│   │   └── definer_metro_id.py
│   ├── flat/
│   │   ├── list.py
│   │   └── page.py
│   ├── helpers.py
│   ├── newobject/
│   │   ├── list.py
│   │   └── page.py
│   ├── proxy_pool.py
│   ├── suburban/
│   │   ├── list.py
│   │   └── page.py
│   └── url_builder.py
├── setup.cfg
└── setup.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/FUNDING.yml
================================================
# These are supported funding model platforms

github: [lenarsaitov]
ko_fi: lenarsaitov


================================================
FILE: .gitignore
================================================
/venv/
/build/
/dist/
/cianparser.egg-info/
__pycache__/

================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2023 Lenar Saitov

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
### Collecting data from Cian, a site for real-estate rental and sale listings

Cianparser is a Python 3 library (versions 3.8 and above) for parsing the [Cian](http://cian.ru) website.
It lets you obtain fairly detailed, structured data on short-term and long-term rentals and on sales of flats, houses, townhouses, etc.

### Installation
```bash
pip install cianparser
```

### Usage
```python
import cianparser

moscow_parser = cianparser.CianParser(location="Москва")
data = moscow_parser.get_flats(deal_type="sale", rooms=(1, 2), with_saving_csv=True, additional_settings={"start_page":1, "end_page":2})

print(data[0])
```

```
                              Preparing to collect information from pages..
The absolute path to the file: 
 /Users/macbook/some_project/cianparser/cian_flat_sale_1_2_moskva_12_Jan_2024_21_48_43_100892.csv 

The page from which the collection of information begins: 
 https://cian.ru/cat.php?engine_version=2&p=1&with_neighbors=0&region=1&deal_type=sale&offer_type=flat&room1=1&room2=1

Collecting information from pages with list of offers
 1 | 1 page with list: [=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>] 100% | Count of all parsed: 28. Progress ratio: 50 %. Average price: 45 547 801 rub
 2 | 2 page with list: [=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>] 100% | Count of all parsed: 56. Progress ratio: 100 %. Average price: 54 040 102 rub

The collection of information from the pages with list of offers is completed
Total number of parsed offers: 56.
{
    "author": "MR Group",
    "author_type": "developer",
    "url": "https://www.cian.ru/sale/flat/292125772/",
    "location": "Москва",
    "deal_type": "sale",
    "accommodation_type": "flat",
    "floor": 20,
    "floors_count": 37,
    "rooms_count": 1,
    "total_meters": 39.6,
    "price": 28623910,
    "district": "Беговой",
    "street": "Ленинградский проспект",
    "house_number": "вл8",
    "underground": "Белорусская",
    "residential_complex": "Slava"
}
```
### Initialization
Parameters used when initializing the parser via the CianParser constructor:
* __location__ - listing location, e.g. _Москва_ (to see all available locations, use _cianparser.list_locations()_)
* __proxies__ - proxies (see the __Cloudflare, CloudScraper, Proxy__ section), default _None_

### The get_flats method
This method accepts the following arguments:
* __deal_type__ - deal type, e.g. long-term rent or sale _("rent_long", "sale")_
* __rooms__ - number of rooms, e.g. _1, (1, 3, "studio"), "studio", "all"_; default any _("all")_
* __with_saving_csv__ - whether the collected data should be saved (in real time, as it is collected), default _False_
* __with_extra_data__ - whether additional data should be collected, at the cost of a several-fold longer run (see __Notes__ below), default _False_
* __additional_settings__ - additional search settings (see __Additional search settings__ below), default _None_
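The __location__ argument must match an entry known to the library. A minimal lookup sketch (it assumes, based on the validation code in cianparser/cianparser.py, that _cianparser.list_locations()_ returns (name, id) pairs; the helper name is hypothetical):

```python
def find_location(locations, name):
    """Return the (name, id) entry matching the given location name, or None."""
    for entry in locations:
        if entry[0] == name:
            return entry
    return None

# Usage (assuming cianparser is installed):
# entry = find_location(cianparser.list_locations(), "Москва")
```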

Example:
```python
import cianparser

moscow_parser = cianparser.CianParser(location="Москва")
data = moscow_parser.get_flats(deal_type="rent_long", rooms=(1, 2), additional_settings={"start_page":1, "end_page":1})
```

The project terminates gracefully when it runs out of result pages. For details, see the __Limitations__ section.

### The get_suburban method (collecting listings for houses/land plots/townhouses, etc.)
This method accepts the following arguments:
* __suburban_type__ - building type, e.g. house/dacha, part of a house, land plot, townhouse _("house", "house-part", "land-plot", "townhouse")_
* __deal_type__ - deal type, e.g. long-term rent or sale _("rent_long", "sale")_
* __with_saving_csv__ - whether the collected data should be saved (in real time, as it is collected), default _False_
* __with_extra_data__ - whether additional data should be collected, at the cost of a several-fold longer run, default _False_
* __additional_settings__ - additional search settings (see __Additional search settings__ below), default _None_

Example:
```python
import cianparser

moscow_parser = cianparser.CianParser(location="Москва")
data = moscow_parser.get_suburban(suburban_type="townhouse", deal_type="sale", additional_settings={"start_page":1, "end_page":1})
```

### The get_newobjects method (collecting data on new residential developments)
This method accepts the following arguments:
* __with_saving_csv__ - whether the collected data should be saved (in real time, as it is collected), default _False_

Example:
```python
import cianparser

moscow_parser = cianparser.CianParser(location="Москва")
data = moscow_parser.get_newobjects()
```

### Additional search settings
Example:
```python
additional_settings = {
    "start_page":1,
    "end_page": 10,
    "is_by_homeowner": True,
    "min_price": 1000000,
    "max_price": 10000000,
    "min_balconies": 1,
    "have_loggia": True,
    "min_house_year": 1990,
    "max_house_year": 2023,
    "min_floor": 3,
    "max_floor": 4,
    "min_total_floor": 5,
    "max_total_floor": 10,
    "house_material_type": 1,
    "metro": "Московский",
    "metro_station": "ВДНХ",
    "metro_foot_minute": 45,
    "flat_share": 2,
    "only_flat": True,
    "only_apartment": True,
    "sort_by": "price_from_min_to_max",
}
```
* __object_type__ -  housing type ("new" - new building, "secondary" - resale)
* __start_page__ - page the data collection starts from
* __end_page__ - page the data collection ends on
* __is_by_homeowner__ - only listings created by owners
* __min_price__ - price from (in rubles)
* __max_price__ - price up to (in rubles)
* __min_balconies__ - minimum number of balconies
* __have_loggia__ - has a loggia
* __min_house_year__ - building construction year from
* __max_house_year__ - building construction year up to
* __min_floor__ - floor from
* __max_floor__ - floor up to
* __min_total_floor__ - floors in the building from
* __max_total_floor__ - floors in the building up to
* __house_material_type__ - building type (_see possible values below_)
* __metro__ - metro system name (_see possible values below_)
* __metro_station__ - metro station (available when metro is set)
* __metro_foot_minute__ - walking time to the metro, in minutes
* __flat_share__ - with or without ownership shares (1 - shares only, 2 - no shares)
* __only_flat__ - exclude serviced apartments
* __only_apartment__ - serviced apartments only
* __sort_by__ - listing sort order (_see possible values below_)

#### Possible values of **house_material_type**
- _1_ - brick
- _2_ - monolithic
- _3_ - panel
- _4_ - block
- _5_ - wooden
- _6_ - Stalin-era
- _7_ - frame-panel
- _8_ - brick-monolithic

#### Possible values of **metro** and **metro_station**
They correspond to the keys and values of the dictionary returned by **_cianparser.list_metro_stations()_**

#### Possible values of **sort_by**
- "_price_from_min_to_max_" - sort by price (cheapest first)
- "_price_from_max_to_min_" - sort by price (most expensive first)
- "_total_meters_from_max_to_min_" - sort by total area (largest first)
- "_creation_data_from_newer_to_older_" - sort by listing date (newest first)
- "_creation_data_from_older_to_newer_" - sort by listing date (oldest first)
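Since valid __metro__/__metro_station__ pairs come from that dictionary, they can be checked before building a query. A hedged sketch (the helper name is hypothetical and it assumes the values are collections of station names keyed by metro system):

```python
def is_valid_metro_pair(metro_stations, metro, station):
    """Check that the station belongs to the given metro system."""
    return metro in metro_stations and station in metro_stations[metro]

# Usage (assuming cianparser is installed):
# ok = is_valid_metro_pair(cianparser.list_metro_stations(), "Московский", "ВДНХ")
```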

### Fields collected from long-term rental listings
* __district__ - district
* __underground__ - metro station
* __street__ - street
* __house_number__ - house number
* __floor__ - floor
* __floors_count__ - total number of floors
* __total_meters__ - total area
* __living_meters__ - living area
* __kitchen_meters__ - kitchen area
* __rooms_count__ - number of rooms
* __year_construction__ - building construction year
* __house_material_type__ - building type (brick/monolithic/panel, etc.)
* __heating_type__ - heating type
* __price_per_month__ - price per month
* __commissions__ - commission charged on move-in
* __author__ - listing author
* __author_type__ - author type
* __phone__ - phone number in the listing
* __url__ - listing URL

Possible values of __author_type__:
- __real_estate_agent__ - real estate agency
- __homeowner__ - owner
- __realtor__ - realtor
- __official_representative__ - official representative of the management company
- __representative_developer__ - developer's representative
- __developer__ - developer
- __unknown__ - no type specified

### Fields collected from sale listings

The fields are __the same__ as above, except that __price_per_month__ and __commissions__ are absent.

These new ones appear instead:
* __price__ - property price
* __residential_complex__ - residential complex name
* __object_type__ -  housing type (resale/new building)
* __finish_type__ - finishing

### Fields collected for new residential developments
* __name__ - residential complex name
* __url__ - page URL
* __full_location_address__ - full address of the residential complex
* __year_of_construction__ - completion year
* __house_material_type__ - building type (_see possible values above_)
* __finish_type__ - finishing
* __ceiling_height__ - ceiling height
* __class__ - housing class
* __parking_type__ - parking type
* __floors_from__ - number of floors (from)
* __floors_to__ - number of floors (to)
* __builder__ - developer

### Saving the data
The collected data can be saved in real time as it is gathered. To enable this, pass ___True___ for the
__with_saving_csv__ argument.

#### Example of the file produced by calling __get_flats__ with __with_extra_data__ = __True__:

```bash
cian_flat_sale_1_1_moskva_12_Jan_2024_22_29_48_117413.csv
```
| author | author_type | url | location | deal_type | accommodation_type | floor | floors_count | rooms_count | total_meters | price_per_m2 | price | year_of_construction | object_type | house_material_type | heating_type | finish_type | living_meters | kitchen_meters | phone | district | street | house_number | underground | residential_complex
| ------ | ------ | ------ | ------ | ------ | ------ | ----------- | ---- | ---- | --------- | ------------------ | ----- | ------------ | ----------- | ------------ | --------------- | ----------- | ----------- | -------------------- | --- | --- | --- | --- | --- | ---
| White and Broughton | real_estate_agent | https://www.cian.ru/sale/flat/290499455/ | Москва | sale | flat | 3 | 40 | 1 | 45.5 | 709890 | 32300000 | 2021 | Вторичка | Монолитный | Центральное | -1 | 19.0 | 6.0 | +79646331510 | Хорошевский | Ленинградский проспект | 37/4 | Динамо | Прайм Парк
| ФСК | developer | https://www.cian.ru/sale/flat/288376323/ | Москва | sale | flat | 24 | 47 | 2 | 46.0 | 528900 | 24329400 | 2024 | Новостройка | Монолитно-кирпичный | -1 | Без отделки, предчистовая, чистовая | 18.0 | 15.0 | +74951387154 | Обручевский |  Академика Волгина | 2С1 | Калужская | Архитектор
| White and Broughton | real_estate_agent | https://www.cian.ru/sale/flat/292416804/ | Москва | sale | flat | 2 | 41 | 2 | 60.0 | 783333 | 47000000 | 2021 | Вторичка | -1 | Центральное | -1 | 43.0 | 5.0 | +79646331510 | Хорошевский | Ленинградский проспект | 37/5 | Динамо | Прайм Парк

#### Example of the file produced by calling __get_suburban__ with __with_extra_data__ = __True__:

```bash
cian_suburban_townhouse_sale_15_15_moskva_13_Jan_2024_04_30_47_963046.csv
```
| author | author_type | url | location | deal_type | accommodation_type | price | year_of_construction | house_material_type | land_plot | land_plot_status | heating_type | gas_type | water_supply_type | sewage_system | bathroom | living_meters | floors_count | phone | district | underground | street | house_number
 | -----  | -----  | -----  | -----  | -----  | -----  | -----  | -----  | -----  | ----- | ------------ | ----------- | ------------ | --------------- | ----------- | ----------- | -------------------- | --- | --- | --- | --- | --- | ---
| New Moscow House | real_estate_agent | https://www.cian.ru/sale/suburban/296304861/ | Москва | sale | suburban | 93000000 | 2020 | Кирпичный | 13 сот. | -1 | -1 | Есть | Есть | Есть | В доме | -1 | 2 | +79096865868 | Первомайское поселение |  | улица Центральная | 21
| LaRichesse | real_estate_agent | https://www.cian.ru/sale/suburban/290335502/ | Москва | sale | suburban | 95000000 | -1 | Пенобетонный блок | 12 сот. | Индивидуальное жилищное строительство | Центральное | -1 | -1 | -1 | -1 | 502,8 м² | 2 | +79652502027 | Воскресенское поселение |  | улица Каменка | 44Ас1
| Динара Ваганова | realtor | https://www.cian.ru/sale/suburban/293424451/ | Москва | sale | suburban | 21990000 | -1 | -1 | -1 | Индивидуальное жилищное строительство | -1 | Нет | -1 | Нет | -1 | -1 | -1 | +79672093870 | Первомайское поселение | м. Крёкшино |  |

#### Example of the file produced by calling __get_newobjects__:

```bash
cian_newobject_13_Jan_2024_01_27_32_734734.csv
```
| name | location | accommodation_type | url | full_location_address | year_of_construction | house_material_type | finish_type | ceiling_height | class | parking_type | floors_from | floors_to | builder
 | ----- | ------------ | ----------- | ------------ | --------------- | ----------- | ----------- | -------------------- | --- | --- | --- | --- | --- | ---
| ЖК «SYMPHONY 34 (Симфони 34)» | Москва | newobject | https://zhk-symphony-34-i.cian.ru | Москва, САО, Савеловский, 2-я Хуторская ул., 34 | 2025 | Монолитный | Предчистовая, чистовая | 3,0 м | Премиум | Подземная, гостевая | 36 | 54 | Застройщик MR Group
| ЖК «Коллекция клубных особняков Ильинка 3/8» | Москва | newobject | https://zhk-kollekciya-klubnyh-osobnyakov-ilinka-38-i.cian.ru | Москва, ЦАО, Тверской, ул. Ильинка | 2024 | Монолитно-кирпичный, монолитный | Без отделки | от 3,35 м до 6,0 м | Премиум | Подземная, гостевая | 3 | 5 | Застройщик Sminex-Интеко
| ЖК «Victory Park Residences (Виктори Парк Резиденсез)» | Москва | newobject | https://zhk-victory-park-residences-i.cian.ru | Москва, ЗАО, Дорогомилово, ул. Братьев Фонченко | 2024 | Монолитный | Чистовая | — | Премиум | Подземная | 10 | 11 | Застройщик ANT Development


### Cloudflare, CloudScraper, Proxy
To bypass blocking, the project uses **CloudScraper** (the **cloudscraper** library), which successfully gets past **Cloudflare** protection.

Even so, this does not rule out _some users_ hitting a **CAPTCHA** test during long, uninterrupted use.

#### Proxy
For this reason, proxies can be supplied via the **proxies** argument (_a list of HTTPS proxies_)

Example:
```python
proxies = [
    '117.250.3.58:8080', 
    '115.96.208.124:8080',
    '152.67.0.109:80', 
    '45.87.68.2:15321', 
    '68.178.170.59:80', 
    '20.235.104.105:3729', 
    '195.201.34.206:80',
]
```

During startup, the utility walks through all of them, trying to find a suitable one: a proxy that, first,
can make requests and, second, does not hit the **_CAPTCHA_** test.

Example log showing all three possible cases:

```
The process of checking the proxies... Search an available one among them...
 1 | proxy 46.47.197.210:3128: unavailable.. trying another
 2 | proxy 213.184.153.66:8080: there is captcha.. trying another
 3 | proxy 95.66.138.21:8880: available.. stop searching
```
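Putting it together, the list is passed via the documented **proxies** argument of CianParser; internally the selected proxy ends up in a requests-style scheme mapping (see `__set_proxy__` in cianparser/cianparser.py). A minimal sketch, where the helper name is hypothetical:

```python
proxies = [
    "117.250.3.58:8080",
    "115.96.208.124:8080",
]

def as_https_mapping(proxy):
    # cloudscraper/requests expect a {scheme: proxy} mapping; the parser
    # installs the chosen proxy under the "https" key.
    return {"https": proxy}

# Usage (assuming cianparser is installed):
# parser = cianparser.CianParser(location="Москва", proxies=proxies)
```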

### Limitations
The site returns pages with listings <ins>__only up to page 54 inclusive__</ins>. That is roughly _28 * 54 = 1512_ listings.
So if you want to collect as much data as possible, use more specific queries (by number of rooms).

For example, instead of passing _rooms=(1, 2)_, run two separate collections with _rooms=1_ and _rooms=2_ respectively.

This way the maximum difference can reach 1 to 6 (studio and 1-, 2-, 3-, 4-, 5-room flats), i.e. 1512 versus 9072 listings.
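The per-rooms strategy above can be sketched as a simple loop (`collect_by_rooms` and `ROOM_VALUES` are hypothetical names; `parser` is assumed to be a cianparser.CianParser instance and get_flats is the method documented above):

```python
ROOM_VALUES = ("studio", 1, 2, 3, 4, 5)

def collect_by_rooms(parser, deal_type="sale"):
    """Collect each rooms value separately and merge the results."""
    offers = []
    for rooms in ROOM_VALUES:
        offers.extend(parser.get_flats(deal_type=deal_type, rooms=rooms))
    return offers
```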

### Notes
1. Some listings are missing data for certain fields (_construction year, living area, kitchen area, etc._).
In that case the value ___-1___ or an ___empty string___ is written for numeric and string fields respectively.

2. To avoid blocking by __IP__, the project pauses (___for 4-5 seconds___) after collecting information from
each individual page.

3. Running several data-collection processes in parallel (simultaneously) on one machine is not recommended (see note 2).

4. The __with_extra_data__ flag lets you collect some additional data, but it slows the process down substantially (___5-10x___), because every listing page has to be visited.
The corresponding data: ___kitchen area, building construction year, building type, finish type, heating type, housing type___ and ___phone number___.

5. This parser will not work in tools such as [Google Colaboratory](https://colab.research.google.com/).
See [details](https://github.com/lenarsaitov/cianparser/issues/1)

6. If a suitable location is missing from the project (an unexpected __location__ argument value), i.e. it is not in the **_cianparser.list_locations()_** list, please report it; I will be glad to add it.
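Note 1 above encodes missing values as ___-1___ or an ___empty string___. If you prefer None for downstream analysis, a small post-processing helper (hypothetical, not part of the library) can normalize the sentinels:

```python
def normalize_offer(offer):
    """Replace the -1 / "" missing-value sentinels with None."""
    return {k: (None if v in (-1, "") else v) for k, v in offer.items()}
```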


================================================
FILE: cianparser/__init__.py
================================================
from .cianparser import CianParser, list_locations, list_metro_stations

__author__ = "lenarsaitov"
__mail__ = "lenarsaitov1@yandex.ru"


================================================
FILE: cianparser/base_list.py
================================================
import math
import csv

from cianparser.constants import SPECIFIC_FIELDS_FOR_RENT_LONG, SPECIFIC_FIELDS_FOR_RENT_SHORT, SPECIFIC_FIELDS_FOR_SALE


class BaseListPageParser:
    def __init__(self,
                 session,
                 accommodation_type: str, deal_type: str, rent_period_type, location_name: str,
                 with_saving_csv=False, with_extra_data=False,
                 object_type=None, additional_settings=None):
        self.accommodation_type = accommodation_type
        self.session = session
        self.deal_type = deal_type
        self.rent_period_type = rent_period_type
        self.location_name = location_name
        self.with_saving_csv = with_saving_csv
        self.with_extra_data = with_extra_data
        self.additional_settings = additional_settings
        self.object_type = object_type

        self.result = []
        self.result_set = set()
        self.average_price = 0
        self.count_parsed_offers = 0
        self.start_page = 1 if (additional_settings is None or "start_page" not in additional_settings.keys()) else additional_settings["start_page"]
        self.end_page = 100 if (additional_settings is None or "end_page" not in additional_settings.keys()) else additional_settings["end_page"]
        self.file_path = self.build_file_path()

    def is_sale(self):
        return self.deal_type == "sale"

    def is_rent_long(self):
        return self.deal_type == "rent" and self.rent_period_type == 4

    def is_rent_short(self):
        return self.deal_type == "rent" and self.rent_period_type == 2

    def build_file_path(self):
        pass

    def define_average_price(self, price_data):
        if "price" in price_data:
            self.average_price = (self.average_price * self.count_parsed_offers + price_data["price"]) / self.count_parsed_offers
        elif "price_per_month" in price_data:
            self.average_price = (self.average_price * self.count_parsed_offers + price_data["price_per_month"]) / self.count_parsed_offers

    def print_parse_progress(self, page_number, count_of_pages, offers, ind):
        total_planed_offers = len(offers) * count_of_pages
        print(f"\r {page_number - self.start_page + 1}"
              f" | {page_number} page with list: [" + "=>" * (ind + 1) + "  " * (len(offers) - ind - 1) + "]" + f" {math.ceil((ind + 1) * 100 / len(offers))}" + "%" +
              f" | Count of all parsed: {self.count_parsed_offers}."
              f" Progress ratio: {math.ceil(self.count_parsed_offers * 100 / total_planed_offers)} %."
              f" Average price: {'{:,}'.format(int(self.average_price)).replace(',', ' ')} rub",
              end="\r", flush=True)

    def remove_unnecessary_fields(self):
        if self.is_sale():
            for not_need_field in SPECIFIC_FIELDS_FOR_RENT_LONG:
                if not_need_field in self.result[-1]:
                    del self.result[-1][not_need_field]

            for not_need_field in SPECIFIC_FIELDS_FOR_RENT_SHORT:
                if not_need_field in self.result[-1]:
                    del self.result[-1][not_need_field]

        if self.is_rent_long():
            for not_need_field in SPECIFIC_FIELDS_FOR_RENT_SHORT:
                if not_need_field in self.result[-1]:
                    del self.result[-1][not_need_field]

            for not_need_field in SPECIFIC_FIELDS_FOR_SALE:
                if not_need_field in self.result[-1]:
                    del self.result[-1][not_need_field]

        if self.is_rent_short():
            for not_need_field in SPECIFIC_FIELDS_FOR_RENT_LONG:
                if not_need_field in self.result[-1]:
                    del self.result[-1][not_need_field]

            for not_need_field in SPECIFIC_FIELDS_FOR_SALE:
                if not_need_field in self.result[-1]:
                    del self.result[-1][not_need_field]

        return self.result

    def save_results(self):
        self.remove_unnecessary_fields()
        keys = self.result[0].keys()

        with open(self.file_path, 'w', newline='', encoding='utf-8') as output_file:
            dict_writer = csv.DictWriter(output_file, keys, delimiter=';')
            dict_writer.writeheader()
            dict_writer.writerows(self.result)

================================================
FILE: cianparser/cianparser.py
================================================
import cloudscraper
import time

from cianparser.constants import CITIES, METRO_STATIONS, DEAL_TYPES, OBJECT_SUBURBAN_TYPES
from cianparser.url_builder import URLBuilder
from cianparser.proxy_pool import ProxyPool
from cianparser.flat.list import FlatListPageParser
from cianparser.suburban.list import SuburbanListPageParser
from cianparser.newobject.list import NewObjectListParser


def list_locations():
    return CITIES


def list_metro_stations():
    return METRO_STATIONS


class CianParser:
    def __init__(self, location: str, proxies=None):
        """
        Initialize the Cian website parser
        Examples:
            >>> moscow_parser = cianparser.CianParser(location="Москва")
        :param str location: location, e.g. "Москва"; to see all valid values use cianparser.list_locations()
        :param proxies: proxies for executing requests (https scheme), default None
        """

        location_id = __validation_init__(location)

        self.__parser__ = None
        self.__session__ = cloudscraper.create_scraper()
        self.__session__.headers = {'Accept-Language': 'en'}
        self.__proxy_pool__ = ProxyPool(proxies=proxies)
        self.__location_name__ = location
        self.__location_id__ = location_id

    def __set_proxy__(self, url_list):
        if self.__proxy_pool__.is_empty():
            return
        available_proxy = self.__proxy_pool__.get_available_proxy(url_list)
        if available_proxy is not None:
            self.__session__.proxies = {"https": available_proxy}

    def __load_list_page__(self, url_list_format, page_number, attempt_number_exception):
        url_list = url_list_format.format(page_number)
        self.__set_proxy__(url_list)

        if page_number == self.__parser__.start_page and attempt_number_exception == 0:
            print(f"The page from which the collection of information begins: \n {url_list}")

        res = self.__session__.get(url=url_list)
        if res.status_code == 429:
            time.sleep(10)
        res.raise_for_status()

        return res.text

    def __run__(self, url_list_format: str):
        print(f"\n{' ' * 30}Preparing to collect information from pages..")

        if self.__parser__.with_saving_csv:
            print(f"The absolute path to the file: \n{self.__parser__.file_path} \n")

        page_number = self.__parser__.start_page - 1
        end_all_parsing = False
        while page_number < self.__parser__.end_page and not end_all_parsing:
            page_parsed = False
            page_number += 1
            attempt_number_exception = 0

            while attempt_number_exception < 3 and not page_parsed:
                try:
                    (page_parsed, attempt_number, end_all_parsing) = self.__parser__.parse_list_offers_page(
                        html=self.__load_list_page__(url_list_format=url_list_format, page_number=page_number, attempt_number_exception=attempt_number_exception),
                        page_number=page_number,
                        count_of_pages=self.__parser__.end_page + 1 - self.__parser__.start_page,
                        attempt_number=attempt_number_exception)

                except Exception as e:
                    attempt_number_exception += 1
                    if attempt_number_exception < 3:
                        continue
                    print(f"\n\nException: {e}")
                    print(f"The collection of information from the pages with ending parse on {page_number} page...\n")
                    break

        print(f"\n\nThe collection of information from the pages with list of offers is completed")
        print(f"Total number of parsed offers: {self.__parser__.count_parsed_offers}. ", end="\n")

    def get_flats(self, deal_type: str, rooms, with_saving_csv=False, with_extra_data=False, additional_settings=None):
        """
        Parse information of flats from cian website
        Examples:
            >>> moscow_parser = cianparser.CianParser(location="Москва")
            >>> data = moscow_parser.get_flats(deal_type="rent_long", rooms=1)
            >>> data = moscow_parser.get_flats(deal_type="rent_short", rooms=(1,3,"studio"), with_saving_csv=True)
            >>> data = moscow_parser.get_flats(deal_type="sale", additional_settings={"start_page": 1, "end_page": 1, "sort_by":"price_from_min_to_max"})
        :param deal_type: type of deal, e.g. "rent_long", "rent_short", "sale"
        :param rooms: how many rooms in accommodation, default "all". Example: 1, (1, 3, "studio"), "studio", "all"
        :param with_saving_csv: is it necessary to save data in csv, default False
        :param with_extra_data:  is it necessary to collect additional data (but with increasing time duration), default False
        :param additional_settings:  additional settings such as min_price, sort_by and others, default None
        """

        __validation_get_flats__(deal_type, rooms)
        deal_type, rent_period_type = __define_deal_type__(deal_type)
        self.__parser__ = FlatListPageParser(
            session=self.__session__,
            accommodation_type="flat",
            deal_type=deal_type,
            rent_period_type=rent_period_type,
            location_name=self.__location_name__,
            with_saving_csv=with_saving_csv,
            with_extra_data=with_extra_data,
            additional_settings=additional_settings,
        )
        self.__run__(
            __build_url_list__(location_id=self.__location_id__, deal_type=deal_type, accommodation_type="flat",
                               rooms=rooms, rent_period_type=rent_period_type,
                               additional_settings=additional_settings))
        return self.__parser__.result

    def get_suburban(self, suburban_type: str, deal_type: str, with_saving_csv=False, with_extra_data=False, additional_settings=None):
        """
        Parse information of suburbans from cian website
        Examples:
            >>> moscow_parser = cianparser.CianParser(location="Москва")
            >>> data = moscow_parser.get_suburban(suburban_type="house", deal_type="rent_long")
            >>> data = moscow_parser.get_suburban(suburban_type="house", deal_type="rent_short", with_saving_csv=True)
            >>> data = moscow_parser.get_suburban(suburban_type="townhouse", deal_type="sale", additional_settings={"start_page": 1, "end_page": 1, "sort_by":"price_from_min_to_max"})
        :param suburban_type: type of suburban building, e.g. "house", "house-part", "land-plot", "townhouse"
        :param deal_type: type of deal, e.g. "rent_long", "rent_short", "sale"
        :param with_saving_csv: is it necessary to save data in csv, default False
        :param with_extra_data:  is it necessary to collect additional data (but with increasing time duration), default False
        :param additional_settings:  additional settings such as min_price, sort_by and others, default None
        """

        __validation_get_suburban__(suburban_type=suburban_type, deal_type=deal_type)
        deal_type, rent_period_type = __define_deal_type__(deal_type)
        self.__parser__ = SuburbanListPageParser(
            session=self.__session__,
            accommodation_type="suburban",
            deal_type=deal_type,
            rent_period_type=rent_period_type,
            location_name=self.__location_name__,
            with_saving_csv=with_saving_csv,
            with_extra_data=with_extra_data,
            additional_settings=additional_settings,
            object_type=suburban_type,
        )
        self.__run__(
            __build_url_list__(location_id=self.__location_id__, deal_type=deal_type, accommodation_type="suburban",
                               rooms=None, rent_period_type=rent_period_type, suburban_type=suburban_type,
                               additional_settings=additional_settings))
        return self.__parser__.result

    def get_newobjects(self, with_saving_csv=False):
        """
        Parse information about new construction objects (newobjects) from the cian website
        Examples:
            >>> moscow_parser = cianparser.CianParser(location="Москва")
            >>> data = moscow_parser.get_newobjects(with_saving_csv=True)
        :param with_saving_csv: whether to save the collected data to a csv file, default False
        """

        self.__parser__ = NewObjectListParser(
            session=self.__session__,
            location_name=self.__location_name__,
            with_saving_csv=with_saving_csv,
        )
        self.__run__(
            __build_url_list__(location_id=self.__location_id__, deal_type="sale", accommodation_type="newobject"))
        return self.__parser__.result


def __validation_init__(location):
    location_id = None
    for location_info in list_locations():
        if location_info[0] == location:
            location_id = location_info[1]

    if location_id is None:
        # The exception must actually be raised, otherwise the invalid location is silently ignored
        raise ValueError(f'You entered {location}, which does not exist in the base. '
                         f'See all available location values in cianparser.list_locations()')

    return location_id


def __validation_get_flats__(deal_type, rooms):
    if deal_type not in DEAL_TYPES:
        raise ValueError(f'You entered deal_type={deal_type}, which is not a valid value. '
                         f'Try one of these values: "rent_long", "sale".')

    if type(rooms) is tuple:
        for count_of_room in rooms:
            if type(count_of_room) is int:
                if count_of_room < 1 or count_of_room > 5:
                    raise ValueError(f'You entered {count_of_room} in {rooms}, which is not a valid value. '
                                     f'Try one of these values: 1, 2, 3, 4, 5, "studio", "all".')
            elif type(count_of_room) is str:
                if count_of_room != "studio":
                    raise ValueError(f'You entered {count_of_room} in {rooms}, which is not a valid value. '
                                     f'Try one of these values: 1, 2, 3, 4, 5, "studio", "all".')
            else:
                raise ValueError(f'The tuple "rooms" contains an element of an invalid type. '
                                 f'Only int and str elements are allowed, e.g. (1, 3, 5, "studio").')
    elif type(rooms) is int:
        if rooms < 1 or rooms > 5:
            raise ValueError(f'You entered rooms={rooms}, which is not a valid value. '
                             f'Try one of these values: 1, 2, 3, 4, 5, "studio", "all".')
    elif type(rooms) is str:
        if rooms != "studio" and rooms != "all":
            raise ValueError(f'You entered rooms={rooms}, which is not a valid value. '
                             f'Try one of these values: 1, 2, 3, 4, 5, "studio", "all".')
    else:
        raise ValueError(f'The argument "rooms" has an invalid type. '
                         f'Only int, str and tuple are allowed, e.g. 1, (1, 3, "studio"), "studio", "all".')


def __validation_get_suburban__(suburban_type, deal_type):
    if suburban_type not in OBJECT_SUBURBAN_TYPES.keys():
        raise ValueError(f'You entered suburban_type={suburban_type}, which is not a valid value. '
                         f'Try one of these values: "house", "house-part", "land-plot", "townhouse".')

    if deal_type not in DEAL_TYPES:
        raise ValueError(f'You entered deal_type={deal_type}, which is not a valid value. '
                         f'Try one of these values: "rent_long", "sale".')


def __build_url_list__(location_id, deal_type, accommodation_type, rooms=None, rent_period_type=None,
                       suburban_type=None, additional_settings=None):
    url_builder = URLBuilder(accommodation_type == "newobject")
    url_builder.add_location(location_id)
    url_builder.add_deal_type(deal_type)
    url_builder.add_accommodation_type(accommodation_type)

    if rooms is not None:
        url_builder.add_room(rooms)

    if rent_period_type is not None:
        url_builder.add_rent_period_type(rent_period_type)

    if suburban_type is not None:
        url_builder.add_object_suburban_type(suburban_type)

    if additional_settings is not None:
        url_builder.add_additional_settings(additional_settings)

    return url_builder.get_url()


def __define_deal_type__(deal_type):
    rent_period_type = None
    if deal_type == "rent_long":
        deal_type, rent_period_type = "rent", 4
    elif deal_type == "rent_short":
        deal_type, rent_period_type = "rent", 2
    return deal_type, rent_period_type


================================================
FILE: cianparser/constants.py
================================================
DEAL_TYPES = {"rent_long", "sale"}
OBJECT_SUBURBAN_TYPES = {"house": "1", "house-part": "2", "land-plot": "3", "townhouse": "4"}
OBJECT_TYPES = {"secondary": "1", "new": "2"}

# DEAL_TYPES_NOT_IMPLEMENTED_YET = {"rent_short"}

# ACCOMMODATION_TYPES_NOT_IMPLEMENTED_YET = {"room", "house", "house-part", "townhouse"}

FLOATS_NUMBERS_REG_EXPRESSION = r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"

FILE_NAME_FLAT_FORMAT = 'cian_{}_{}_{}_{}_{}_{}.csv'
FILE_NAME_SUBURBAN_FORMAT = 'cian_{}_{}_{}_{}_{}_{}_{}.csv'
FILE_NAME_NEWOBJECT_FORMAT = 'cian_{}_{}_{}.csv'

BASE_URL = "https://cian.ru"
DEFAULT_POSTFIX_PATH = "/cat.php?"
NEWOBJECT_POSTFIX_PATH = "/newobjects/list/?"
DEFAULT_PATH = "engine_version=2&p={}&with_neighbors=0"
REGION_PATH = "&region={}"
OFFER_TYPE_PATH = "&offer_type={}"
RENT_PERIOD_TYPE_PATH = "&type={}"
DEAL_TYPE_PATH = "&deal_type={}"
OBJECT_TYPE_PATH = "&object_type%5B0%5D={}"

ROOM_PATH = "&room{}=1"
STUDIO_PATH = "&room9=1"
IS_ONLY_HOMEOWNER_PATH = "&is_by_homeowner=1"
MIN_BALCONIES_PATH = "&min_balconies={}"
HAVE_LOGGIA_PATH = "&loggia=1"
MIN_HOUSE_YEAR_PATH = "&min_house_year={}"
MAX_HOUSE_YEAR_PATH = "&max_house_year={}"
MIN_PRICE_PATH = "&minprice={}"
MAX_PRICE_PATH = "&maxprice={}"
MIN_FLOOR_PATH = "&minfloor={}"
MAX_FLOOR_PATH = "&maxfloor={}"
MIN_TOTAL_FLOOR_PATH = "&minfloorn={}"
MAX_TOTAL_FLOOR_PATH = "&maxfloorn={}"

HOUSE_MATERIAL_TYPE_PATH = "&house_material%5B0%5D={}"

METRO_FOOT_MINUTE_PATH = "&only_foot=2&foot_min={}"
METRO_ID_PATH = "&metro%5B0%5D={}"

FLAT_SHARE_PATH = "&flat_share={}"
ONLY_FLAT_PATH = "&only_flat={}"
APARTMENT_PATH = "&apartment={}"

SORT_BY_PRICE_FROM_MIN_TO_MAX_PATH = "&sort=price_object_order"
SORT_BY_PRICE_FROM_MAX_TO_MIN_PATH = "&sort=total_price_desc"
SORT_BY_TOTAL_METERS_FROM_MAX_TO_MIN_PATH = "&sort=area_order"
SORT_BY_CREATION_DATA_FROM_NEWER_TO_OLDER_PATH = "&sort=creation_date_desc"
SORT_BY_CREATION_DATA_FROM_OLDER_TO_NEWER_PATH = "&sort=creation_date_asc"

IS_SORT_BY_PRICE_FROM_MIN_TO_MAX_PATH = "price_from_min_to_max"
IS_SORT_BY_PRICE_FROM_MAX_TO_MIN_PATH = "price_from_max_to_min"
IS_SORT_BY_TOTAL_METERS_FROM_MAX_TO_MIN_PATH = "total_meters_from_max_to_min"
IS_SORT_BY_CREATION_DATA_FROM_NEWER_TO_OLDER_PATH = "creation_data_from_newer_to_older"
IS_SORT_BY_CREATION_DATA_FROM_OLDER_TO_NEWER_PATH = "creation_data_from_older_to_newer"

NOT_STREET_ADDRESS_ELEMENTS = {"ЖК", "м.", "мкр.", "Жилой комплекс", "Жилой Комплекс"}

STREET_TYPES = {"ул.", "улица", "аллея", "бульвар", "линия", "набережная", "тракт", "тупик", "шоссе", "переулок",
                "проспект", "проезд", "раздъезд", "мост", "авеню"}

SPECIFIC_FIELDS_FOR_RENT_LONG = {"price_per_month", "commissions"}
SPECIFIC_FIELDS_FOR_RENT_SHORT = {"price_per_day"}
SPECIFIC_FIELDS_FOR_SALE = {"price", "residential_complex", "object_type", "finish_type"}

CITIES = [
    ['Москва', '1'],
    ['Санкт-Петербург', '2'],
    ['Абакан', '4638'],
    ['Анадырь', '4648'],
    ['Архангельск', '4658'],
    ['Астрахань', '4660'],
    ['Барнаул', '4668'],
    ['Белгород', '4671'],
    ['Биробиджан', '4682'],
    ['Благовещенск', '4683'],
    ['Бронницы', '4690'],
    ['Брянск', '4691'],
    ['Великий Новгород', '4694'],
    ['Владивосток', '4701'],
    ['Владикавказ', '4702'],
    ['Владимир', '4703'],
    ['Волгоград', '4704'],
    ['Вологда', '4708'],
    ['Воронеж', '4713'],
    ['Геленджик', '4717'],
    ['Горно-Алтайск', '4719'],
    ['Грозный', '4723'],
    ['Дзержинский', '4734'],
    ['Долгопрудный', '4738'],
    ['Дубна', '4741'],
    ['Екатеринбург', '4743'],
    ['Жуковский', '4750'],
    ['Звенигород', '4756'],
    ['Иванов', '4767'],
    ['Ижевск', '4770'],
    ['Иркутск', '4774'],
    ['Йошкар-Ола', '4776'],
    ['Казань', '4777'],
    ['Калининград', '4778'],
    ['Калуга', '4780'],
    ['Кемерово', '4795'],
    ['Киров', '4800'],
    ['Коломна', '4809'],
    ['Королёв', '4813'],
    ['Красноармейск', '4817'],
    ['Краснодар', '4820'],
    ['Краснознаменск', '4822'],
    ['Красноярск', '4827'],
    ['Курган', '4834'],
    ['Курск', '4835'],
    ['Кызыл', '4837'],
    ['Липецк', '4847'],
    ['Лобня', '4848'],
    ['Лыткарино', '4851'],
    ['Магадан', '4852'],
    ['Майкоп', '4855'],
    ['Махачкала', '4857'],
    ['Мурманск', '4871'],
    ['Нальчик', '4875'],
    ['Нарьян-Мар', '4876'],
    ['Нижний Новгород', '4885'],
    ['Новороссийск', '4896'],
    ['Новокузнецк', '4894'],
    ['Новосибирск', '4897'],
    ['Омск', '4914'],
    ['Оренбург', '4915'],
    ['Орехово-Зуево', '4916'],
    ['Пенза', '4923'],
    ['Пермь', '4927'],
    ['Петрозаводск', '4930'],
    ['Петропавловск-Камчатский', '4931'],
    ['Подольск', '4935'],
    ['Протвино', '4945'],
    ['Псков', '4946'],
    ['Пущино', '4949'],
    ['Реутов', '4958'],
    ['Ростов-на-Дону', '4959'],
    ['Рошаль', '4960'],
    ['Рязань', '4963'],
    ['Салехард', '4965'],
    ['Самара', '4966'],
    ['Саранск', '4967'],
    ['Саратов', '4969'],
    ['Серпухов', '4983'],
    ['Смоленск', '4987'],
    ['Сочи', '4998'],
    ['Ставрополь', '5001'],
    ['Сургут', '5003'],
    ['Сыктывкар', '5006'],
    ['Тамбов', '5011'],
    ['Тольятти', '5015'],
    ['Томск', '5016'],
    ['Тула', '5020'],
    ['Тюмень', '5024'],
    ['Улан-Удэ', '5026'],
    ['Ульяновск', '5027'],
    ['Фрязино', '5038'],
    ['Хабаровск', '5039'],
    ['Ханты-Мансийск', '5041'],
    ['Химки', '5044'],
    ['Чебоксары', '5047'],
    ['Челябинск', '5048'],
    ['Череповец', '5050'],
    ['Черкесск', '5051'],
    ['Чита', '5053'],
    ['Электросталь', '5064'],
    ['Элиста', '5065'],
    ['Южно-Сахалинск', '5069'],
    ['Якутск', '5073'],
    ['Ярославль', '5075'],
]

OTHER_CITIES = [
    ['Азов', '174136'],
    ['Аксай', '174151'],
    ['Альметьевск', '174184'],
    ['Анапа', '174191'],
    ['Балашиха', '174292'],
    ['Бокситогорск', '174373'],
    ['Бора', '174402'],
    ['Видное', '174508'],
    ['Волоколамск', '174522'],
    ['Воскресенск', '174530'],
    ['Высоковск', '174541'],
    ['Голицын', '174573'],
    ['Дмитров', '174634'],
    ['Домодедово', '174640'],
    ['Дрезна', '174644'],
    ['Егорьевск', '174659'],
    ['Истра', '174832'],
    ['Кашира', '174957'],
    ['Клин', '175004'],
    ['Кострома', '175050'],
    ['Котельник', '175051'],
    ['Красногорск', '175071'],
    ['Краснозаводск', '175075'],
    ['Кубинка', '175104'],
    ['Ликино-Дулёво', '175209'],
    ['Лосино-Петровский', '175219'],
    ['Луховицы', '175226'],
    ['Люберцы', '175231'],
    ['Можайск', '175349'],
    ['Мытищи', '175378'],
    ['Набережные Челны', '175380'],
    ['Назрань', '175389'],
    ['Одинцово', '175578'],
    ['Орёл', '175604'],
    ['Павловский Посад', '175635'],
    ['Пушкин', '175744'],
    ['Раменское', '175758'],
    ['Руза', '175785'],
    ['Сергиев Посад', '175864'],
    ['Солнечногорск', '175903'],
    ['Ступино', '175996'],
    ['Талдом', '176052'],
    ['Тверь', '176083'],
    ['Уфа', '176245'],
    ['Хотьково', '176281'],
    ['Черноголовка', '176316'],
    ['Чехов', '176321'],
    ['Шатура', '176366'],
    ['Щёлково', '176401'],
    ['Электрогорск', '176405'],
    ['Яхрома', '176463'],
]

CITIES.extend(OTHER_CITIES)

METRO_STATIONS = {
    "Московский": [
        ['Авиамоторная', '1'],
        ['Автозаводская', '2'],
        ['Академическая', '3'],
        ['Александровский сад', '4'],
        ['Алексеевская', '5'],
        ['Алтуфьево', '6'],
        ['Аннино', '7'],
        ['Арбатская', '8'],
        ['Аэропорт', '9'],
        ['Бабушкинская', '10'],
        ['Багратионовская', '11'],
        ['Баррикадная', '12'],
        ['Бауманская', '13'],
        ['Беговая', '14'],
        ['Белорусская', '15'],
        ['Беляево', '16'],
        ['Бибирево', '17'],
        ['Библиотека им. Ленина', '18'],
        ['Новоясеневская', '19'],
        ['Боровицкая', '20'],
        ['Ботанический сад', '21'],
        ['Братиславская', '22'],
        ['Бульвар Адмирала Ушакова', '23'],
        ['Бульвар Дмитрия Донского', '24'],
        ['Бунинская аллея', '25'],
        ['Варшавская', '26'],
        ['ВДНХ', '27'],
        ['Владыкино', '28'],
        ['Водный стадион', '29'],
        ['Войковская', '30'],
        ['Волгоградский проспект', '31'],
        ['Волжская', '32'],
        ['Воробьёвы горы', '33'],
        ['Выхино', '34'],
        ['Выставочная', '35'],
        ['Динамо', '36'],
        ['Дмитровская', '37'],
        ['Добрынинская', '38'],
        ['Домодедовская', '39'],
        ['Дубровка', '40'],
        ['Измайловская', '41'],
        ['Калужская', '42'],
        ['Кантемировская', '43'],
        ['Каховская', '44'],
        ['Каширская', '45'],
        ['Киевская', '46'],
        ['Китай-город', '47'],
        ['Кожуховская', '48'],
        ['Коломенская', '49'],
        ['Комсомольская', '50'],
        ['Коньково', '51'],
        ['Красногвардейская', '52'],
        ['Красносельская', '53'],
        ['Красные ворота', '54'],
        ['Крестьянская застава', '55'],
        ['Кропоткинская', '56'],
        ['Крылатское', '57'],
        ['Кузнецкий мост', '58'],
        ['Кузьминки', '59'],
        ['Кунцевская', '60'],
        ['Курская', '61'],
        ['Кутузовская', '62'],
        ['Ленинский проспект', '63'],
        ['Лубянка', '64'],
        ['Люблино', '65'],
        ['Марксистская', '66'],
        ['Марьино', '67'],
        ['Маяковская', '68'],
        ['Медведково', '69'],
        ['Международная', '70'],
        ['Менделеевская', '71'],
        ['Молодёжная', '72'],
        ['Нагатинская', '73'],
        ['Нагорная', '74'],
        ['Нахимовский проспект', '75'],
        ['Новогиреево', '76'],
        ['Новокузнецкая', '77'],
        ['Новослободская', '78'],
        ['Новые Черёмушки', '79'],
        ['Октябрьская', '80'],
        ['Октябрьское поле', '81'],
        ['Орехово', '82'],
        ['Отрадное', '83'],
        ['Охотный ряд', '84'],
        ['Павелецкая', '85'],
        ['Парк Культуры', '86'],
        ['Парк Победы', '87'],
        ['Партизанская', '88'],
        ['Первомайская', '89'],
        ['Перово', '90'],
        ['Петровско-Разумовская', '91'],
        ['Печатники', '92'],
        ['Пионерская', '93'],
        ['Планерная', '94'],
        ['Площадь Ильича', '95'],
        ['Площадь Революции', '96'],
        ['Полежаевская', '97'],
        ['Полянка', '98'],
        ['Пражская', '99'],
        ['Преображенская площадь', '100'],
        ['Пролетарская', '101'],
        ['Проспект Вернадского', '102'],
        ['Проспект Мира', '103'],
        ['Профсоюзная', '104'],
        ['Пушкинская', '105'],
        ['Речной вокзал', '106'],
        ['Рижская', '107'],
        ['Римская', '108'],
        ['Рязанский проспект', '109'],
        ['Савёловская', '110'],
        ['Свиблово', '111'],
        ['Севастопольская', '112'],
        ['Семёновская', '113'],
        ['Серпуховская', '114'],
        ['Смоленская', '115'],
        ['Сокол', '116'],
        ['Сокольники', '117'],
        ['Спортивная', '118'],
        ['Сретенский бульвар', '119'],
        ['Студенческая', '120'],
        ['Сухаревская', '121'],
        ['Сходненская', '122'],
        ['Таганская', '123'],
        ['Тверская', '124'],
        ['Театральная', '125'],
        ['Текстильщики', '126'],
        ['Тёплый Стан', '127'],
        ['Тимирязевская', '128'],
        ['Третьяковская', '129'],
        ['Трубная', '130'],
        ['Тульская', '131'],
        ['Тургеневская', '132'],
        ['Тушинская', '133'],
        ['Улица 1905 года', '134'],
        ['Улица Академика Янгеля', '135'],
        ['Улица Горчакова', '136'],
        ['Бульвар Рокоссовского', '137'],
        ['Улица Скобелевская', '138'],
        ['Улица Старокачаловская', '139'],
        ['Университет', '140'],
        ['Филёвский парк', '141'],
        ['Фили', '142'],
        ['Фрунзенская', '143'],
        ['Царицыно', '144'],
        ['Цветной бульвар', '145'],
        ['Черкизовская', '146'],
        ['Чертановская', '147'],
        ['Чеховская', '148'],
        ['Чистые пруды', '149'],
        ['Чкаловская', '150'],
        ['Шаболовская', '151'],
        ['Шоссе Энтузиастов', '152'],
        ['Щёлковская', '153'],
        ['Щукинская', '154'],
        ['Электрозаводская', '155'],
        ['Юго-Западная', '156'],
        ['Южная', '157'],
        ['Ясенево', '158'],
        ['Краснопресненская', '159'],
        ['Строгино', '228'],
        ['Славянский бульвар', '229'],
        ['Мякинино', '233'],
        ['Волоколамская', '234'],
        ['Митино', '235'],
        ['Марьина Роща', '236'],
        ['Шипиловская', '238'],
        ['Зябликово', '239'],
        ['Борисово', '240'],
        ['Новокосино', '243'],
        ['Пятницкое шоссе', '244'],
        ['Алма-Атинская', '245'],
        ['Жулебино', '270'],
        ['Лермонтовский проспект', '271'],
        ['Деловой центр', '272'],
        ['Лесопарковая', '273'],
        ['Битцевский парк', '274'],
        ['Спартак', '275'],
        ['Улица Сергея Эйзенштейна', '276'],
        ['Выставочный центр', '277'],
        ['Улица Академика Королёва', '278'],
        ['Телецентр', '279'],
        ['Улица Милашенкова', '280'],
        ['Тропарёво', '281'],
        ['Котельники', '282'],
        ['Технопарк', '283'],
        ['Румянцево', '284'],
        ['Саларьево', '285'],
        ['Фонвизинская', '286'],
        ['Бутырская', '287'],
        ['Хорошёво', '289'],
        ['Зорге', '290'],
        ['Панфиловская', '291'],
        ['Стрешнево', '292'],
        ['Балтийская', '293'],
        ['Коптево', '294'],
        ['Лихоборы', '295'],
        ['Окружная', '296'],
        ['Ростокино', '297'],
        ['Белокаменная', '298'],
        ['Локомотив', '299'],
        ['Измайлово', '300'],
        ['Соколиная гора', '301'],
        ['Андроновка', '302'],
        ['Нижегородская', '303'],
        ['Новохохловская', '304'],
        ['Угрешская', '305'],
        ['ЗИЛ', '306'],
        ['Верхние котлы', '307'],
        ['Крымская', '308'],
        ['Площадь Гагарина', '309'],
        ['Лужники', '310'],
        ['Шелепиха', '311'],
        ['Минская', '337'],
        ['Ломоносовский проспект', '338'],
        ['Раменки', '339'],
        ['Ховрино', '349'],
        ['Петровский Парк', '350'],
        ['Хорошёвская', '351'],
        ['ЦСКА', '352'],
        ['Верхние Лихоборы', '353'],
        ['Селигерская', '354'],
        ['Мичуринский проспект', '361'],
        ['Озёрная', '362'],
        ['Говорово', '363'],
        ['Солнцево', '364'],
        ['Боровское шоссе', '365'],
        ['Новопеределкино', '366'],
        ['Рассказовка', '367'],
        ['Беломорская', '369'],
        ['Косино', '370'],
        ['Улица Дмитриевского', '371'],
        ['Лухмановская', '372'],
        ['Некрасовка', '373'],
        ['Юго-Восточная', '374'],
        ['Окская', '375'],
        ['Стахановская', '376'],
        ['Филатов Луг', '377'],
        ['Прокшино', '378'],
        ['Ольховая', '379'],
        ['Коммунарка', '380'],
        ['Лефортово', '381'],
        ['Шереметьевская', '383'],
        ['Рижская', '384'],
        ['Сокольники', '385'],
        ['Электрозаводская', '386'],
        ['Кленовый бульвар', '387'],
        ['Нагатинский Затон', '388'],
        ['Зюзино', '389'],
        ['Воронцовская', '390'],
        ['Новаторская', '391'],
        ['Аминьевская', '392'],
        ['Давыдково', '393'],
        ['Кунцевская', '394'],
        ['Мнёвники', '395'],
        ['Терехово ', '396'],
        ['Карамышевская', '397'],
        ['Яхромская', '398'],
        ['Лианозово', '399'],
        ['Тестовская', '400'],
        ['Рабочий посёлок', '401'],
        ['Сетунь', '402'],
        ['Немчиновка', '403'],
        ['Сколково', '404'],
        ['Баковка', '405'],
        ['Одинцово', '406'],
        ['Лобня', '407'],
        ['Хлебниково', '408'],
        ['Водники', '409'],
        ['Долгопрудная', '410'],
        ['Новодачная', '411'],
        ['Марк', '412'],
        ['Бескудниково', '413'],
        ['Дегунино', '414'],
        ['Нахабино', '415'],
        ['Аникеевка', '416'],
        ['Опалиха', '417'],
        ['Красногорская', '418'],
        ['Павшино', '419'],
        ['Пенягино', '420'],
        ['Трикотажная', '421'],
        ['Стрешнево', '422'],
        ['Красный Балтиец', '423'],
        ['Гражданская', '424'],
        ['Москва-Товарная', '425'],
        ['Калитники', '426'],
        ['Люблино', '427'],
        ['Депо', '428'],
        ['Перерва', '429'],
        ['Москворечье', '430'],
        ['Покровское', '431'],
        ['Красный Строитель', '432'],
        ['Битца', '433'],
        ['Щербинка', '434'],
        ['Силикатная', '435'],
        ['Подольск', '436'],
        ['Бутово', '437'],
        ['Остафьево', '438'],
        ['Курьяново', '439'],
        ['Народное Ополчение', '440'],
        ['Площадь трёх вокзалов', '441'],
        ['Авиамоторная', '443'],
        ['Деловой центр', '444'],
        ['Каширская', '445'],
        ['Лефортово', '446'],
        ['Мичуринский проспект', '447'],
        ['Нижегородская', '448'],
        ['Печатники', '449'],
        ['Проспект Вернадского', '450'],
        ['Савёловская', '451'],
        ['Текстильщики', '452'],
        ['Шелепиха', '453'],
        ['Марьина Роща', '454'],
        ['Зеленоград — Крюково', '455'],
        ['Фирсановская', '456'],
        ['Сходня', '457'],
        ['Подрезково', '458'],
        ['Новоподрезково', '459'],
        ['Молжаниново', '460'],
        ['Химки', '461'],
        ['Левобережная', '462'],
        ['Ховрино', '463'],
        ['Грачёвская', '464'],
        ['Моссельмаш', '465'],
        ['Лихоборы', '466'],
        ['Петровско-Разумовская', '467'],
        ['Останкино', '468'],
        ['Электрозаводская', '470'],
        ['Сортировочная', '471'],
        ['Андроновка', '473'],
        ['Перово', '474'],
        ['Плющево', '475'],
        ['Вешняки', '476'],
        ['Выхино', '477'],
        ['Рязанский проспект', '478'],
        ['Ухтомская', '479'],
        ['Люберцы', '480'],
        ['Панки', '481'],
        ['Томилино', '482'],
        ['Красково', '483'],
        ['Котельники', '484'],
        ['Отдых', '488'],
        ['Кратово', '489'],
        ['Есенинская', '490'],
        ['Фабричная', '491'],
        ['Раменское', '492'],
        ['Ипподром', '493'],
        ['Апрелевка', '494'],
        ['Победа', '495'],
        ['Крёкшино', '496'],
        ['Санино', '497'],
        ['Кокошкино', '498'],
        ['Толстопальцево', '499'],
        ['Лесной Городок', '500'],
        ['Внуково', '501'],
        ['Мичуринец', '502'],
        ['Переделкино', '503'],
        ['Солнечная', '504'],
        ['Говорово', '505'],
        ['Очаково', '506'],
        ['Аминьевская', '507'],
        ['Матвеевская', '508'],
        ['Минская', '509'],
        ['Кутузовская', '511'],
        ['Беговая', '513'],
        ['Белорусская', '514'],
        ['Рижская', '517'],
        ['Курская', '519'],
        ['Чухлинка', '522'],
        ['Кусково', '523'],
        ['Новогиреево', '524'],
        ['Реутов', '525'],
        ['Никольское', '526'],
        ['Салтыковская', '527'],
        ['Кучино', '528'],
        ['Ольгино', '529'],
        ['Железнодорожная', '530'],
        ['Физтех', '533'],
        ['Аэропорт Внуково', '535'],
        ['Пыхтино', '536'],
        ['Марьина Роща', '537'],
    ],
    "Казанский": [
        ['Северный Вокзал', '314'],
        ['Яшьлек', '315'],
        ['Козья слобода', '316'],
        ['Кремлёвская', '317'],
        ['Площадь Тукая', '318'],
        ['Суконная слобода', '319'],
        ['Аметьево', '320'],
        ['Горки', '321'],
        ['Проспект Победы', '322'],
        ['Дубравная', '368'],
    ],
    "Петербургский": [
        ['Девяткино', '167'],
        ['Гражданский проспект', '168'],
        ['Академическая', '169'],
        ['Политехническая', '170'],
        ['Площадь Мужества', '171'],
        ['Лесная', '172'],
        ['Выборгская', '173'],
        ['Площадь Ленина', '174'],
        ['Чернышевская', '175'],
        ['Площадь Восстания', '176'],
        ['Владимирская', '177'],
        ['Пушкинская', '178'],
        ['Технологический институт', '179'],
        ['Балтийская', '180'],
        ['Нарвская', '181'],
        ['Кировский завод', '182'],
        ['Автово', '183'],
        ['Ленинский проспект', '184'],
        ['Проспект Ветеранов', '185'],
        ['Парнас', '186'],
        ['Проспект Просвещения', '187'],
        ['Озерки', '188'],
        ['Удельная', '189'],
        ['Пионерская', '190'],
        ['Черная речка', '191'],
        ['Петроградская', '192'],
        ['Горьковская', '193'],
        ['Невский проспект', '194'],
        ['Сенная площадь', '195'],
        ['Фрунзенская', '197'],
        ['Московские ворота', '198'],
        ['Электросила', '199'],
        ['Парк Победы', '200'],
        ['Московская', '201'],
        ['Звездная', '202'],
        ['Купчино', '203'],
        ['Приморская', '204'],
        ['Василеостровская', '205'],
        ['Гостиный двор', '206'],
        ['Маяковская', '207'],
        ['Площадь Александра Невского', '208'],
        ['Елизаровская', '210'],
        ['Ломоносовская', '211'],
        ['Пролетарская', '212'],
        ['Обухово', '213'],
        ['Рыбацкое', '214'],
        ['Комендантский проспект', '215'],
        ['Старая Деревня', '216'],
        ['Крестовский остров', '217'],
        ['Чкаловская', '218'],
        ['Спортивная', '219'],
        ['Садовая', '220'],
        ['Достоевская', '221'],
        ['Лиговский проспект', '222'],
        ['Новочеркасская', '224'],
        ['Ладожская', '225'],
        ['Проспект Большевиков', '226'],
        ['Улица Дыбенко', '227'],
        ['Волковская', '230'],
        ['Звенигородская', '231'],
        ['Спасская', '232'],
        ['Обводный канал', '241'],
        ['Адмиралтейская', '242'],
        ['Международная', '246'],
        ['Бухарестская', '247'],
        ['Проспект Славы', '357'],
        ['Беговая', '355'],
        ['Зенит', '356'],
        ['Дунайская', '358'],
        ['Шушары', '359'],
        ['Горный институт', '382'],
    ],
    "Самарский": [
        ['Российская', '261'],
        ['Московская', '262'],
        ['Гагаринская', '263'],
        ['Спортивная', '264'],
        ['Советская', '265'],
        ['Победа', '266'],
        ['Безымянка', '267'],
        ['Кировская', '268'],
        ['Юнгородок', '269'],
        ['Победа', '270'],
        ['Алабинская', '312'],
    ],
    "Екатеринбургский": [
        ['Проспект Космонавтов', '340'],
        ['Уралмаш', '341'],
        ['Машиностроителей', '342'],
        ['Уральская', '343'],
        ['Динамо', '343'],
        ['Площадь 1905 года', '345'],
        ['Геологическая', '346'],
        ['Чкаловская', '347'],
        ['Ботаническая', '348'],
    ],
    "Новосибирский": [
        ['Заельцовская', '248'],
        ['Гагаринская', '249'],
        ['Красный Проспект', '250'],
        ['Сибирская', '251'],
        ['Площадь Ленина', '252'],
        ['Октябрьская', '253'],
        ['Речной Вокзал', '254'],
        ['Студенческая', '255'],
        ['Площадь Маркса', '256'],
        ['Площадь Гарина-Михайловского', '257'],
        ['Маршала Покрышкина', '258'],
        ['Березовая Роща', '259'],
        ['Золотая Нива', '260'],
    ],
    "Нижегородский": [
        ['Горьковская', '323'],
        ['Московская', '324'],
        ['Чкаловская', '325'],
        ['Ленинская', '326'],
        ['Заречная', '327'],
        ['Двигатель Революции', '328'],
        ['Пролетарская', '329'],
        ['Автозаводская', '330'],
        ['Комсомольская', '331'],
        ['Кировская', '332'],
        ['Парк культуры', '333'],
        ['Канавинская', '334'],
        ['Бурнаковская', '335'],
        ['Буревестник', '335'],
        ['Стрелка', '360']
    ],
}


================================================
FILE: cianparser/definers/__init__.py
================================================


================================================
FILE: cianparser/definers/definer_cities_id.py
================================================
import time
import requests
from bs4 import BeautifulSoup
import pymorphy2
import collections
import csv
import cloudscraper

# Field names are given as a list, not a set: namedtuple field order must be deterministic
ParseCityNames = collections.namedtuple(
    'ParseResults',
    [
        'location_name',
        'city_id',
    ]
)


class Client:
    def __init__(self, start_location_id=1, end_location_id=20):
        self.session = cloudscraper.create_scraper()
        self.session.headers = {'Accept-Language': 'en'}

        self.cities = []
        self.cities_set = set()

        self.start_location_id = start_location_id
        self.end_location_id = end_location_id

    def define_city(self, html, location_id: int):
        soup = BeautifulSoup(html, 'html.parser')
        offers = soup.select("div[data-name='HeaderDefault']")

        if len(offers) == 0:
            print("_" + "  " + "***")
            return self.cities

        title = offers[0].text
        city = title.lower()[title.lower().find("снять квартиру в ") + len("снять квартиру в "):title.lower().find(
            " на длительный срок")]

        if ("в России" in title or "АрендаСнять" not in title or
                ("области" in city or "крае" in city or "республике" in city or
                 "округе" in city or "россии" in city or
                 "кабардино" in city or "карачаево" in city or
                 "дагестан" in city or "осетии" in city or
                 "ненецком ао" in city or "ямало-ненецком ао" in city or
                 "чукотском ао" in city or "ханты-мансийском ао" in city or
                 "чувашии" in city)
        ):
            print("_" + "  " + str(location_id))
            return self.cities

        morph = pymorphy2.MorphAnalyzer()
        city = morph.parse(city)[0].normal_form.title()
        print(city + " " + str(location_id))

        if city not in self.cities_set:
            self.cities_set.add(city)
            self.cities.append((city, location_id))
            self.save_results()

        return self.cities

    def define_all_cities(self):
        for location_id in range(self.start_location_id, self.end_location_id+1):
            path = f'https://www.cian.ru/cat.php?deal_type=rent&engine_version=2&offer_type=flat&p=1&region={location_id}&type=4'
            # Use the cloudscraper session created in __init__ so anti-bot checks are handled
            response = self.session.get(path)
            html = response.text
            self.define_city(html, location_id)
            time.sleep(2)

        self.cities = sorted(self.cities, key=lambda x: x[0])

    def save_results(self):
        cities_result = []
        cities_result.append(ParseCityNames(
            location_name='location_name',
            city_id='city_id',
        ))

        for city_couple in self.cities:
            cities_result.append(ParseCityNames(
                location_name=city_couple[0],
                city_id=city_couple[1],
            ))

        path = f"cities_{self.start_location_id}_{self.end_location_id}.csv"
        with open(path, "w", newline="") as f:
            writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
            for item in cities_result:
                writer.writerow(item)


if __name__ == '__main__':
    definer = Client(start_location_id=6000, end_location_id=7000)
    definer.define_all_cities()


================================================
FILE: cianparser/definers/definer_metro_id.py
================================================
import time
import requests
from bs4 import BeautifulSoup
import collections
import csv
import cloudscraper

# Field names as a list: passing a set would make the column order nondeterministic
ParseMetroNames = collections.namedtuple(
    'ParseMetroNames',
    [
        'city',
        'metro_name',
        'metro_id',
    ]
)


class Client:
    def __init__(self, start_metro_id=1, end_metro_id=20):
        self.session = cloudscraper.create_scraper()
        self.session.headers = {'Accept-Language': 'en'}

        self.metro_stations = []
        self.metro_set = set()

        self.start_metro_id = start_metro_id
        self.end_metro_id = end_metro_id

    def define_metro(self, html, metro_id: int):
        soup = BeautifulSoup(html, 'html.parser')
        offers = soup.select("div[data-name='GeneralInfoSectionRowComponent']")

        if len(offers) == 0:
            print("_" + "  " + "***")
            return self.metro_stations

        address = offers[1].text

        if ", м." not in address:
            for offer in offers:
                if ", м." in offer.text:
                    address = offer.text

        if address.find(", м.") == -1:
            print("_" + "  " + "***" + " something is wrong: no metro station in address")
            return self.metro_stations

        city = "Unknown"
        for known_city in ("Москва", "Казань", "Санкт-Петербург", "Самара",
                           "Екатеринбург", "Новосибирск", "Нижний Новгород"):
            if known_city in address:
                city = known_city

        metro = address[address.find(", м.") + len(", м. "):].split(", ")[0]
        print(f"{city}, {metro}, {str(metro_id)}")

        if metro not in self.metro_set:
            self.metro_set.add(metro)
            self.metro_stations.append((city, metro, metro_id))
            self.save_results()

        return self.metro_stations

    def define_all_metro_stations(self):
        for metro_id in range(self.start_metro_id, self.end_metro_id+1):
            path = f'https://www.cian.ru/cat.php?deal_type=rent&engine_version=2&offer_type=flat&p=1&region=1&type=4&metro[0]={metro_id}'
            response = requests.get(path)
            html = response.text
            self.define_metro(html, metro_id)
            time.sleep(2)

        self.metro_stations = sorted(self.metro_stations, key=lambda x: x[0])

    def save_results(self):
        metro_stations_result = [ParseMetroNames(
            city='city',
            metro_name='metro_name',
            metro_id='metro_id',
        )]

        for metro_couple in self.metro_stations:
            metro_stations_result.append(ParseMetroNames(
                city=metro_couple[0],
                metro_name=metro_couple[1],
                metro_id=metro_couple[2],
            ))

        path = f"metro_stations_{self.start_metro_id}_{self.end_metro_id}.csv"
        with open(path, "w", newline="") as f:
            writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
            for item in metro_stations_result:
                writer.writerow(item)


if __name__ == '__main__':
    definer = Client(start_metro_id=1, end_metro_id=10)
    definer.define_all_metro_stations()


================================================
FILE: cianparser/flat/list.py
================================================
import bs4
import time
import pathlib
from datetime import datetime
from transliterate import translit

from cianparser.constants import FILE_NAME_FLAT_FORMAT
from cianparser.helpers import union_dicts, define_author, define_location_data, define_specification_data, define_deal_url_id, define_price_data
from cianparser.flat.page import FlatPageParser
from cianparser.base_list import BaseListPageParser


class FlatListPageParser(BaseListPageParser):
    def build_file_path(self):
        now_time = datetime.now().strftime("%d_%b_%Y_%H_%M_%S_%f")
        file_name = FILE_NAME_FLAT_FORMAT.format(self.accommodation_type, self.deal_type, self.start_page, self.end_page, translit(self.location_name.lower(), reversed=True), now_time)
        return pathlib.Path(pathlib.Path.cwd(), file_name.replace("'", ""))

    def parse_list_offers_page(self, html, page_number: int, count_of_pages: int, attempt_number: int):
        list_soup = bs4.BeautifulSoup(html, 'html.parser')

        if list_soup.text.find("Captcha") > 0:
            print(f"\r{page_number} page: CAPTCHA encountered... failed to parse page...")
            return False, attempt_number + 1, True

        header = list_soup.select("div[data-name='HeaderDefault']")
        if len(header) == 0:
            return False, attempt_number + 1, False

        offers = list_soup.select("article[data-name='CardComponent']")
        print("")
        print(f"\r {page_number} page: {len(offers)} offers", end="\r", flush=True)

        if page_number == self.start_page and attempt_number == 0:
            print("Collecting information from pages with list of offers", end="\n")

        for ind, offer in enumerate(offers):
            self.parse_offer(offer=offer)
            self.print_parse_progress(page_number=page_number, count_of_pages=count_of_pages, offers=offers, ind=ind)

        time.sleep(2)

        return True, 0, False

    def parse_offer(self, offer):
        common_data = dict()
        common_data["url"] = offer.select("div[data-name='LinkArea']")[0].select("a")[0].get('href')
        common_data["location"] = self.location_name
        common_data["deal_type"] = self.deal_type
        common_data["accommodation_type"] = self.accommodation_type

        author_data = define_author(block=offer)
        location_data = define_location_data(block=offer, is_sale=self.is_sale())
        price_data = define_price_data(block=offer)
        specification_data = define_specification_data(block=offer)

        if define_deal_url_id(common_data["url"]) in self.result_set:
            return

        page_data = dict()
        if self.with_extra_data:
            flat_parser = FlatPageParser(session=self.session, url=common_data["url"])
            page_data = flat_parser.parse_page()
            time.sleep(4)

        self.count_parsed_offers += 1
        self.define_average_price(price_data=price_data)
        self.result_set.add(define_deal_url_id(common_data["url"]))
        self.result.append(union_dicts(author_data, common_data, specification_data, price_data, page_data, location_data))

        if self.with_saving_csv:
            self.save_results()


================================================
FILE: cianparser/flat/page.py
================================================
import bs4
import re
import time


class FlatPageParser:
    def __init__(self, session, url):
        self.session = session
        self.url = url

    def __load_page__(self):
        res = self.session.get(self.url)
        if res.status_code == 429:
            time.sleep(10)
            res = self.session.get(self.url)  # retry once after rate-limit backoff
        res.raise_for_status()
        self.offer_page_html = res.text
        self.offer_page_soup = bs4.BeautifulSoup(self.offer_page_html, 'html.parser')

    def __parse_flat_offer_page_json__(self):
        page_data = {
            "year_of_construction": -1,
            "object_type": -1,
            "house_material_type": -1,
            "heating_type": -1,
            "finish_type": -1,
            "living_meters": -1,
            "kitchen_meters": -1,
            "floor": -1,
            "floors_count": -1,
            "phone": "",
        }

        spans = self.offer_page_soup.select("span")
        for index, span in enumerate(spans):
            if "Тип жилья" == span.text:
                page_data["object_type"] = spans[index + 1].text

            if "Тип дома" == span.text:
                page_data["house_material_type"] = spans[index + 1].text

            if "Отопление" == span.text:
                page_data["heating_type"] = spans[index + 1].text

            if "Отделка" == span.text:
                page_data["finish_type"] = spans[index + 1].text

            if "Площадь кухни" == span.text:
                page_data["kitchen_meters"] = spans[index + 1].text

            if "Жилая площадь" == span.text:
                page_data["living_meters"] = spans[index + 1].text

            if "Год постройки" in span.text:
                page_data["year_of_construction"] = spans[index + 1].text

            if "Год сдачи" in span.text:
                page_data["year_of_construction"] = spans[index + 1].text

            if "Этаж" == span.text:
                ints = re.findall(r'\d+', spans[index + 1].text)
                if len(ints) == 2:
                    page_data["floor"] = int(ints[0])
                    page_data["floors_count"] = int(ints[1])

        if "+7" in self.offer_page_html:
            page_data["phone"] = self.offer_page_html[self.offer_page_html.find("+7"): self.offer_page_html.find("+7") + 16].split('"')[0]. \
                replace(" ", ""). \
                replace("-", "")

        return page_data

    def parse_page(self):
        self.__load_page__()
        return self.__parse_flat_offer_page_json__()
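
The floor and phone extraction above rely on plain string slicing and a small regex. A standalone sketch of the same two steps (the logic is inlined here rather than imported from the package, and the sample strings are made-up illustrations):

```python
import re

# Mirrors the "Этаж" handling: pull the two integers out of e.g. "5 из 17"
floor_text = "5 из 17"
ints = re.findall(r"\d+", floor_text)
floor, floors_count = (int(ints[0]), int(ints[1])) if len(ints) == 2 else (-1, -1)
print(floor, floors_count)  # 5 17

# Mirrors the phone slice: take 16 characters starting at "+7", cut at the
# closing quote, and strip separators
html = 'contact: "+7 912 345-67-89" call now'
start = html.find("+7")
phone = html[start:start + 16].split('"')[0].replace(" ", "").replace("-", "")
print(phone)  # +79123456789
```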


================================================
FILE: cianparser/helpers.py
================================================
import re
import itertools
from cianparser.constants import STREET_TYPES, NOT_STREET_ADDRESS_ELEMENTS, FLOATS_NUMBERS_REG_EXPRESSION


def union_dicts(*dicts):
    return dict(itertools.chain.from_iterable(dct.items() for dct in dicts))


def define_rooms_count(description):
    if "1-комн" in description or "Студия" in description:
        rooms_count = 1
    elif "2-комн" in description:
        rooms_count = 2
    elif "3-комн" in description:
        rooms_count = 3
    elif "4-комн" in description:
        rooms_count = 4
    elif "5-комн" in description:
        rooms_count = 5
    else:
        rooms_count = -1

    return rooms_count
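
The chain above maps the room marker in a card title to an integer, treating a studio as one room. An equivalent compact sketch (not the package's code; inlined so it runs standalone):

```python
def rooms_count(description: str) -> int:
    """Equivalent of define_rooms_count: map 'N-комн' markers to N."""
    if "Студия" in description:
        return 1  # a studio counts as one room
    for n in range(1, 6):
        if f"{n}-комн" in description:
            return n
    return -1  # no marker found

print(rooms_count("2-комн. квартира, 45 м²"))  # 2
```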


def define_deal_url_id(url: str):
    url_path_elements = url.split("/")
    if len(url_path_elements[-1]) > 3:
        return url_path_elements[-1]
    if len(url_path_elements[-2]) > 3:
        return url_path_elements[-2]

    return "-1"
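
define_deal_url_id keys the dedup set on the trailing numeric segment of the offer URL, which is usually followed by a final slash. An inlined sketch of the same selection (the URL and offer id below are made-up examples):

```python
def deal_url_id(url: str) -> str:
    """Equivalent of define_deal_url_id: take the last non-trivial path segment."""
    parts = url.split("/")
    if len(parts[-1]) > 3:
        return parts[-1]
    if len(parts[-2]) > 3:  # URL ends with "/", so the id is one segment back
        return parts[-2]
    return "-1"

print(deal_url_id("https://www.cian.ru/rent/flat/293218039/"))  # 293218039
```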


def define_author(block):
    spans = block.select("div")[0].select("span")

    author_data = {
        "author": "",
        "author_type": "",
    }

    for index, span in enumerate(spans):
        if "Агентство недвижимости" in span:
            author_data["author"] = spans[index + 1].text.replace(",", ".").strip()
            author_data["author_type"] = "real_estate_agent"
            return author_data

    for index, span in enumerate(spans):
        if "Собственник" in span:
            author_data["author"] = spans[index + 1].text
            author_data["author_type"] = "homeowner"
            return author_data

    for index, span in enumerate(spans):
        if "Риелтор" in span:
            author_data["author"] = spans[index + 1].text
            author_data["author_type"] = "realtor"
            return author_data

    for index, span in enumerate(spans):
        if "Ук・оф.Представитель" in span:
            author_data["author"] = spans[index + 1].text
            author_data["author_type"] = "official_representative"
            return author_data

    for index, span in enumerate(spans):
        if "Представитель застройщика" in span:
            author_data["author"] = spans[index + 1].text
            author_data["author_type"] = "representative_developer"
            return author_data

    for index, span in enumerate(spans):
        if "Застройщик" in span:
            author_data["author"] = spans[index + 1].text
            author_data["author_type"] = "developer"
            return author_data

    for index, span in enumerate(spans):
        if "ID" in span.text:
            author_data["author"] = span.text
            author_data["author_type"] = "unknown"
            return author_data

    return author_data


def parse_location_data(block):
    general_info_sections = block.select_one("div[data-name='LinkArea']").select("div[data-name='GeneralInfoSectionRowComponent']")

    location_data = dict()
    location_data["district"] = ""
    location_data["underground"] = ""
    location_data["street"] = ""
    location_data["house_number"] = ""

    for section in general_info_sections:
        geo_labels = section.select("a[data-name='GeoLabel']")

        for index, label in enumerate(geo_labels):
            if "м. " in label.text:
                location_data["underground"] = label.text

            if "р-н" in label.text or "поселение" in label.text:
                location_data["district"] = label.text

            if any(street_type in label.text.lower() for street_type in STREET_TYPES):
                location_data["street"] = label.text

                if len(geo_labels) > index + 1 and any(chr.isdigit() for chr in geo_labels[index + 1].text):
                    location_data["house_number"] = geo_labels[index + 1].text

    return location_data


def define_location_data(block, is_sale):
    elements = block.select_one("div[data-name='LinkArea']").select("div[data-name='GeneralInfoSectionRowComponent']")

    location_data = dict()
    location_data["district"] = ""
    location_data["street"] = ""
    location_data["house_number"] = ""
    location_data["underground"] = ""

    if is_sale:
        location_data["residential_complex"] = ""

    for index, element in enumerate(elements):
        if ("ЖК" in element.text) and ("«" in element.text) and ("»" in element.text):
            location_data["residential_complex"] = element.text.split("«")[1].split("»")[0]

        if "р-н" in element.text and len(element.text) < 250:
            address_elements = element.text.split(",")
            if len(address_elements) < 2:
                continue

            if "ЖК" in address_elements[0] and "«" in address_elements[0] and "»" in address_elements[0]:
                location_data["residential_complex"] = address_elements[0].split("«")[1].split("»")[0]

            if ", м. " in element.text:
                location_data["underground"] = element.text.split(", м. ")[1]
                if "," in location_data["underground"]:
                    location_data["underground"] = location_data["underground"].split(",")[0]

            if (any(ch.isdigit() for ch in address_elements[-1])
                    and "жк" not in address_elements[-1].lower()
                    and not any(st in address_elements[-1].lower() for st in STREET_TYPES)
                    and len(address_elements[-1]) < 10):
                location_data["house_number"] = address_elements[-1].strip()

            for ind, elem in enumerate(address_elements):
                if "р-н" in elem:
                    district = elem.replace("р-н", "").strip()

                    location_data["district"] = district

                    if "ЖК" in address_elements[-1]:
                        location_data["residential_complex"] = address_elements[-1].strip()

                    if "ЖК" in address_elements[-2]:
                        location_data["residential_complex"] = address_elements[-2].strip()

                    for street_type in STREET_TYPES:
                        if street_type in address_elements[-1]:
                            location_data["street"] = address_elements[-1].strip()
                            if street_type == "улица":
                                location_data["street"] = location_data["street"].replace("улица", "")
                            return location_data

                        if street_type in address_elements[-2]:
                            location_data["street"] = address_elements[-2].strip()
                            if street_type == "улица":
                                location_data["street"] = location_data["street"].replace("улица", "")

                            return location_data

                    for k, after_district_address_element in enumerate(address_elements[ind + 1:]):
                        if len(list(set(after_district_address_element.split(" ")).intersection(
                                NOT_STREET_ADDRESS_ELEMENTS))) != 0:
                            continue

                        if len(after_district_address_element.strip().replace(" ", "")) < 4:
                            continue

                        location_data["street"] = after_district_address_element.strip()

                        return location_data

            return location_data

    if location_data["district"] == "":
        for index, element in enumerate(elements):
            if ", м. " in element.text and len(element.text) < 250:
                location_data["underground"] = element.text.split(", м. ")[1]
                if "," in location_data["underground"]:
                    location_data["underground"] = location_data["underground"].split(",")[0]

                address_elements = element.text.split(",")

                if len(address_elements) < 2:
                    continue

                if "ЖК" in address_elements[-1]:
                    location_data["residential_complex"] = address_elements[-1].strip()

                if "ЖК" in address_elements[-2]:
                    location_data["residential_complex"] = address_elements[-2].strip()

                if (any(ch.isdigit() for ch in address_elements[-1])
                        and "жк" not in address_elements[-1].lower()
                        and not any(st in address_elements[-1].lower() for st in STREET_TYPES)
                        and len(address_elements[-1]) < 10):
                    location_data["house_number"] = address_elements[-1].strip()

                for street_type in STREET_TYPES:
                    if street_type in address_elements[-1]:
                        location_data["street"] = address_elements[-1].strip()
                        if street_type == "улица":
                            location_data["street"] = location_data["street"].replace("улица", "")
                        return location_data

                    if street_type in address_elements[-2]:
                        location_data["street"] = address_elements[-2].strip()
                        if street_type == "улица":
                            location_data["street"] = location_data["street"].replace("улица", "")
                        return location_data

            for street_type in STREET_TYPES:
                if (", " + street_type + " " in element.text) or (" " + street_type + ", " in element.text):
                    address_elements = element.text.split(",")

                    if len(address_elements) < 3:
                        continue

                    if (any(ch.isdigit() for ch in address_elements[-1])
                            and "жк" not in address_elements[-1].lower()
                            and not any(st in address_elements[-1].lower() for st in STREET_TYPES)
                            and len(address_elements[-1]) < 10):
                        location_data["house_number"] = address_elements[-1].strip()

                    if street_type in address_elements[-1]:
                        location_data["street"] = address_elements[-1].strip()
                        if street_type == "улица":
                            location_data["street"] = location_data["street"].replace("улица", "")

                        location_data["district"] = address_elements[-2].strip()

                        return location_data

                    if street_type in address_elements[-2]:
                        location_data["street"] = address_elements[-2].strip()
                        if street_type == "улица":
                            location_data["street"] = location_data["street"].replace("улица", "")

                        location_data["district"] = address_elements[-3].strip()

                        return location_data

    return location_data


def define_price_data(block):
    elements = block.select("div[data-name='LinkArea']")[0]. \
        select("span[data-mark='MainPrice']")

    price_data = {
        "price_per_month": -1,
        "commissions": 0,
    }

    for element in elements:
        if "₽/мес" in element.text:
            price_description = element.text
            price_data["price_per_month"] = int(
                "".join(price_description[:price_description.find("₽/мес") - 1].split()))

            if "%" in price_description:
                price_data["commissions"] = int(
                    price_description[price_description.find("%") - 2:price_description.find("%")].replace(" ", ""))

            return price_data

        if "₽" in element.text and "млн" not in element.text:
            price_description = element.text
            price_data["price"] = int("".join(price_description[:price_description.find("₽") - 1].split()))

            return price_data

    return price_data
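
The monthly-price branch above slices the text before the "₽/мес" marker and strips the digit-group spaces before converting. A standalone sketch of that slice (the price string is an illustrative value):

```python
def parse_monthly_price(text: str) -> int:
    """Mirror of the price slice in define_price_data: digits before '₽/мес',
    with the thousands-separator spaces removed."""
    cut = text[: text.find("₽/мес") - 1]
    return int("".join(cut.split()))

print(parse_monthly_price("65 000 ₽/мес."))  # 65000
```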


def define_specification_data(block):
    specification_data = dict()
    specification_data["floor"] = -1
    specification_data["floors_count"] = -1
    specification_data["rooms_count"] = -1
    specification_data["total_meters"] = -1

    # title and common_properties selected the same node twice; select it once.
    # str.find() returns -1 (not None) on a miss, so compare against -1.
    common_properties = block.select("div[data-name='LinkArea']")[0]. \
        select("div[data-name='GeneralInfoSectionRowComponent']")[0].text

    if common_properties.find("м²") != -1:
        total_meters = common_properties[: common_properties.find("м²")].replace(",", ".")
        if len(re.findall(FLOATS_NUMBERS_REG_EXPRESSION, total_meters)) != 0:
            specification_data["total_meters"] = float(
                re.findall(FLOATS_NUMBERS_REG_EXPRESSION, total_meters)[-1].replace(" ", "").replace("-", ""))

    if "этаж" in common_properties:
        floor_per = common_properties[common_properties.rfind("этаж") - 7: common_properties.rfind("этаж")]
        floor_properties = floor_per.split("/")

        if len(floor_properties) == 2:
            ints = re.findall(r'\d+', floor_properties[0])
            if len(ints) != 0:
                specification_data["floor"] = int(ints[-1])

            ints = re.findall(r'\d+', floor_properties[1])
            if len(ints) != 0:
                specification_data["floors_count"] = int(ints[-1])

    specification_data["rooms_count"] = define_rooms_count(common_properties)

    return specification_data
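
The floor extraction above looks back seven characters before the last "этаж" and splits the window on "/". An inlined sketch of that window-and-split step (the card title below is a made-up example):

```python
import re

def parse_floor(common_properties: str):
    """Mirror of the '5/17 этаж' handling in define_specification_data."""
    if "этаж" not in common_properties:
        return -1, -1
    # seven characters is enough for two-digit floor/floors_count plus separators
    window = common_properties[common_properties.rfind("этаж") - 7: common_properties.rfind("этаж")]
    parts = window.split("/")
    if len(parts) != 2:
        return -1, -1
    ints0 = re.findall(r"\d+", parts[0])
    ints1 = re.findall(r"\d+", parts[1])
    if not ints0 or not ints1:
        return -1, -1
    return int(ints0[-1]), int(ints1[-1])

print(parse_floor("2-комн. кв., 45 м², 5/17 этаж"))  # (5, 17)
```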


================================================
FILE: cianparser/newobject/list.py
================================================
import bs4
import time
import math
import csv
import pathlib
from datetime import datetime
from transliterate import translit
import urllib.parse

from cianparser.constants import FILE_NAME_NEWOBJECT_FORMAT
from cianparser.helpers import union_dicts
from cianparser.newobject.page import NewObjectPageParser


class NewObjectListParser:
    def __init__(self, session, location_name: str, with_saving_csv=False):
        self.accommodation_type = "newobject"
        self.deal_type = "sale"
        self.session = session
        self.location_name = location_name
        self.with_saving_csv = with_saving_csv

        self.result = []
        self.result_set = set()
        self.average_price = 0
        self.count_parsed_offers = 0
        self.start_page = 1
        self.end_page = 50
        self.file_path = self.build_file_path()

    def build_file_path(self):
        now_time = datetime.now().strftime("%d_%b_%Y_%H_%M_%S_%f")
        file_name = FILE_NAME_NEWOBJECT_FORMAT.format(self.accommodation_type, translit(self.location_name.lower(), reversed=True), now_time)
        return pathlib.Path(pathlib.Path.cwd(), file_name.replace("'", ""))

    def print_parse_progress(self, page_number, count_of_pages, offers, ind):
        total_planed_offers = len(offers) * count_of_pages
        print(f"\r {page_number - self.start_page + 1}"
              f" | {page_number} page with list: [" + "=>" * (ind + 1) + "  " * (len(offers) - ind - 1) + "]" + f" {math.ceil((ind + 1) * 100 / len(offers))}" + "%" +
              f" | Count of all parsed: {self.count_parsed_offers}."
              f" Progress ratio: {math.ceil(self.count_parsed_offers * 100 / total_planed_offers)} %.",
              end="\r", flush=True)

    def parse_list_offers_page(self, html, page_number: int, count_of_pages: int, attempt_number: int):
        list_soup = bs4.BeautifulSoup(html, 'html.parser')

        if list_soup.text.find("Captcha") > 0:
            print(f"\r{page_number} page: CAPTCHA encountered... failed to parse page...")
            return False, attempt_number + 1, True

        offers = list_soup.select("div[data-mark='GKCard']")
        print("")
        print(f"\r {page_number} page: {len(offers)} offers", end="\r", flush=True)

        if page_number == self.start_page and attempt_number == 0:
            print("Collecting information from pages with list of offers", end="\n")

        for ind, offer in enumerate(offers):
            self.parse_offer(offer=offer)
            self.print_parse_progress(page_number=page_number, count_of_pages=count_of_pages, offers=offers, ind=ind)

        time.sleep(2)

        return True, 0, False

    def parse_offer(self, offer):
        common_data = dict()
        common_data["name"] = offer.select_one("span[data-mark='Text']").text
        common_data["location"] = self.location_name
        common_data["accommodation_type"] = self.accommodation_type
        common_data["url"] = "https://" + urllib.parse.urlparse(offer.select_one("a[data-mark='Link']").get('href')).netloc
        common_data["full_location_address"] = offer.select_one("div[data-mark='CellAddressBlock']").text

        if common_data["url"] in self.result_set:
            return

        flat_parser = NewObjectPageParser(session=self.session, url=common_data["url"])
        page_data = flat_parser.parse_page()
        time.sleep(4)

        self.count_parsed_offers += 1
        self.result_set.add(common_data["url"])
        self.result.append(union_dicts(common_data, page_data))

        if self.with_saving_csv:
            self.save_results()

    def save_results(self):
        keys = self.result[0].keys()

        with open(self.file_path, 'w', newline='', encoding='utf-8') as output_file:
            dict_writer = csv.DictWriter(output_file, keys, delimiter=';')
            dict_writer.writeheader()
            dict_writer.writerows(self.result)


================================================
FILE: cianparser/newobject/page.py
================================================
import bs4
import re
import time


class NewObjectPageParser:
    def __init__(self, session, url):
        self.session = session
        self.url = url

    def __load_page__(self):
        res = self.session.get(self.url)
        if res.status_code == 429:
            time.sleep(10)
            res = self.session.get(self.url)  # retry once after rate-limit backoff
        res.raise_for_status()
        self.offer_page_html = res.text
        self.offer_page_soup = bs4.BeautifulSoup(self.offer_page_html, 'html.parser')

    def parse_page(self):
        self.__load_page__()

        page_data = {
            "year_of_construction": -1,
            "house_material_type": -1,
            "finish_type": -1,
            "ceiling_height": -1,
            "class": -1,
            "parking_type": -1,
            "floors_from": -1,
            "floors_to": -1,
        }

        spans = self.offer_page_soup.select("span")
        for index, span in enumerate(spans):
            if "Срок сдачи" in span.text:
                page_data["year_of_construction"] = spans[index + 1].text

            if "Тип дома" == span.text:
                page_data["house_material_type"] = spans[index + 1].text

            if "Отделка" == span.text:
                page_data["finish_type"] = spans[index + 1].text

            if "Высота потолков" == span.text:
                page_data["ceiling_height"] = spans[index + 1].text

            if "Класс" == span.text:
                page_data["class"] = spans[index + 1].text

            if "Застройщик" in span.text and "Проектная декларация" in span.text:
                page_data["builder"] = span.text.split(".")[0]

            if "Парковка" == span.text:
                page_data["parking_type"] = spans[index + 1].text

            if "Этажность" == span.text:
                ints = re.findall(r'\d+', spans[index + 1].text)
                if len(ints) == 2:
                    page_data["floors_from"] = int(ints[0])
                    page_data["floors_to"] = int(ints[1])
                if len(ints) == 1:
                    page_data["floors_from"] = int(ints[0])
                    page_data["floors_to"] = int(ints[0])

        return page_data




================================================
FILE: cianparser/proxy_pool.py
================================================
import time
import urllib.request
import urllib.error
import bs4
import random
import socket


class ProxyPool:
    def __init__(self, proxies):
        self.__proxy_pool__ = [] if proxies is None else proxies
        self.__current_proxy__ = None
        self.__page_html__ = None

    def __is_captcha__(self):
        page_soup = bs4.BeautifulSoup(self.__page_html__, 'html.parser')
        return page_soup.text.find("Captcha") > 0

    def __is_available_proxy__(self, url, proxy):
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({'https': proxy}))
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        urllib.request.install_opener(opener)

        try:
            self.__page_html__ = urllib.request.urlopen(urllib.request.Request(url))
        except Exception as detail:
            print(f"proxy check failed: {detail}..")
            return False

        return True

    def is_empty(self):
        return len(self.__proxy_pool__) == 0

    def get_available_proxy(self, url):
        print("Checking the proxies... searching for an available one...")

        socket.setdefaulttimeout(5)
        found_proxy = False
        while len(self.__proxy_pool__) > 0 and found_proxy is False:
            proxy = random.choice(self.__proxy_pool__)

            is_available = self.__is_available_proxy__(url, proxy)
            is_captcha = self.__is_captcha__() if is_available else None

            if not is_available or is_captcha:
                if is_captcha:
                    print(f"proxy {proxy}: there is captcha.. trying another")
                else:
                    print(f"proxy {proxy}: unavailable.. trying another..")
                self.__proxy_pool__.remove(proxy)
                time.sleep(4)
                continue

            print(f"proxy {proxy}: available.. stop searching")
            self.__current_proxy__, found_proxy = proxy, True

        if self.__current_proxy__ is None:
            print("there are no available proxies..", end="\n\n")

        return self.__current_proxy__


================================================
FILE: cianparser/suburban/list.py
================================================
import bs4
import time
import pathlib
from datetime import datetime
from transliterate import translit

from cianparser.constants import FILE_NAME_SUBURBAN_FORMAT
from cianparser.helpers import union_dicts, define_author, parse_location_data, define_price_data, define_deal_url_id
from cianparser.suburban.page import SuburbanPageParser
from cianparser.base_list import BaseListPageParser


class SuburbanListPageParser(BaseListPageParser):
    def build_file_path(self):
        now_time = datetime.now().strftime("%d_%b_%Y_%H_%M_%S_%f")
        file_name = FILE_NAME_SUBURBAN_FORMAT.format(self.accommodation_type, self.object_type, self.deal_type, self.start_page, self.end_page, translit(self.location_name.lower(), reversed=True), now_time)
        return pathlib.Path(pathlib.Path.cwd(), file_name.replace("'", ""))

    def parse_list_offers_page(self, html, page_number: int, count_of_pages: int, attempt_number: int):
        list_soup = bs4.BeautifulSoup(html, 'html.parser')

        if "Captcha" in list_soup.text:
            print(f"\r{page_number} page: there is CAPTCHA... failed to parse page...")
            return False, attempt_number + 1, True

        header = list_soup.select("div[data-name='HeaderDefault']")
        if len(header) == 0:
            return False, attempt_number + 1, False

        offers = list_soup.select("article[data-name='CardComponent']")
        print("")
        print(f"\r {page_number} page: {len(offers)} offers", end="\r", flush=True)

        if page_number == self.start_page and attempt_number == 0:
            print(f"Collecting information from pages with list of offers", end="\n")

        for ind, offer in enumerate(offers):
            self.parse_offer(offer=offer)
            self.print_parse_progress(page_number=page_number, count_of_pages=count_of_pages, offers=offers, ind=ind)

        time.sleep(2)

        return True, 0, False

    def parse_offer(self, offer):
        common_data = dict()
        common_data["url"] = offer.select("div[data-name='LinkArea']")[0].select("a")[0].get('href')
        common_data["location"] = self.location_name
        common_data["deal_type"] = self.deal_type
        common_data["accommodation_type"] = self.accommodation_type
        common_data["suburban_type"] = self.object_type

        author_data = define_author(block=offer)
        location_data = parse_location_data(block=offer)
        price_data = define_price_data(block=offer)

        if define_deal_url_id(common_data["url"]) in self.result_set:
            return

        page_data = dict()
        if self.with_extra_data:
            suburban_parser = SuburbanPageParser(session=self.session, url=common_data["url"])
            page_data = suburban_parser.parse_page()
            time.sleep(4)

        self.count_parsed_offers += 1
        self.define_average_price(price_data=price_data)
        self.result_set.add(define_deal_url_id(common_data["url"]))
        self.result.append(union_dicts(author_data, common_data, price_data, page_data, location_data))

        if self.with_saving_csv:
            self.save_results()




================================================
FILE: cianparser/suburban/page.py
================================================
import time

import bs4


class SuburbanPageParser:
    def __init__(self, session, url):
        self.session = session
        self.url = url

    def __load_page__(self):
        res = self.session.get(self.url)
        if res.status_code == 429:
            time.sleep(10)
        res.raise_for_status()
        self.offer_page_html = res.text
        self.offer_page_soup = bs4.BeautifulSoup(self.offer_page_html, 'html.parser')

    def parse_page(self):
        self.__load_page__()

        page_data = {
            "year_of_construction": -1,
            "house_material_type": -1,
            "land_plot": -1,
            "land_plot_status": -1,
            "heating_type": -1,
            "gas_type": -1,
            "water_supply_type": -1,
            "sewage_system": -1,
            "bathroom": -1,
            "kitchen_meters": -1,
            "living_meters": -1,
            "floors_count": -1,
            "phone": "",
        }

        spans = self.offer_page_soup.select("span")
        for index, span in enumerate(spans):
            if "Материал дома" == span.text:
                page_data["house_material_type"] = spans[index + 1].text

            if "Участок" == span.text:
                page_data["land_plot"] = spans[index + 1].text

            if "Статус участка" == span.text:
                page_data["land_plot_status"] = spans[index + 1].text

            if "Отопление" == span.text:
                page_data["heating_type"] = spans[index + 1].text

            if "Газ" == span.text:
                page_data["gas_type"] = spans[index + 1].text

            if "Водоснабжение" == span.text:
                page_data["water_supply_type"] = spans[index + 1].text

            if "Канализация" == span.text:
                page_data["sewage_system"] = spans[index + 1].text

            if "Санузел" == span.text:
                page_data["bathroom"] = spans[index + 1].text

            if "Площадь кухни" == span.text:
                page_data["kitchen_meters"] = spans[index + 1].text

            if "Общая площадь" == span.text:
                page_data["living_meters"] = spans[index + 1].text

            if "Год постройки" in span.text:
                page_data["year_of_construction"] = spans[index + 1].text

            if "Год сдачи" in span.text:
                page_data["year_of_construction"] = spans[index + 1].text

            if "Этажей в доме" == span.text:
                page_data["floors_count"] = spans[index + 1].text

        if "+7" in self.offer_page_html:
            page_data["phone"] = self.offer_page_html[self.offer_page_html.find("+7"): self.offer_page_html.find("+7") + 16].split('"')[0]. \
                replace(" ", ""). \
                replace("-", "")

        return page_data
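
The phone extraction above slices 16 characters starting at the first `+7`, cuts at any closing quote, and strips spaces and dashes. The same expression applied to a sample fragment (the number below is made up):

```python
html = 'seller contact: "+7 912 345-67-89", call after 10:00'
start = html.find("+7")
phone = html[start: start + 16].split('"')[0].replace(" ", "").replace("-", "")
print(phone)  # → +79123456789
```

The 16-character window fits a fully formatted Russian number (`+7 XXX XXX-XX-XX`); the `split('"')` guards against shorter numbers followed by a quote inside the window.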


================================================
FILE: cianparser/url_builder.py
================================================
from cianparser.constants import *


class URLBuilder:
    def __init__(self, is_newobject):
        self.url = BASE_URL
        self.add_newobject_postfix() if is_newobject else self.add_default_postfix()
        self.url += DEFAULT_PATH

    def add_default_postfix(self):
        self.url += DEFAULT_POSTFIX_PATH

    def add_newobject_postfix(self):
        self.url += NEWOBJECT_POSTFIX_PATH

    def get_url(self):
        return self.url

    def add_accommodation_type(self, accommodation_type):
        self.url += OFFER_TYPE_PATH.format(accommodation_type)
        
    def add_deal_type(self, deal_type):
        self.url += DEAL_TYPE_PATH.format(deal_type)

    def add_location(self, location_id):
        self.url += REGION_PATH.format(location_id)

    def add_room(self, rooms):
        rooms_path = ""
        if type(rooms) is tuple:
            for count_of_room in rooms:
                if type(count_of_room) is int:
                    if 0 < count_of_room < 6:
                        rooms_path += ROOM_PATH.format(count_of_room)
                elif type(count_of_room) is str:
                    if count_of_room == "studio":
                        rooms_path += STUDIO_PATH
        elif type(rooms) is int:
            if 0 < rooms < 6:
                rooms_path += ROOM_PATH.format(rooms)
        elif type(rooms) is str:
            if rooms == "studio":
                rooms_path += STUDIO_PATH
            elif rooms == "all":
                rooms_path = ""

        self.url += rooms_path

    def add_rent_period_type(self, rent_period_type):
        self.url += RENT_PERIOD_TYPE_PATH.format(rent_period_type)

    def add_object_suburban_type(self, object_type):
        self.url += OBJECT_TYPE_PATH.format(OBJECT_SUBURBAN_TYPES[object_type])

    def add_additional_settings(self, additional_settings):
        if "object_type" in additional_settings.keys():
            self.url += OBJECT_TYPE_PATH.format(OBJECT_TYPES[additional_settings["object_type"]])

        if "is_by_homeowner" in additional_settings.keys() and additional_settings["is_by_homeowner"]:
            self.url += IS_ONLY_HOMEOWNER_PATH
        if "min_balconies" in additional_settings.keys():
            self.url += MIN_BALCONIES_PATH.format(additional_settings["min_balconies"])
        if "have_loggia" in additional_settings.keys() and additional_settings["have_loggia"]:
            self.url += HAVE_LOGGIA_PATH

        if "min_house_year" in additional_settings.keys():
            self.url += MIN_HOUSE_YEAR_PATH.format(additional_settings["min_house_year"])
        if "max_house_year" in additional_settings.keys():
            self.url += MAX_HOUSE_YEAR_PATH.format(additional_settings["max_house_year"])

        if "min_price" in additional_settings.keys():
            self.url += MIN_PRICE_PATH.format(additional_settings["min_price"])
        if "max_price" in additional_settings.keys():
            self.url += MAX_PRICE_PATH.format(additional_settings["max_price"])

        if "min_floor" in additional_settings.keys():
            self.url += MIN_FLOOR_PATH.format(additional_settings["min_floor"])
        if "max_floor" in additional_settings.keys():
            self.url += MAX_FLOOR_PATH.format(additional_settings["max_floor"])

        if "min_total_floor" in additional_settings.keys():
            self.url += MIN_TOTAL_FLOOR_PATH.format(additional_settings["min_total_floor"])
        if "max_total_floor" in additional_settings.keys():
            self.url += MAX_TOTAL_FLOOR_PATH.format(additional_settings["max_total_floor"])

        if "house_material_type" in additional_settings.keys():
            self.url += HOUSE_MATERIAL_TYPE_PATH.format(additional_settings["house_material_type"])

        if "metro" in additional_settings.keys():
            if "metro_station" in additional_settings.keys():
                if additional_settings["metro"] in METRO_STATIONS.keys():
                    for metro_station, metro_id in METRO_STATIONS[additional_settings["metro"]]:
                        if additional_settings["metro_station"] == metro_station:
                            self.url += METRO_ID_PATH.format(metro_id)

        if "metro_foot_minute" in additional_settings.keys():
            self.url += METRO_FOOT_MINUTE_PATH.format(additional_settings["metro_foot_minute"])

        if "flat_share" in additional_settings.keys():
            self.url += FLAT_SHARE_PATH.format(additional_settings["flat_share"])

        if "only_flat" in additional_settings.keys():
            if additional_settings["only_flat"]:
                self.url += ONLY_FLAT_PATH.format(1)

        if "only_apartment" in additional_settings.keys():
            if additional_settings["only_apartment"]:
                self.url += APARTMENT_PATH.format(1)

        if "sort_by" in additional_settings.keys():
            if additional_settings["sort_by"] == IS_SORT_BY_PRICE_FROM_MIN_TO_MAX_PATH:
                self.url += SORT_BY_PRICE_FROM_MIN_TO_MAX_PATH
            if additional_settings["sort_by"] == IS_SORT_BY_PRICE_FROM_MAX_TO_MIN_PATH:
                self.url += SORT_BY_PRICE_FROM_MAX_TO_MIN_PATH
            if additional_settings["sort_by"] == IS_SORT_BY_TOTAL_METERS_FROM_MAX_TO_MIN_PATH:
                self.url += SORT_BY_TOTAL_METERS_FROM_MAX_TO_MIN_PATH
            if additional_settings["sort_by"] == IS_SORT_BY_CREATION_DATA_FROM_NEWER_TO_OLDER_PATH:
                self.url += SORT_BY_CREATION_DATA_FROM_NEWER_TO_OLDER_PATH
            if additional_settings["sort_by"] == IS_SORT_BY_CREATION_DATA_FROM_OLDER_TO_NEWER_PATH:
                self.url += SORT_BY_CREATION_DATA_FROM_OLDER_TO_NEWER_PATH
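
`add_room` above accepts an int (1 to 5), the strings `"studio"` or `"all"`, or a tuple mixing them. The normalization can be sketched as a standalone function; the path templates here are made-up placeholders, not the library's real constants:

```python
ROOM_PATH = "&room{}=1"   # placeholder, not the real ROOM_PATH constant
STUDIO_PATH = "&room9=1"  # placeholder, not the real STUDIO_PATH constant

def rooms_query(rooms):
    """Build the rooms URL fragment from an int, 'studio', 'all', or a tuple of those."""
    items = rooms if isinstance(rooms, tuple) else (rooms,)
    path = ""
    for r in items:
        if isinstance(r, int) and 0 < r < 6:
            path += ROOM_PATH.format(r)
        elif r == "studio":
            path += STUDIO_PATH
        # "all" and out-of-range counts contribute nothing, matching add_room
    return path

print(rooms_query((1, 2, "studio")))  # → &room1=1&room2=1&room9=1
```

Wrapping a scalar argument into a one-element tuple lets a single loop cover all three input shapes that `add_room` handles with separate branches.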


================================================
FILE: setup.cfg
================================================
[metadata]
name = cianparser
version = 1.0.4
description = Parses information from the Cian website
url = https://github.com/lenarsaitov/cianparser
author = Lenar Saitov
author_email = lenarsaitov1@yandex.ru
long_description = file: README.md
license = MIT
license_files = LICENSE
keywords = python parser requests cloudscraper beautifulsoup cian realestate

================================================
FILE: setup.py
================================================
from setuptools import setup

with open("README.md", encoding="utf8") as file:
    read_me_description = file.read()


setup(
    name='cianparser',
    version='1.0.4',
    description='Parses information from the Cian website',
    url='https://github.com/lenarsaitov/cianparser',
    author='Lenar Saitov',
    author_email='lenarsaitov1@yandex.ru',
    license='MIT',
    packages=['cianparser', 'cianparser.flat', 'cianparser.newobject', 'cianparser.suburban'],
    long_description=read_me_description,
    long_description_content_type="text/markdown",
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    keywords='python parser requests cloudscraper beautifulsoup cian realestate',
    install_requires=['cloudscraper', 'beautifulsoup4', 'transliterate', 'lxml'],  # datetime is stdlib and needs no install
)
SYMBOL INDEX (89 symbols across 13 files)

FILE: cianparser/base_list.py
  class BaseListPageParser (line 7) | class BaseListPageParser:
    method __init__ (line 8) | def __init__(self,
    method is_sale (line 31) | def is_sale(self):
    method is_rent_long (line 34) | def is_rent_long(self):
    method is_rent_short (line 37) | def is_rent_short(self):
    method build_file_path (line 40) | def build_file_path(self):
    method define_average_price (line 43) | def define_average_price(self, price_data):
    method print_parse_progress (line 49) | def print_parse_progress(self, page_number, count_of_pages, offers, ind):
    method remove_unnecessary_fields (line 58) | def remove_unnecessary_fields(self):
    method save_results (line 88) | def save_results(self):

FILE: cianparser/cianparser.py
  function list_locations (line 12) | def list_locations():
  function list_metro_stations (line 16) | def list_metro_stations():
  class CianParser (line 20) | class CianParser:
    method __init__ (line 21) | def __init__(self, location: str, proxies=None):
    method __set_proxy__ (line 39) | def __set_proxy__(self, url_list):
    method __load_list_page__ (line 46) | def __load_list_page__(self, url_list_format, page_number, attempt_num...
    method __run__ (line 60) | def __run__(self, url_list_format: str):
    method get_flats (line 92) | def get_flats(self, deal_type: str, rooms, with_saving_csv=False, with...
    method get_suburban (line 125) | def get_suburban(self, suburban_type: str, deal_type: str, with_saving...
    method get_newobjects (line 159) | def get_newobjects(self, with_saving_csv=False):
  function __validation_init__ (line 178) | def __validation_init__(location):
  function __validation_get_flats__ (line 191) | def __validation_get_flats__(deal_type, rooms):
  function __validation_get_suburban__ (line 222) | def __validation_get_suburban__(suburban_type, deal_type):
  function __build_url_list__ (line 232) | def __build_url_list__(location_id, deal_type, accommodation_type, rooms...
  function __define_deal_type__ (line 254) | def __define_deal_type__(deal_type):

FILE: cianparser/definers/definer_cities_id.py
  class Client (line 18) | class Client:
    method __init__ (line 19) | def __init__(self, start_location_id=1, end_location_id=20):
    method define_city (line 29) | def define_city(self, html, location_id: int):
    method define_all_cities (line 64) | def define_all_cities(self):
    method save_results (line 74) | def save_results(self):

FILE: cianparser/definers/definer_metro_id.py
  class Client (line 18) | class Client:
    method __init__ (line 19) | def __init__(self, start_metro_id=1, end_metro_id=20):
    method define_metro (line 29) | def define_metro(self, html, metro_id: int):
    method define_all_metro_stations (line 73) | def define_all_metro_stations(self):
    method save_results (line 83) | def save_results(self):

FILE: cianparser/flat/list.py
  class FlatListPageParser (line 13) | class FlatListPageParser(BaseListPageParser):
    method build_file_path (line 14) | def build_file_path(self):
    method parse_list_offers_page (line 19) | def parse_list_offers_page(self, html, page_number: int, count_of_page...
    method parse_offer (line 45) | def parse_offer(self, offer):

FILE: cianparser/flat/page.py
  class FlatPageParser (line 6) | class FlatPageParser:
    method __init__ (line 7) | def __init__(self, session, url):
    method __load_page__ (line 11) | def __load_page__(self):
    method __parse_flat_offer_page_json__ (line 19) | def __parse_flat_offer_page_json__(self):
    method parse_page (line 72) | def parse_page(self):

FILE: cianparser/helpers.py
  function union_dicts (line 6) | def union_dicts(*dicts):
  function define_rooms_count (line 10) | def define_rooms_count(description):
  function define_deal_url_id (line 27) | def define_deal_url_id(url: str):
  function define_author (line 37) | def define_author(block):
  function parse_location_data (line 90) | def parse_location_data(block):
  function define_location_data (line 121) | def define_location_data(block, is_sale):
  function define_price_data (line 268) | def define_price_data(block):
  function define_specification_data (line 298) | def define_specification_data(block):

FILE: cianparser/newobject/list.py
  class NewObjectListParser (line 15) | class NewObjectListParser:
    method __init__ (line 16) | def __init__(self, session, location_name: str, with_saving_csv=False):
    method build_file_path (line 31) | def build_file_path(self):
    method print_parse_progress (line 36) | def print_parse_progress(self, page_number, count_of_pages, offers, ind):
    method parse_list_offers_page (line 44) | def parse_list_offers_page(self, html, page_number: int, count_of_page...
    method parse_offer (line 66) | def parse_offer(self, offer):
    method save_results (line 88) | def save_results(self):

FILE: cianparser/newobject/page.py
  class NewObjectPageParser (line 6) | class NewObjectPageParser:
    method __init__ (line 7) | def __init__(self, session, url):
    method __load_page__ (line 11) | def __load_page__(self):
    method parse_page (line 19) | def parse_page(self):

FILE: cianparser/proxy_pool.py
  class ProxyPool (line 9) | class ProxyPool:
    method __init__ (line 10) | def __init__(self, proxies):
    method __is_captcha__ (line 15) | def __is_captcha__(self):
    method __is_available_proxy__ (line 19) | def __is_available_proxy__(self, url, proxy):
    method is_empty (line 32) | def is_empty(self):
    method get_available_proxy (line 35) | def get_available_proxy(self, url):

FILE: cianparser/suburban/list.py
  class SuburbanListPageParser (line 13) | class SuburbanListPageParser(BaseListPageParser):
    method build_file_path (line 14) | def build_file_path(self):
    method parse_list_offers_page (line 19) | def parse_list_offers_page(self, html, page_number: int, count_of_page...
    method parse_offer (line 45) | def parse_offer(self, offer):

FILE: cianparser/suburban/page.py
  class SuburbanPageParser (line 6) | class SuburbanPageParser:
    method __init__ (line 7) | def __init__(self, session, url):
    method __load_page__ (line 11) | def __load_page__(self):
    method parse_page (line 19) | def parse_page(self):

FILE: cianparser/url_builder.py
  class URLBuilder (line 4) | class URLBuilder:
    method __init__ (line 5) | def __init__(self, is_newobject):
    method add_default_postfix (line 10) | def add_default_postfix(self):
    method add_newobject_postfix (line 13) | def add_newobject_postfix(self):
    method get_url (line 16) | def get_url(self):
    method add_accommodation_type (line 19) | def add_accommodation_type(self, accommodation_type):
    method add_deal_type (line 22) | def add_deal_type(self, deal_type):
    method add_location (line 25) | def add_location(self, location_id):
    method add_room (line 28) | def add_room(self, rooms):
    method add_rent_period_type (line 49) | def add_rent_period_type(self, rent_period_type):
    method add_object_suburban_type (line 52) | def add_object_suburban_type(self, object_type):
    method add_additional_settings (line 55) | def add_additional_settings(self, additional_settings):

About this extraction

This page contains the full source code of the lenarsaitov/cianparser GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 22 files (103.0 KB), approximately 28.4k tokens, and a symbol index with 89 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
