[
  {
    "path": ".github/FUNDING.yml",
    "content": "# These are supported funding model platforms\n\ngithub: [lenarsaitov]\nko_fi: lenarsaitov\n"
  },
  {
    "path": ".gitignore",
    "content": "/venv/\n/build/\n/dist/\n/cianparser.egg-info/\n__pycache__/"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2023 Lenar Saitov\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "### Сбор данных с сайта объявлений об аренде и продаже недвижимости Циан\n\nCianparser - это библиотека Python 3 (версии 3.8 и выше) для парсинга сайта  [Циан](http://cian.ru).\nС его помощью можно получить достаточно подробные и структурированные данные по краткосрочной и долгосрочной аренде, продаже квартир, домов, танхаусов итд.\n\n### Установка\n```bash\npip install cianparser\n```\n\n### Использование\n```python\nimport cianparser\n\nmoscow_parser = cianparser.CianParser(location=\"Москва\")\ndata = moscow_parser.get_flats(deal_type=\"sale\", rooms=(1, 2), with_saving_csv=True, additional_settings={\"start_page\":1, \"end_page\":2})\n\nprint(data[0])\n```\n\n```\n                              Preparing to collect information from pages..\nThe absolute path to the file: \n /Users/macbook/some_project/cianparser/cian_flat_sale_1_2_moskva_12_Jan_2024_21_48_43_100892.csv \n\nThe page from which the collection of information begins: \n https://cian.ru/cat.php?engine_version=2&p=1&with_neighbors=0&region=1&deal_type=sale&offer_type=flat&room1=1&room2=1\n\nCollecting information from pages with list of offers\n 1 | 1 page with list: [=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>] 100% | Count of all parsed: 28. Progress ratio: 50 %. Average price: 45 547 801 rub\n 2 | 2 page with list: [=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>] 100% | Count of all parsed: 56. Progress ratio: 100 %. 
Average price: 54 040 102 rub\n\nThe collection of information from the pages with list of offers is completed\nTotal number of parsed offers: 56.\n{\n    \"author\": \"MR Group\",\n    \"author_type\": \"developer\",\n    \"url\": \"https://www.cian.ru/sale/flat/292125772/\",\n    \"location\": \"Москва\",\n    \"deal_type\": \"sale\",\n    \"accommodation_type\": \"flat\",\n    \"floor\": 20,\n    \"floors_count\": 37,\n    \"rooms_count\": 1,\n    \"total_meters\": 39.6,\n    \"price\": 28623910,\n    \"district\": \"Беговой\",\n    \"street\": \"Ленинградский проспект\",\n    \"house_number\": \"вл8\",\n    \"underground\": \"Белорусская\",\n    \"residential_complex\": \"Slava\"\n}\n```\n### Инициализация\nПараметры, используемые при инициализации парсера через функциою CianParser:\n* __location__ - локация объявления, к примеру, _Москва_ (для просмотра доступных мест используйте _cianparser.list_locations())_\n* __proxies__ - прокси (см раздел __Cloudflare, CloudScraper, Proxy__), по умолчанию _None_\n\n### Метод get_flats\nДанный метод принимает следующий аргументы:\n* __deal_type__ - тип объявления, к примеру, долгосрочная аренда, продажа _(\"rent_long\", \"sale\")_\n* __rooms__ - количество комнат, к примеру, _1, (1,3, \"studio\"), \"studio, \"all\"_; по умолчанию любое _(\"all\")_\n* __with_saving_csv__ - необходимо ли сохранение собираемых данных (в реальном времени в процессе сбора данных) или нет, по умолчанию _False_\n* __with_extra_data__ - необходимо ли сбор дополнительных данных, но с кратным продолжительности по времени (см. ниже в __Примечании__), по умолчанию _False_\n* __additional_settings__ - дополнительные настройки поиска (см. 
ниже в __Дополнительные настройки поиска__), по умолчанию _None_\n\nПример:\n```python\nimport cianparser\n\nmoscow_parser = cianparser.CianParser(location=\"Москва\")\ndata = moscow_parser.get_flats(deal_type=\"rent_long\", rooms=(1, 2), additional_settings={\"start_page\":1, \"end_page\":1})\n```\n\nВ проекте предусмотрен функционал корректного завершения в случае окончания страниц. По данному моменту, следует изучить раздел __Ограничения__\n\n### Метод get_suburban (сбор объявлений домов/участков/танхаусав итп)\nДанный метод принимает следующий аргументы:\n* __suburban_type__ - тип здания, к примеру, дом/дача, часть дома, участок, танхаус _(\"house\", \"house-part\", \"land-plot\", \"townhouse\")_\n* __deal_type__ - тип объявления, к примеру, долгосрочная аренда, продажа _(\"rent_long\", \"sale\")_\n* __with_saving_csv__ - необходимо ли сохранение собираемых данных (в реальном времени в процессе сбора данных) или нет, по умолчанию _False_\n* __with_extra_data__ - необходимо ли сбор дополнительных данных, но с кратным продолжительности по времени, по умолчанию _False_\n* __additional_settings__ - дополнительные настройки поиска (см. 
ниже в __Дополнительные настройки поиска__), по умолчанию _None_\n\nПример:\n```python\nimport cianparser\n\nmoscow_parser = cianparser.CianParser(location=\"Москва\")\ndata = moscow_parser.get_suburban(suburban_type=\"townhouse\", deal_type=\"sale\", additional_settings={\"start_page\":1, \"end_page\":1})\n```\n\n### Метод get_newobjects (сбор даннных по новостройкам)\nДанный метод принимает следующий аргументы:\n* __with_saving_csv__ - необходимо ли сохранение собираемых данных (в реальном времени в процессе сбора данных) или нет, по умолчанию _False_\n\nПример:\n```python\nimport cianparser\n\nmoscow_parser = cianparser.CianParser(location=\"Москва\")\ndata = moscow_parser.get_newobjects()\n```\n\n### Дополнительные настройки поиска\nПример:\n```python\nadditional_settings = {\n    \"start_page\":1,\n    \"end_page\": 10,\n    \"is_by_homeowner\": True,\n    \"min_price\": 1000000,\n    \"max_price\": 10000000,\n    \"min_balconies\": 1,\n    \"have_loggia\": True,\n    \"min_house_year\": 1990,\n    \"max_house_year\": 2023,\n    \"min_floor\": 3,\n    \"max_floor\": 4,\n    \"min_total_floor\": 5,\n    \"max_total_floor\": 10,\n    \"house_material_type\": 1,\n    \"metro\": \"Московский\",\n    \"metro_station\": \"ВДНХ\",\n    \"metro_foot_minute\": 45,\n    \"flat_share\": 2,\n    \"only_flat\": True,\n    \"only_apartment\": True,\n    \"sort_by\": \"price_from_min_to_max\",\n}\n```\n* __object_type__ -  тип жилья (\"new\" - вторичка, \"secondary\" - новостройка)\n* __start_page__ - страница, с которого начинается сбор данных\n* __end_page__ - страница, с которого заканчивается сбор данных\n* __is_by_homeowner__ - объявления, созданных только собственниками\n* __min_price__ - цена от (в рублях)\n* __max_price__ - цена до (в рублях)\n* __min_balconies__ - минимальное количество балконов\n* __have_loggia__ - наличие лоджи\n* __min_house_year__ - год постройки дома от\n* __max_house_year__ - год постройки дома до\n* __min_floor__ - этаж от\n* __max_floor__ - 
этаж до\n* __min_total_floor__ - этажей в доме от\n* __max_total_floor__ - этажей в доме до\n* __house_material_type__ - тип дома (_см ниже возможные значения_)\n* __metro__ - название метрополитена (_см ниже возможные значения_)\n* __metro_station__ - станция метро (доступно при заданом metro)\n* __metro_foot_minute__ - сколько минут до метро пешком\n* __flat_share__ - с долями или без (1 - только доли, 2 - без долей)\n* __only_flat__ - без апартаментов\n* __only_apartment__ - только апартаменты\n* __sort_by__ - сортировка объявлений (_см ниже возможные значения_)\n\n#### Возможные значения поля **house_material_type**\n- _1_ - киричный\n- _2_ - монолитный\n- _3_ - панельный\n- _4_ - блочный\n- _5_ - деревянный\n- _6_ - сталинский\n- _7_ - щитовой\n- _8_ - кирпично-монолитный\n\n#### Возможные значения полей **metro** и **metro_station**\nСоответствуют ключам и значениям словаря, получаемого вызовом функции **_cianparser.list_metro_stations()_**\n\n#### Возможные значения поля **sort_by**\n- \"_price_from_min_to_max_\" - сортировка по цене (сначала дешевле)\n- \"_price_from_max_to_min_\" - сортировка по цене (сначала дороже)\n- \"_total_meters_from_max_to_min_\" - сортировка по общей площади (сначала больше)\n- \"_creation_data_from_newer_to_older_\" - сортировка по дате добавления (сначала новые)\n- \"_creation_data_from_older_to_newer_\" - сортировка по дате добавления (сначала старые)\n\n### Признаки, получаемые в ходе сбора данных с предложений по долгосрочной аренде недвижимости\n* __district__ - район\n* __underground__ - метро\n* __street__ - улица\n* __house_number__ - номер дома\n* __floor__ - этаж\n* __floors_count__ - общее количество этажей\n* __total_meters__ - общая площадь\n* __living_meters__ - жилая площади\n* __kitchen_meters__ - площадь кухни\n* __rooms_count__ - количество комнат\n* __year_construction__ - год постройки здания\n* __house_material_type__ - тип дома (киричный/монолитный/панельный итд)\n* __heating_type__ - тип отопления\n* 
__price_per_month__ - стоимость в месяц\n* __commissions__ - комиссия, взымаемая при заселении\n* __author__ - автор объявления\n* __author_type__ - тип автора \n* __phone__ - номер телефона в объявлении\n* __url__ - ссылка на объявление\n\nВозможные значения поля __author_type__:\n- __real_estate_agent__ - агентство недвижимости\n- __homeowner__ - собственник\n- __realtor__ - риелтор\n- __official_representative__ - ук оф.представитель\n- __representative_developer__ - представитель застройщика\n- __developer__ - застройщик\n- __unknown__ - без указанного типа\n\n### Признаки, получаемые в ходе сбора данных с предложений по продаже недвижимости\n\nПризнаки __аналогичны__ вышеописанным, кроме отсутствия полей __price_per_month__ и __commissions__.\n\nПри этом появляются новые:\n* __price__ - стоимость недвижимости\n* __residential_complex__ - название жилого комплекса\n* __object_type__ -  тип жилья (вторичка/новостройка)\n* __finish_type__ - отделка\n\n### Признаки, получаемые в ходе сбора данных по новостройкам\n* __name__ - наименование ЖК\n* __url__ - ссылка на страницу\n* __full_location_address__ - полный адрес расположения ЖК\n* __year_of_construction__ - год сдачи\n* __house_material_type__ - тип дома (_см выше возможные значения_)\n* __finish_type__ - отделка\n* __ceiling_height__ - высота потолка\n* __class__ - класс жилья\n* __parking_type__ - тип парковки\n* __floors_from__ - этажность (от)\n* __floors_to__ - этажность (до)\n* __builder__ - застройщик\n\n### Сохранение данных\nИмеется возможность сохранения собираемых данных в режиме реального времени. 
Для этого необходимо подставить в аргументе \n__with_saving_csv__ значение ___True___.\n\n#### Пример получаемого файла при вызове метода __get_flats__ с __with_extra_data__ = __True__:\n\n```bash\ncian_flat_sale_1_1_moskva_12_Jan_2024_22_29_48_117413.csv\n```\n| author | author_type | url | location | deal_type | accommodation_type | floor | floors_count | rooms_count | total_meters | price_per_m2 | price | year_of_construction | object_type | house_material_type | heating_type | finish_type | living_meters | kitchen_meters | phone | district | street | house_number | underground | residential_complex\n| ------ | ------ | ------ | ------ | ------ | ------ | ----------- | ---- | ---- | --------- | ------------------ | ----- | ------------ | ----------- | ------------ | --------------- | ----------- | ----------- | -------------------- | --- | --- | --- | --- | --- | ---\n| White and Broughton | real_estate_agent | https://www.cian.ru/sale/flat/290499455/ | Москва | sale | flat | 3 | 40 | 1 | 45.5 | 709890 | 32300000 | 2021 | Вторичка | Монолитный | Центральное | -1 | 19.0 | 6.0 | +79646331510 | Хорошевский | Ленинградский проспект | 37/4 | Динамо | Прайм Парк\n| ФСК | developer | https://www.cian.ru/sale/flat/288376323/ | Москва | sale | flat | 24 | 47 | 2 | 46.0 | 528900 | 24329400 | 2024 | Новостройка | Монолитно-кирпичный | -1 | Без отделки, предчистовая, чистовая | 18.0 | 15.0 | +74951387154 | Обручевский |  Академика Волгина | 2С1 | Калужская | Архитектор\n| White and Broughton | real_estate_agent | https://www.cian.ru/sale/flat/292416804/ | Москва | sale | flat | 2 | 41 | 2 | 60.0 | 783333 | 47000000 | 2021 | Вторичка | -1 | Центральное | -1 | 43.0 | 5.0 | +79646331510 | Хорошевский | Ленинградский проспект | 37/5 | Динамо | Прайм Парк\n\n#### Пример получаемого файла при вызове метода __get_suburban__ с __with_extra_data__ = __True__:\n\n```bash\ncian_suburban_townhouse_sale_15_15_moskva_13_Jan_2024_04_30_47_963046.csv\n```\n| author | author_type | url | 
location | deal_type | accommodation_type | price | year_of_construction | house_material_type | land_plot | land_plot_status | heating_type | gas_type | water_supply_type | sewage_system | bathroom | living_meters | floors_count | phone | district | underground | street | house_number\n | -----  | -----  | -----  | -----  | -----  | -----  | -----  | -----  | -----  | ----- | ------------ | ----------- | ------------ | --------------- | ----------- | ----------- | -------------------- | --- | --- | --- | --- | --- | ---\n| New Moscow House | real_estate_agent | https://www.cian.ru/sale/suburban/296304861/ | Москва | sale | suburban | 93000000 | 2020 | Кирпичный | 13 сот. | -1 | -1 | Есть | Есть | Есть | В доме | -1 | 2 | +79096865868 | Первомайское поселение |  | улица Центральная | 21\n| LaRichesse | real_estate_agent | https://www.cian.ru/sale/suburban/290335502/ | Москва | sale | suburban | 95000000 | -1 | Пенобетонный блок | 12 сот. | Индивидуальное жилищное строительство | Центральное | -1 | -1 | -1 | -1 | 502,8 м² | 2 | +79652502027 | Воскресенское поселение |  | улица Каменка | 44Ас1\n| Динара Ваганова | realtor | https://www.cian.ru/sale/suburban/293424451/ | Москва | sale | suburban | 21990000 | -1 | -1 | -1 | Индивидуальное жилищное строительство | -1 | Нет | -1 | Нет | -1 | -1 | -1 | +79672093870 | Первомайское поселение | м. 
Крёкшино |  |\n\n#### Пример получаемого файла при вызове метода __get_newobjects__:\n\n```bash\ncian_newobject_13_Jan_2024_01_27_32_734734.csv\n```\n| name | location | accommodation_type | url | full_location_address | year_of_construction | house_material_type | finish_type | ceiling_height | class | parking_type | floors_from | floors_to | builder\n | ----- | ------------ | ----------- | ------------ | --------------- | ----------- | ----------- | -------------------- | --- | --- | --- | --- | --- | ---\n| ЖК «SYMPHONY 34 (Симфони 34)» | Москва | newobject | https://zhk-symphony-34-i.cian.ru | Москва, САО, Савеловский, 2-я Хуторская ул., 34 | 2025 | Монолитный | Предчистовая, чистовая | 3,0 м | Премиум | Подземная, гостевая | 36 | 54 | Застройщик MR Group\n| ЖК «Коллекция клубных особняков Ильинка 3/8» | Москва | newobject | https://zhk-kollekciya-klubnyh-osobnyakov-ilinka-38-i.cian.ru | Москва, ЦАО, Тверской, ул. Ильинка | 2024 | Монолитно-кирпичный, монолитный | Без отделки | от 3,35 м до 6,0 м | Премиум | Подземная, гостевая | 3 | 5 | Застройщик Sminex-Интеко\n| ЖК «Victory Park Residences (Виктори Парк Резиденсез)» | Москва | newobject | https://zhk-victory-park-residences-i.cian.ru | Москва, ЗАО, Дорогомилово, ул. 
Братьев Фонченко | 2024 | Монолитный | Чистовая | — | Премиум | Подземная | 10 | 11 | Застройщик ANT Development\n\n\n### Cloudflare, CloudScraper, Proxy\nДля обхода блокировки в проекте задействован **CloudScraper** (библиотека **cloudscraper**), который позволяет успешно обходить защиту **Cloudflare**.\n\nВместе с тем, это не гарантирует отсутствие возможности появления _у некоторых пользователей_ теста **CAPTCHA** при долговременном непрерывном использовании.\n\n#### Proxy\nПоэтому была предоставлена возможность проставлять прокси, используя аргумент **proxies** (_список прокси протокола HTTPS_)\n\nПример:\n```python\nproxies = [\n    '117.250.3.58:8080', \n    '115.96.208.124:8080',\n    '152.67.0.109:80', \n    '45.87.68.2:15321', \n    '68.178.170.59:80', \n    '20.235.104.105:3729', \n    '195.201.34.206:80',\n]\n```\n\nВ процессе запуска утилита проходится по всем из них, пытаясь определить подходящий, то есть тот, \nкоторый может, во первых, делать запросы, во вторых, не иметь тест **_CAPTCHA_**\n\nПример лога, в котором представлено все три возможных кейса\n\n```\nThe process of checking the proxies... Search an available one among them...\n 1 | proxy 46.47.197.210:3128: unavailable.. trying another\n 2 | proxy 213.184.153.66:8080: there is captcha.. trying another\n 3 | proxy 95.66.138.21:8880: available.. stop searching\n```\n\n### Ограничения\nСайт выдает списки с объявлениями <ins>__лишь до 54 странцы включительно__</ins>. Это примерно _28 * 54 = 1512_ объявлений.\nПоэтому, если имеется желание собрать как можно больше данных, то следует использовать более конкретные запросы (по количеству комнат). \n\nК примеру, вместо того, чтобы при использовании указывать _rooms=(1, 2)_, стоит два раза отдельно собирать данные с параметрами _rooms=1_ и _rooms=2_ соответственно.\n\nТаким образом, максимальная разница может составить 1 к 6 (студия, 1, 2, 3, 4, 5 комнатные квартиры), то есть 1512 к 9072.\n\n### Примечание\n1. 
В некоторых объявлениях отсутсвуют данные по некоторым признакам (_год постройки, жилые кв метры, кв метры кухни итп_).\nВ этом случае проставляется значение ___-1___ либо ___пустая строка___ для числового и строкового типа поля соответственно.\n\n2. Для отсутствия блокировки по __IP__ в данном проекте задана пауза (___в размере 4-5 секунд___) после сбора информации с\nкаждой отдельной взятой страницы.\n\n3. Не рекомендутся производить несколько процессов сбора данных параллельно (одновременно) на одной машине (см. пункт 2).\n\n4. Имеется флаг __with_extra_data__, при помощи которого можно дополнительно собирать некоторые данные, но при этом существенно (___в 5-10 раз___) замедляется процесс по времени, из-за необходимости заходить на каждую страницу с предложением. \nСоответствующие данные: ___площадь кухни, год постройки здания, тип дома, тип отделки, тип отопления, тип жилья___  и ___номер телефона___.\n\n5. Данный парсер не будет работать в таком инструменте как [Google Colaboratory](https://colab.research.google.com/). \nСм. [подробности](https://github.com/lenarsaitov/cianparser/issues/1)\n\n6. Если в проекте не имеется подходящего локации (неожидаемое значение аргумента __location__) или иными словами его нет в списке **_cianparser.list_locations()_**, то прошу сообщить, буду рад добавить.\n"
  },
  {
    "path": "cianparser/__init__.py",
    "content": "from .cianparser import CianParser, list_locations, list_metro_stations\n\n__author__ = \"lenarsaitov\"\n__mail__ = \"lenarsaitov1@yandex.ru\"\n"
  },
  {
    "path": "cianparser/base_list.py",
    "content": "import math\nimport csv\n\nfrom cianparser.constants import SPECIFIC_FIELDS_FOR_RENT_LONG, SPECIFIC_FIELDS_FOR_RENT_SHORT, SPECIFIC_FIELDS_FOR_SALE\n\n\nclass BaseListPageParser:\n    def __init__(self,\n                 session,\n                 accommodation_type: str, deal_type: str, rent_period_type, location_name: str,\n                 with_saving_csv=False, with_extra_data=False,\n                 object_type=None, additional_settings=None):\n        self.accommodation_type = accommodation_type\n        self.session = session\n        self.deal_type = deal_type\n        self.rent_period_type = rent_period_type\n        self.location_name = location_name\n        self.with_saving_csv = with_saving_csv\n        self.with_extra_data = with_extra_data\n        self.additional_settings = additional_settings\n        self.object_type = object_type\n\n        self.result = []\n        self.result_set = set()\n        self.average_price = 0\n        self.count_parsed_offers = 0\n        self.start_page = 1 if (additional_settings is None or \"start_page\" not in additional_settings.keys()) else additional_settings[\"start_page\"]\n        self.end_page = 100 if (additional_settings is None or \"end_page\" not in additional_settings.keys()) else additional_settings[\"end_page\"]\n        self.file_path = self.build_file_path()\n\n    def is_sale(self):\n        return self.deal_type == \"sale\"\n\n    def is_rent_long(self):\n        return self.deal_type == \"rent\" and self.rent_period_type == 4\n\n    def is_rent_short(self):\n        return self.deal_type == \"rent\" and self.rent_period_type == 2\n\n    def build_file_path(self):\n        pass\n\n    def define_average_price(self, price_data):\n        if \"price\" in price_data:\n            self.average_price = (self.average_price * self.count_parsed_offers + price_data[\"price\"]) / self.count_parsed_offers\n        elif \"price_per_month\" in price_data:\n            self.average_price = 
(self.average_price * self.count_parsed_offers + price_data[\"price_per_month\"]) / self.count_parsed_offers\n\n    def print_parse_progress(self, page_number, count_of_pages, offers, ind):\n        total_planed_offers = len(offers) * count_of_pages\n        print(f\"\\r {page_number - self.start_page + 1}\"\n              f\" | {page_number} page with list: [\" + \"=>\" * (ind + 1) + \"  \" * (len(offers) - ind - 1) + \"]\" + f\" {math.ceil((ind + 1) * 100 / len(offers))}\" + \"%\" +\n              f\" | Count of all parsed: {self.count_parsed_offers}.\"\n              f\" Progress ratio: {math.ceil(self.count_parsed_offers * 100 / total_planed_offers)} %.\"\n              f\" Average price: {'{:,}'.format(int(self.average_price)).replace(',', ' ')} rub\",\n              end=\"\\r\", flush=True)\n\n    def remove_unnecessary_fields(self):\n        if self.is_sale():\n            for not_need_field in SPECIFIC_FIELDS_FOR_RENT_LONG:\n                if not_need_field in self.result[-1]:\n                    del self.result[-1][not_need_field]\n\n            for not_need_field in SPECIFIC_FIELDS_FOR_RENT_SHORT:\n                if not_need_field in self.result[-1]:\n                    del self.result[-1][not_need_field]\n\n        if self.is_rent_long():\n            for not_need_field in SPECIFIC_FIELDS_FOR_RENT_SHORT:\n                if not_need_field in self.result[-1]:\n                    del self.result[-1][not_need_field]\n\n            for not_need_field in SPECIFIC_FIELDS_FOR_SALE:\n                if not_need_field in self.result[-1]:\n                    del self.result[-1][not_need_field]\n\n        if self.is_rent_short():\n            for not_need_field in SPECIFIC_FIELDS_FOR_RENT_LONG:\n                if not_need_field in self.result[-1]:\n                    del self.result[-1][not_need_field]\n\n            for not_need_field in SPECIFIC_FIELDS_FOR_SALE:\n                if not_need_field in self.result[-1]:\n                    del 
self.result[-1][not_need_field]\n\n        return self.result\n\n    def save_results(self):\n        self.remove_unnecessary_fields()\n        keys = self.result[0].keys()\n\n        with open(self.file_path, 'w', newline='', encoding='utf-8') as output_file:\n            dict_writer = csv.DictWriter(output_file, keys, delimiter=';')\n            dict_writer.writeheader()\n            dict_writer.writerows(self.result)"
  },
  {
    "path": "cianparser/cianparser.py",
    "content": "import cloudscraper\nimport time\n\nfrom cianparser.constants import CITIES, METRO_STATIONS, DEAL_TYPES, OBJECT_SUBURBAN_TYPES\nfrom cianparser.url_builder import URLBuilder\nfrom cianparser.proxy_pool import ProxyPool\nfrom cianparser.flat.list import FlatListPageParser\nfrom cianparser.suburban.list import SuburbanListPageParser\nfrom cianparser.newobject.list import NewObjectListParser\n\n\ndef list_locations():\n    return CITIES\n\n\ndef list_metro_stations():\n    return METRO_STATIONS\n\n\nclass CianParser:\n    def __init__(self, location: str, proxies=None):\n        \"\"\"\n        Initialize the Cian website parser\n        Examples:\n            >>> moscow_parser = cianparser.CianParser(location=\"Москва\")\n        :param str location: location. e.g. \"Москва\", for see all correct values use cianparser.list_locations()\n        :param proxies: proxies for executing requests (https scheme), default None\n        \"\"\"\n\n        location_id = __validation_init__(location)\n\n        self.__parser__ = None\n        self.__session__ = cloudscraper.create_scraper()\n        self.__session__.headers = {'Accept-Language': 'en'}\n        self.__proxy_pool__ = ProxyPool(proxies=proxies)\n        self.__location_name__ = location\n        self.__location_id__ = location_id\n\n    def __set_proxy__(self, url_list):\n        if self.__proxy_pool__.is_empty():\n            return\n        available_proxy = self.__proxy_pool__.get_available_proxy(url_list)\n        if available_proxy is not None:\n            self.__session__.proxies = {\"https\": available_proxy}\n\n    def __load_list_page__(self, url_list_format, page_number, attempt_number_exception):\n        url_list = url_list_format.format(page_number)\n        self.__set_proxy__(url_list)\n\n        if page_number == self.__parser__.start_page and attempt_number_exception == 0:\n            print(f\"The page from which the collection of information begins: \\n {url_list}\")\n\n        res 
= self.__session__.get(url=url_list)\n        if res.status_code == 429:\n            time.sleep(10)\n        res.raise_for_status()\n\n        return res.text\n\n    def __run__(self, url_list_format: str):\n        print(f\"\\n{' ' * 30}Preparing to collect information from pages..\")\n\n        if self.__parser__.with_saving_csv:\n            print(f\"The absolute path to the file: \\n{self.__parser__.file_path} \\n\")\n\n        page_number = self.__parser__.start_page - 1\n        end_all_parsing = False\n        while page_number < self.__parser__.end_page and not end_all_parsing:\n            page_parsed = False\n            page_number += 1\n            attempt_number_exception = 0\n\n            while attempt_number_exception < 3 and not page_parsed:\n                try:\n                    (page_parsed, attempt_number, end_all_parsing) = self.__parser__.parse_list_offers_page(\n                        html=self.__load_list_page__(url_list_format=url_list_format, page_number=page_number, attempt_number_exception=attempt_number_exception),\n                        page_number=page_number,\n                        count_of_pages=self.__parser__.end_page + 1 - self.__parser__.start_page,\n                        attempt_number=attempt_number_exception)\n\n                except Exception as e:\n                    attempt_number_exception += 1\n                    if attempt_number_exception < 3:\n                        continue\n                    print(f\"\\n\\nException: {e}\")\n                    print(f\"The collection of information from the pages with ending parse on {page_number} page...\\n\")\n                    break\n\n        print(f\"\\n\\nThe collection of information from the pages with list of offers is completed\")\n        print(f\"Total number of parsed offers: {self.__parser__.count_parsed_offers}. 
\", end=\"\\n\")\n\n    def get_flats(self, deal_type: str, rooms, with_saving_csv=False, with_extra_data=False, additional_settings=None):\n        \"\"\"\n        Parse information of flats from cian website\n        Examples:\n            >>> moscow_parser = cianparser.CianParser(location=\"Москва\")\n            >>> data = moscow_parser.get_flats(deal_type=\"rent_long\", rooms=1)\n            >>> data = moscow_parser.get_flats(deal_type=\"rent_short\", rooms=(1,3,\"studio\"), with_saving_csv=True)\n            >>> data = moscow_parser.get_flats(deal_type=\"sale\", additional_settings={\"start_page\": 1, \"end_page\": 1, \"sort_by\":\"price_from_min_to_max\"})\n        :param deal_type: type of deal, e.g. \"rent_long\", \"rent_short\", \"sale\"\n        :param rooms: how many rooms in accommodation, default \"all\". Example 1, (1,3, \"studio\"), \"studio, \"all\"\n        :param with_saving_csv: is it necessary to save data in csv, default False\n        :param with_extra_data:  is it necessary to collect additional data (but with increasing time duration), default False\n        :param additional_settings:  additional settings such as min_price, sort_by and others, default None\n        \"\"\"\n\n        __validation_get_flats__(deal_type, rooms)\n        deal_type, rent_period_type = __define_deal_type__(deal_type)\n        self.__parser__ = FlatListPageParser(\n            session=self.__session__,\n            accommodation_type=\"flat\",\n            deal_type=deal_type,\n            rent_period_type=rent_period_type,\n            location_name=self.__location_name__,\n            with_saving_csv=with_saving_csv,\n            with_extra_data=with_extra_data,\n            additional_settings=additional_settings,\n        )\n        self.__run__(\n            __build_url_list__(location_id=self.__location_id__, deal_type=deal_type, accommodation_type=\"flat\",\n                               rooms=rooms, rent_period_type=rent_period_type,\n                    
           additional_settings=additional_settings))\n        return self.__parser__.result\n\n    def get_suburban(self, suburban_type: str, deal_type: str, with_saving_csv=False, with_extra_data=False, additional_settings=None):\n        \"\"\"\n        Parse information of suburbans from cian website\n        Examples:\n            >>> moscow_parser = cianparser.CianParser(location=\"Москва\")\n            >>> data = moscow_parser.get_suburbans(suburban_type=\"house\",deal_type=\"rent_long\")\n            >>> data = moscow_parser.get_suburbans(suburban_type=\"house\",deal_type=\"rent_short\", with_saving_csv=True)\n            >>> data = moscow_parser.get_suburbans(suburban_type=\"townhouse\",deal_type=\"sale\", additional_settings={\"start_page\": 1, \"end_page\": 1, \"sort_by\":\"price_from_min_to_max\"})\n        :param suburban_type: type of suburban building, e.g. \"house\", \"house-part\", \"land-plot\", \"townhouse\"\n        :param deal_type: type of deal, e.g. \"rent_long\", \"rent_short\", \"sale\"\n        :param with_saving_csv: is it necessary to save data in csv, default False\n        :param with_extra_data:  is it necessary to collect additional data (but with increasing time duration), default False\n        :param additional_settings:  additional settings such as min_price, sort_by and others, default None\n        \"\"\"\n\n        __validation_get_suburban__(suburban_type=suburban_type, deal_type=deal_type)\n        deal_type, rent_period_type = __define_deal_type__(deal_type)\n        self.__parser__ = SuburbanListPageParser(\n            session=self.__session__,\n            accommodation_type=\"suburban\",\n            deal_type=deal_type,\n            rent_period_type=rent_period_type,\n            location_name=self.__location_name__,\n            with_saving_csv=with_saving_csv,\n            with_extra_data=with_extra_data,\n            additional_settings=additional_settings,\n            object_type=suburban_type,\n        )\n        
self.__run__(\n            __build_url_list__(location_id=self.__location_id__, deal_type=deal_type, accommodation_type=\"suburban\",\n                               rooms=None, rent_period_type=rent_period_type, suburban_type=suburban_type,\n                               additional_settings=additional_settings))\n        return self.__parser__.result\n\n    def get_newobjects(self, with_saving_csv=False):\n        \"\"\"\n        Parse information of newobjects from cian website\n        Examples:\n            >>> moscow_parser = cianparser.CianParser(location=\"Москва\")\n            >>> data = moscow_parser.get_newobjects(with_saving_csv=True)\n        :param with_saving_csv: is it necessary to save data in csv, default False\n        \"\"\"\n\n        self.__parser__ = NewObjectListParser(\n            session=self.__session__,\n            location_name=self.__location_name__,\n            with_saving_csv=with_saving_csv,\n        )\n        self.__run__(\n            __build_url_list__(location_id=self.__location_id__, deal_type=\"sale\", accommodation_type=\"newobject\"))\n        return self.__parser__.result\n\n\ndef __validation_init__(location):\n    location_id = None\n    for location_info in list_locations():\n        if location_info[0] == location:\n            location_id = location_info[1]\n\n    if location_id is None:\n        ValueError(f'You entered {location}, which is not exists in base.'\n                   f' See all available values of location in cianparser.list_locations()')\n\n    return location_id\n\n\ndef __validation_get_flats__(deal_type, rooms):\n    if deal_type not in DEAL_TYPES:\n        raise ValueError(f'You entered deal_type={deal_type}, which is not valid value. 
'\n                         f'Try entering one of these values: \"rent_long\", \"sale\".')\n\n    if type(rooms) is tuple:\n        for count_of_room in rooms:\n            if type(count_of_room) is int:\n                if count_of_room < 1 or count_of_room > 5:\n                    raise ValueError(f'You entered {count_of_room} in {rooms}, which is not valid value. '\n                                     f'Try entering one of these values: 1, 2, 3, 4, 5, \"studio\", \"all\".')\n            elif type(count_of_room) is str:\n                if count_of_room != \"studio\":\n                    raise ValueError(f'You entered {count_of_room} in {rooms}, which is not valid value. '\n                                     f'Try entering one of these values: 1, 2, 3, 4, 5, \"studio\", \"all\".')\n            else:\n                raise ValueError(f'In tuple \"rooms\" not valid type of element. '\n                                 f'It is correct int and str types. Example (1,3,5, \"studio\").')\n    elif type(rooms) is int:\n        if rooms < 1 or rooms > 5:\n            raise ValueError(f'You entered rooms={rooms}, which is not valid value. '\n                             f'Try entering one of these values: 1, 2, 3, 4, 5, \"studio\", \"all\".')\n    elif type(rooms) is str:\n        if rooms != \"studio\" and rooms != \"all\":\n            raise ValueError(f'You entered rooms={rooms}, which is not valid value. '\n                             f'Try entering one of these values: 1, 2, 3, 4, 5, \"studio\", \"all\".')\n    else:\n        raise ValueError(f'In argument \"rooms\" not valid type of element. '\n                         f'It is correct int, str and tuple types. Example 1, (1,3, \"studio\"), \"studio\", \"all\".')\n\n\ndef __validation_get_suburban__(suburban_type, deal_type):\n    if suburban_type not in OBJECT_SUBURBAN_TYPES.keys():\n        raise ValueError(f'You entered suburban_type={suburban_type}, which is not valid value. 
'\n                         f'Try entering one of these values: \"house\", \"house-part\", \"land-plot\", \"townhouse\".')\n\n    if deal_type not in DEAL_TYPES:\n        raise ValueError(f'You entered deal_type={deal_type}, which is not valid value. '\n                         f'Try entering one of these values: \"rent_long\", \"sale\".')\n\n\ndef __build_url_list__(location_id, deal_type, accommodation_type, rooms=None, rent_period_type=None,\n                       suburban_type=None, additional_settings=None):\n    url_builder = URLBuilder(accommodation_type == \"newobject\")\n    url_builder.add_location(location_id)\n    url_builder.add_deal_type(deal_type)\n    url_builder.add_accommodation_type(accommodation_type)\n\n    if rooms is not None:\n        url_builder.add_room(rooms)\n\n    if rent_period_type is not None:\n        url_builder.add_rent_period_type(rent_period_type)\n\n    if suburban_type is not None:\n        url_builder.add_object_suburban_type(suburban_type)\n\n    if additional_settings is not None:\n        url_builder.add_additional_settings(additional_settings)\n\n    return url_builder.get_url()\n\n\ndef __define_deal_type__(deal_type):\n    rent_period_type = None\n    if deal_type == \"rent_long\":\n        deal_type, rent_period_type = \"rent\", 4\n    elif deal_type == \"rent_short\":\n        deal_type, rent_period_type = \"rent\", 2\n    return deal_type, rent_period_type\n"
  },
  {
    "path": "cianparser/constants.py",
    "content": "DEAL_TYPES = {\"rent_long\", \"sale\"}\nOBJECT_SUBURBAN_TYPES = {\"house\": \"1\", \"house-part\": \"2\", \"land-plot\": \"3\", \"townhouse\": \"4\"}\nOBJECT_TYPES = {\"secondary\": \"1\", \"new\": \"2\"}\n\n# DEAL_TYPES_NOT_IMPLEMENTED_YET = {\"rent_short\"}\n\n# ACCOMMODATION_TYPES_NOT_IMPLEMENTED_YET = {\"room\", \"house\", \"house-part\", \"townhouse\"}\n\nFLOATS_NUMBERS_REG_EXPRESSION = r\"[+-]? *(?:\\d+(?:\\.\\d*)?|\\.\\d+)(?:[eE][+-]?\\d+)?\"\n\nFILE_NAME_FLAT_FORMAT = 'cian_{}_{}_{}_{}_{}_{}.csv'\nFILE_NAME_SUBURBAN_FORMAT = 'cian_{}_{}_{}_{}_{}_{}_{}.csv'\nFILE_NAME_NEWOBJECT_FORMAT = 'cian_{}_{}_{}.csv'\n\nBASE_URL = \"https://cian.ru\"\nDEFAULT_POSTFIX_PATH = \"/cat.php?\"\nNEWOBJECT_POSTFIX_PATH = \"/newobjects/list/?\"\nDEFAULT_PATH = \"engine_version=2&p={}&with_neighbors=0\"\nREGION_PATH = \"&region={}\"\nOFFER_TYPE_PATH = \"&offer_type={}\"\nRENT_PERIOD_TYPE_PATH = \"&type={}\"\nDEAL_TYPE_PATH = \"&deal_type={}\"\nOBJECT_TYPE_PATH = \"&object_type%5B0%5D={}\"\n\nROOM_PATH = \"&room{}=1\"\nSTUDIO_PATH = \"&room9=1\"\nIS_ONLY_HOMEOWNER_PATH = \"&is_by_homeowner=1\"\nMIN_BALCONIES_PATH = \"&min_balconies={}\"\nHAVE_LOGGIA_PATH = \"&loggia=1\"\nMIN_HOUSE_YEAR_PATH = \"&min_house_year={}\"\nMAX_HOUSE_YEAR_PATH = \"&max_house_year={}\"\nMIN_PRICE_PATH = \"&minprice={}\"\nMAX_PRICE_PATH = \"&maxprice={}\"\nMIN_FLOOR_PATH = \"&minfloor={}\"\nMAX_FLOOR_PATH = \"&maxfloor={}\"\nMIN_TOTAL_FLOOR_PATH = \"&minfloorn={}\"\nMAX_TOTAL_FLOOR_PATH = \"&maxfloorn={}\"\n\nHOUSE_MATERIAL_TYPE_PATH = \"&house_material%5B0%5D={}\"\n\nMETRO_FOOT_MINUTE_PATH = \"&only_foot=2&foot_min={}\"\nMETRO_ID_PATH = \"&metro%5B0%5D={}\"\n\nFLAT_SHARE_PATH = \"&flat_share={}\"\nONLY_FLAT_PATH = \"&only_flat={}\"\nAPARTMENT_PATH = \"&apartment={}\"\n\nSORT_BY_PRICE_FROM_MIN_TO_MAX_PATH = \"&sort=price_object_order\"\nSORT_BY_PRICE_FROM_MAX_TO_MIN_PATH = \"&sort=total_price_desc\"\nSORT_BY_TOTAL_METERS_FROM_MAX_TO_MIN_PATH = 
\"&sort=area_order\"\nSORT_BY_CREATION_DATA_FROM_NEWER_TO_OLDER_PATH = \"&sort=creation_date_desc\"\nSORT_BY_CREATION_DATA_FROM_OLDER_TO_NEWER_PATH = \"&sort=creation_date_asc\"\n\nIS_SORT_BY_PRICE_FROM_MIN_TO_MAX_PATH = \"price_from_min_to_max\"\nIS_SORT_BY_PRICE_FROM_MAX_TO_MIN_PATH = \"price_from_max_to_min\"\nIS_SORT_BY_TOTAL_METERS_FROM_MAX_TO_MIN_PATH = \"total_meters_from_max_to_min\"\nIS_SORT_BY_CREATION_DATA_FROM_NEWER_TO_OLDER_PATH = \"creation_data_from_newer_to_older\"\nIS_SORT_BY_CREATION_DATA_FROM_OLDER_TO_NEWER_PATH = \"creation_data_from_older_to_newer\"\n\nNOT_STREET_ADDRESS_ELEMENTS = {\"ЖК\", \"м.\", \"мкр.\", \"Жилой комплекс\", \"Жилой Комплекс\"}\n\nSTREET_TYPES = {\"ул.\", \"улица\", \"аллея\", \"бульвар\", \"линия\", \"набережная\", \"тракт\", \"тупик\", \"шоссе\", \"переулок\",\n                \"проспект\", \"проезд\", \"раздъезд\", \"мост\", \"авеню\"}\n\nSPECIFIC_FIELDS_FOR_RENT_LONG = {\"price_per_month\", \"commissions\"}\nSPECIFIC_FIELDS_FOR_RENT_SHORT = {\"price_per_day\"}\nSPECIFIC_FIELDS_FOR_SALE = {\"price\", \"residential_complex\", \"object_type\", \"finish_type\"}\n\nCITIES = [\n    ['Москва', '1'],\n    ['Санкт-Петербург', '2'],\n    ['Абакан', '4638'],\n    ['Анадырь', '4648'],\n    ['Архангельск', '4658'],\n    ['Астрахань', '4660'],\n    ['Барнаул', '4668'],\n    ['Белгород', '4671'],\n    ['Биробиджан', '4682'],\n    ['Благовещенск', '4683'],\n    ['Бронницы', '4690'],\n    ['Брянск', '4691'],\n    ['Великий Новгород', '4694'],\n    ['Владивосток', '4701'],\n    ['Владикавказ', '4702'],\n    ['Владимир', '4703'],\n    ['Волгоград', '4704'],\n    ['Вологда', '4708'],\n    ['Воронеж', '4713'],\n    ['Геленджик', '4717'],\n    ['Горно-Алтайск', '4719'],\n    ['Грозный', '4723'],\n    ['Дзержинский', '4734'],\n    ['Долгопрудный', '4738'],\n    ['Дубна', '4741'],\n    ['Екатеринбург', '4743'],\n    ['Жуковский', '4750'],\n    ['Звенигород', '4756'],\n    ['Иванов', '4767'],\n    ['Ижевск', '4770'],\n    ['Иркутск', '4774'],\n  
  ['Йошкар-Ола', '4776'],\n    ['Казань', '4777'],\n    ['Калининград', '4778'],\n    ['Калуга', '4780'],\n    ['Кемерово', '4795'],\n    ['Киров', '4800'],\n    ['Коломна', '4809'],\n    ['Королёв', '4813'],\n    ['Красноармейск', '4817'],\n    ['Краснодар', '4820'],\n    ['Краснознаменск', '4822'],\n    ['Красноярск', '4827'],\n    ['Курган', '4834'],\n    ['Курск', '4835'],\n    ['Кызыл', '4837'],\n    ['Липецк', '4847'],\n    ['Лобня', '4848'],\n    ['Лыткарино', '4851'],\n    ['Магадан', '4852'],\n    ['Майкоп', '4855'],\n    ['Махачкала', '4857'],\n    ['Мурманск', '4871'],\n    ['Нальчик', '4875'],\n    ['Нарьян-Мар', '4876'],\n    ['Нижний Новгород', '4885'],\n    ['Новороссийск', '4896'],\n    ['Новокузнецк', '4894'],\n    ['Новосибирск', '4897'],\n    ['Омск', '4914'],\n    ['Оренбург', '4915'],\n    ['Орехово-Зуево', '4916'],\n    ['Пенза', '4923'],\n    ['Пермь', '4927'],\n    ['Петрозаводск', '4930'],\n    ['Петропавловск-Камчатский', '4931'],\n    ['Подольск', '4935'],\n    ['Протвино', '4945'],\n    ['Псков', '4946'],\n    ['Пущино', '4949'],\n    ['Реутов', '4958'],\n    ['Ростов-на-Дону', '4959'],\n    ['Рошаль', '4960'],\n    ['Рязань', '4963'],\n    ['Салехард', '4965'],\n    ['Самара', '4966'],\n    ['Саранск', '4967'],\n    ['Саратов', '4969'],\n    ['Серпухов', '4983'],\n    ['Смоленск', '4987'],\n    ['Сочи', '4998'],\n    ['Ставрополь', '5001'],\n    ['Сургут', '5003'],\n    ['Сыктывкар', '5006'],\n    ['Тамбов', '5011'],\n    ['Тольятти', '5015'],\n    ['Томск', '5016'],\n    ['Тула', '5020'],\n    ['Тюмень', '5024'],\n    ['Улан-Удэ', '5026'],\n    ['Ульяновск', '5027'],\n    ['Фрязино', '5038'],\n    ['Хабаровск', '5039'],\n    ['Ханты-Мансийск', '5041'],\n    ['Химки', '5044'],\n    ['Чебоксары', '5047'],\n    ['Челябинск', '5048'],\n    ['Череповец', '5050'],\n    ['Черкесск', '5051'],\n    ['Чита', '5053'],\n    ['Электросталь', '5064'],\n    ['Элиста', '5065'],\n    ['Южно-Сахалинск', '5069'],\n    ['Якутск', '5073'],\n    
['Ярославль', '5075'],\n]\n\nOTHER_CITIES = [\n    ['Азов', '174136'],\n    ['Аксай', '174151'],\n    ['Альметьевск', '174184'],\n    ['Анапа', '174191'],\n    ['Балашиха', '174292'],\n    ['Бокситогорск', '174373'],\n    ['Бора', '174402'],\n    ['Видное', '174508'],\n    ['Волоколамск', '174522'],\n    ['Воскресенск', '174530'],\n    ['Высоковск', '174541'],\n    ['Голицын', '174573'],\n    ['Дмитров', '174634'],\n    ['Домодедово', '174640'],\n    ['Дрезна', '174644'],\n    ['Егорьевск', '174659'],\n    ['Истра', '174832'],\n    ['Кашира', '174957'],\n    ['Клин', '175004'],\n    ['Кострома', '175050'],\n    ['Котельник', '175051'],\n    ['Красногорск', '175071'],\n    ['Краснозаводск', '175075'],\n    ['Кубинка', '175104'],\n    ['Ликино-Дулёво', '175209'],\n    ['Лосино-Петровский', '175219'],\n    ['Луховицы', '175226'],\n    ['Люберцы', '175231'],\n    ['Можайск', '175349'],\n    ['Мытищи', '175378'],\n    ['Набережные Челны', '175380'],\n    ['Назрань', '175389'],\n    ['Одинцово', '175578'],\n    ['Орёл', '175604'],\n    ['Павловский Посад', '175635'],\n    ['Пушкин', '175744'],\n    ['Раменское', '175758'],\n    ['Руза', '175785'],\n    ['Сергиевом Посад', '175864'],\n    ['Солнечногорск', '175903'],\n    ['Ступино', '175996'],\n    ['Талдом', '176052'],\n    ['Тверь', '176083'],\n    ['Уфа', '176245'],\n    ['Хотьково', '176281'],\n    ['Черноголовка', '176316'],\n    ['Чехов', '176321'],\n    ['Шатура', '176366'],\n    ['Щёлково', '176401'],\n    ['Электрогорск', '176405'],\n    ['Яхрома', '176463'],\n]\n\nCITIES.extend(OTHER_CITIES)\n\nMETRO_STATIONS = {\n    \"Московский\": [\n        ['Авиамоторная', '1'],\n        ['Автозаводская', '2'],\n        ['Академическая', '3'],\n        ['Александровский сад', '4'],\n        ['Алексеевская', '5'],\n        ['Алтуфьево', '6'],\n        ['Аннино', '7'],\n        ['Арбатская', '8'],\n        ['Аэропорт', '9'],\n        ['Бабушкинская', '10'],\n        ['Багратионовская', '11'],\n        ['Баррикадная', 
'12'],\n        ['Бауманская', '13'],\n        ['Беговая', '14'],\n        ['Белорусская', '15'],\n        ['Беляево', '16'],\n        ['Бибирево', '17'],\n        ['Библиотека им. Ленина', '18'],\n        ['Новоясеневская', '19'],\n        ['Боровицкая', '20'],\n        ['Ботанический сад', '21'],\n        ['Братиславская', '22'],\n        ['Бульвар Адмирала Ушакова', '23'],\n        ['Бульвар Дмитрия Донского', '24'],\n        ['Бунинская аллея', '25'],\n        ['Варшавская', '26'],\n        ['ВДНХ', '27'],\n        ['Владыкино', '28'],\n        ['Водный стадион', '29'],\n        ['Войковская', '30'],\n        ['Волгоградский проспект', '31'],\n        ['Волжская', '32'],\n        ['Воробьёвы горы', '33'],\n        ['Выхино', '34'],\n        ['Выставочная', '35'],\n        ['Динамо', '36'],\n        ['Дмитровская', '37'],\n        ['Добрынинская', '38'],\n        ['Домодедовская', '39'],\n        ['Дубровка', '40'],\n        ['Измайловская', '41'],\n        ['Калужская', '42'],\n        ['Кантемировская', '43'],\n        ['Каховская', '44'],\n        ['Каширская', '45'],\n        ['Киевская', '46'],\n        ['Китай-город', '47'],\n        ['Кожуховская', '48'],\n        ['Коломенская', '49'],\n        ['Комсомольская', '50'],\n        ['Коньково', '51'],\n        ['Красногвардейская', '52'],\n        ['Красносельская', '53'],\n        ['Красные ворота', '54'],\n        ['Крестьянская застава', '55'],\n        ['Кропоткинская', '56'],\n        ['Крылатское', '57'],\n        ['Кузнецкий мост', '58'],\n        ['Кузьминки', '59'],\n        ['Кунцевская', '60'],\n        ['Курская', '61'],\n        ['Кутузовская', '62'],\n        ['Ленинский проспект', '63'],\n        ['Лубянка', '64'],\n        ['Люблино', '65'],\n        ['Марксистская', '66'],\n        ['Марьино', '67'],\n        ['Маяковская', '68'],\n        ['Медведково', '69'],\n        ['Международная', '70'],\n        ['Менделеевская', '71'],\n        ['Молодёжная', '72'],\n        ['Нагатинская', '73'],\n 
       ['Нагорная', '74'],\n        ['Нахимовский проспект', '75'],\n        ['Новогиреево', '76'],\n        ['Новокузнецкая', '77'],\n        ['Новослободская', '78'],\n        ['Новые Черёмушки', '79'],\n        ['Октябрьская', '80'],\n        ['Октябрьское поле', '81'],\n        ['Орехово', '82'],\n        ['Отрадное', '83'],\n        ['Охотный ряд', '84'],\n        ['Павелецкая', '85'],\n        ['Парк Культуры', '86'],\n        ['Парк Победы', '87'],\n        ['Партизанская', '88'],\n        ['Первомайская', '89'],\n        ['Перово', '90'],\n        ['Петровско-Разумовская', '91'],\n        ['Печатники', '92'],\n        ['Пионерская', '93'],\n        ['Планерная', '94'],\n        ['Площадь Ильича', '95'],\n        ['Площадь Революции', '96'],\n        ['Полежаевская', '97'],\n        ['Полянка', '98'],\n        ['Пражская', '99'],\n        ['Преображенская площадь', '100'],\n        ['Пролетарская', '101'],\n        ['Проспект Вернадского', '102'],\n        ['Проспект Мира', '103'],\n        ['Профсоюзная', '104'],\n        ['Пушкинская', '105'],\n        ['Речной вокзал', '106'],\n        ['Рижская', '107'],\n        ['Римская', '108'],\n        ['Рязанский проспект', '109'],\n        ['Савёловская', '110'],\n        ['Свиблово', '111'],\n        ['Севастопольская', '112'],\n        ['Семёновская', '113'],\n        ['Серпуховская', '114'],\n        ['Смоленская', '115'],\n        ['Сокол', '116'],\n        ['Сокольники', '117'],\n        ['Спортивная', '118'],\n        ['Сретенский бульвар', '119'],\n        ['Студенческая', '120'],\n        ['Сухаревская', '121'],\n        ['Сходненская', '122'],\n        ['Таганская', '123'],\n        ['Тверская', '124'],\n        ['Театральная', '125'],\n        ['Текстильщики', '126'],\n        ['Тёплый Стан', '127'],\n        ['Тимирязевская', '128'],\n        ['Третьяковская', '129'],\n        ['Трубная', '130'],\n        ['Тульская', '131'],\n        ['Тургеневская', '132'],\n        ['Тушинская', '133'],\n        
['Улица 1905 года', '134'],\n        ['Улица Академика Янгеля', '135'],\n        ['Улица Горчакова', '136'],\n        ['Бульвар Рокоссовского', '137'],\n        ['Улица Скобелевская', '138'],\n        ['Улица Старокачаловская', '139'],\n        ['Университет', '140'],\n        ['Филёвский парк', '141'],\n        ['Фили', '142'],\n        ['Фрунзенская', '143'],\n        ['Царицыно', '144'],\n        ['Цветной бульвар', '145'],\n        ['Черкизовская', '146'],\n        ['Чертановская', '147'],\n        ['Чеховская', '148'],\n        ['Чистые пруды', '149'],\n        ['Чкаловская', '150'],\n        ['Шаболовская', '151'],\n        ['Шоссе Энтузиастов', '152'],\n        ['Щёлковская', '153'],\n        ['Щукинская', '154'],\n        ['Электрозаводская', '155'],\n        ['Юго-Западная', '156'],\n        ['Южная', '157'],\n        ['Ясенево', '158'],\n        ['Краснопресненская', '159'],\n        ['Строгино', '228'],\n        ['Славянский бульвар', '229'],\n        ['Мякинино', '233'],\n        ['Волоколамская', '234'],\n        ['Митино', '235'],\n        ['Марьина Роща', '236'],\n        ['Шипиловская', '238'],\n        ['Зябликово', '239'],\n        ['Борисово', '240'],\n        ['Новокосино', '243'],\n        ['Пятницкое шоссе', '244'],\n        ['Алма-Атинская', '245'],\n        ['Жулебино', '270'],\n        ['Лермонтовский проспект', '271'],\n        ['Деловой центр', '272'],\n        ['Лесопарковая', '273'],\n        ['Битцевский парк', '274'],\n        ['Спартак', '275'],\n        ['Улица Сергея Эйзенштейна', '276'],\n        ['Выставочный центр', '277'],\n        ['Улица Академика Королёва', '278'],\n        ['Телецентр', '279'],\n        ['Улица Милашенкова', '280'],\n        ['Тропарёво', '281'],\n        ['Котельники', '282'],\n        ['Технопарк', '283'],\n        ['Румянцево', '284'],\n        ['Саларьево', '285'],\n        ['Фонвизинская', '286'],\n        ['Бутырская', '287'],\n        ['Хорошёво', '289'],\n        ['Зорге', '290'],\n        
['Панфиловская', '291'],\n        ['Стрешнево', '292'],\n        ['Балтийская', '293'],\n        ['Коптево', '294'],\n        ['Лихоборы', '295'],\n        ['Окружная', '296'],\n        ['Ростокино', '297'],\n        ['Белокаменная', '298'],\n        ['Локомотив', '299'],\n        ['Измайлово', '300'],\n        ['Соколиная гора', '301'],\n        ['Андроновка', '302'],\n        ['Нижегородская', '303'],\n        ['Новохохловская', '304'],\n        ['Угрешская', '305'],\n        ['ЗИЛ', '306'],\n        ['Верхние котлы', '307'],\n        ['Крымская', '308'],\n        ['Площадь Гагарина', '309'],\n        ['Лужники', '310'],\n        ['Шелепиха', '311'],\n        ['Минская', '337'],\n        ['Ломоносовский проспект', '338'],\n        ['Раменки', '339'],\n        ['Ховрино', '349'],\n        ['Петровский Парк', '350'],\n        ['Хорошёвская', '351'],\n        ['ЦСКА', '352'],\n        ['Верхние Лихоборы', '353'],\n        ['Селигерская', '354'],\n        ['Мичуринский проспект', '361'],\n        ['Озёрная', '362'],\n        ['Говорово', '363'],\n        ['Солнцево', '364'],\n        ['Боровское шоссе', '365'],\n        ['Новопеределкино', '366'],\n        ['Рассказовка', '367'],\n        ['Беломорская', '369'],\n        ['Косино', '370'],\n        ['Улица Дмитриевского', '371'],\n        ['Лухмановская', '372'],\n        ['Некрасовка', '373'],\n        ['Юго-Восточная', '374'],\n        ['Окская', '375'],\n        ['Стахановская', '376'],\n        ['Филатов Луг', '377'],\n        ['Прокшино', '378'],\n        ['Ольховая', '379'],\n        ['Коммунарка', '380'],\n        ['Лефортово', '381'],\n        ['Шереметьевская', '383'],\n        ['Рижская', '384'],\n        ['Сокольники', '385'],\n        ['Электрозаводская', '386'],\n        ['Кленовый бульвар', '387'],\n        ['Нагатинский Затон', '388'],\n        ['Зюзино', '389'],\n        ['Воронцовская', '390'],\n        ['Новаторская', '391'],\n        ['Аминьевская', '392'],\n        ['Давыдково', '393'],\n        
['Кунцевская', '394'],\n        ['Мнёвники', '395'],\n        ['Терехово', '396'],\n        ['Карамышевская', '397'],\n        ['Яхромская', '398'],\n        ['Лианозово', '399'],\n        ['Тестовская', '400'],\n        ['Рабочий посёлок', '401'],\n        ['Сетунь', '402'],\n        ['Немчиновка', '403'],\n        ['Сколково', '404'],\n        ['Баковка', '405'],\n        ['Одинцово', '406'],\n        ['Лобня', '407'],\n        ['Хлебниково', '408'],\n        ['Водники', '409'],\n        ['Долгопрудная', '410'],\n        ['Новодачная', '411'],\n        ['Марк', '412'],\n        ['Бескудниково', '413'],\n        ['Дегунино', '414'],\n        ['Нахабино', '415'],\n        ['Аникеевка', '416'],\n        ['Опалиха', '417'],\n        ['Красногорская', '418'],\n        ['Павшино', '419'],\n        ['Пенягино', '420'],\n        ['Трикотажная', '421'],\n        ['Стрешнево', '422'],\n        ['Красный Балтиец', '423'],\n        ['Гражданская', '424'],\n        ['Москва-Товарная', '425'],\n        ['Калитники', '426'],\n        ['Люблино', '427'],\n        ['Депо', '428'],\n        ['Перерва', '429'],\n        ['Москворечье', '430'],\n        ['Покровское', '431'],\n        ['Красный Строитель', '432'],\n        ['Битца', '433'],\n        ['Щербинка', '434'],\n        ['Силикатная', '435'],\n        ['Подольск', '436'],\n        ['Бутово', '437'],\n        ['Остафьево', '438'],\n        ['Курьяново', '439'],\n        ['Народное Ополчение', '440'],\n        ['Площадь трёх вокзалов', '441'],\n        ['Авиамоторная', '443'],\n        ['Деловой центр', '444'],\n        ['Каширская', '445'],\n        ['Лефортово', '446'],\n        ['Мичуринский проспект', '447'],\n        ['Нижегородская', '448'],\n        ['Печатники', '449'],\n        ['Проспект Вернадского', '450'],\n        ['Савёловская', '451'],\n        ['Текстильщики', '452'],\n        ['Шелепиха', '453'],\n        ['Марьина Роща', '454'],\n        ['Зеленоград — Крюково', '455'],\n        ['Фирсановская', '456'],\n  
      ['Сходня', '457'],\n        ['Подрезково', '458'],\n        ['Новоподрезково', '459'],\n        ['Молжаниново', '460'],\n        ['Химки', '461'],\n        ['Левобережная', '462'],\n        ['Ховрино', '463'],\n        ['Грачёвская', '464'],\n        ['Моссельмаш', '465'],\n        ['Лихоборы', '466'],\n        ['Петровско-Разумовская', '467'],\n        ['Останкино', '468'],\n        ['Электрозаводская', '470'],\n        ['Сортировочная', '471'],\n        ['Андроновка', '473'],\n        ['Перово', '474'],\n        ['Плющево', '475'],\n        ['Вешняки', '476'],\n        ['Выхино', '477'],\n        ['Рязанский проспект', '478'],\n        ['Ухтомская', '479'],\n        ['Люберцы', '480'],\n        ['Панки', '481'],\n        ['Томилино', '482'],\n        ['Красково', '483'],\n        ['Котельники', '484'],\n        ['Отдых', '488'],\n        ['Кратово', '489'],\n        ['Есенинская', '490'],\n        ['Фабричная', '491'],\n        ['Раменское', '492'],\n        ['Ипподром', '493'],\n        ['Апрелевка', '494'],\n        ['Победа', '495'],\n        ['Крёкшино', '496'],\n        ['Санино', '497'],\n        ['Кокошкино', '498'],\n        ['Толстопальцево', '499'],\n        ['Лесной Городок', '500'],\n        ['Внуково', '501'],\n        ['Мичуринец', '502'],\n        ['Переделкино', '503'],\n        ['Солнечная', '504'],\n        ['Говорово', '505'],\n        ['Очаково', '506'],\n        ['Аминьевская', '507'],\n        ['Матвеевская', '508'],\n        ['Минская', '509'],\n        ['Кутузовская', '511'],\n        ['Беговая', '513'],\n        ['Белорусская', '514'],\n        ['Рижская', '517'],\n        ['Курская', '519'],\n        ['Чухлинка', '522'],\n        ['Кусково', '523'],\n        ['Новогиреево', '524'],\n        ['Реутов', '525'],\n        ['Никольское', '526'],\n        ['Салтыковская', '527'],\n        ['Кучино', '528'],\n        ['Ольгино', '529'],\n        ['Железнодорожная', '530'],\n        ['Физтех', '533'],\n        ['Аэропорт Внуково', 
'535'],\n        ['Пыхтино', '536'],\n        ['Марьина Роща', '537'],\n    ],\n    \"Казанский\": [\n        ['Северный Вокзал', '314'],\n        ['Яшьлек', '315'],\n        ['Козья слобода', '316'],\n        ['Кремлёвская', '317'],\n        ['Площадь Тукая', '318'],\n        ['Суконная слобода', '319'],\n        ['Аметьево', '320'],\n        ['Горки', '321'],\n        ['Проспект Победы', '322'],\n        ['Дубравная', '368'],\n    ],\n    \"Петербургский\": [\n        ['Девяткино', '167'],\n        ['Гражданский проспект', '168'],\n        ['Академическая', '169'],\n        ['Политехническая', '170'],\n        ['Площадь Мужества', '171'],\n        ['Лесная', '172'],\n        ['Выборгская', '173'],\n        ['Площадь Ленина', '174'],\n        ['Чернышевская', '175'],\n        ['Площадь Восстания', '176'],\n        ['Владимирская', '177'],\n        ['Пушкинская', '178'],\n        ['Технологический институт', '179'],\n        ['Балтийская', '180'],\n        ['Нарвская', '181'],\n        ['Кировский завод', '182'],\n        ['Автово', '183'],\n        ['Ленинский проспект', '184'],\n        ['Проспект Ветеранов', '185'],\n        ['Парнас', '186'],\n        ['Проспект Просвещения', '187'],\n        ['Озерки', '188'],\n        ['Удельная', '189'],\n        ['Пионерская', '190'],\n        ['Черная речка', '191'],\n        ['Петроградская', '192'],\n        ['Горьковская', '193'],\n        ['Невский проспект', '194'],\n        ['Сенная площадь', '195'],\n        ['Фрунзенская', '197'],\n        ['Московские ворота', '198'],\n        ['Электросила', '199'],\n        ['Парк Победы', '200'],\n        ['Московская', '201'],\n        ['Звездная', '202'],\n        ['Купчино', '203'],\n        ['Приморская', '204'],\n        ['Василеостровская', '205'],\n        ['Гостиный двор', '206'],\n        ['Маяковская', '207'],\n        ['Площадь Александра Невского', '208'],\n        ['Елизаровская', '210'],\n        ['Ломоносовская', '211'],\n        ['Пролетарская', '212'],\n        
['Обухово', '213'],\n        ['Рыбацкое', '214'],\n        ['Комендантский проспект', '215'],\n        ['Старая Деревня', '216'],\n        ['Крестовский остров', '217'],\n        ['Чкаловская', '218'],\n        ['Спортивная', '219'],\n        ['Садовая', '220'],\n        ['Достоевская', '221'],\n        ['Лиговский проспект', '222'],\n        ['Новочеркасская', '224'],\n        ['Ладожская', '225'],\n        ['Проспект Большевиков', '226'],\n        ['Улица Дыбенко', '227'],\n        ['Волковская', '230'],\n        ['Звенигородская', '231'],\n        ['Спасская', '232'],\n        ['Обводный канал', '241'],\n        ['Адмиралтейская', '242'],\n        ['Международная', '246'],\n        ['Бухарестская', '247'],\n        ['Проспект Славы', '357'],\n        ['Беговая', '355'],\n        ['Зенит', '356'],\n        ['Проспект Славы', '357'],\n        ['Дунайская', '358'],\n        ['Шушары', '359'],\n        ['Горный институт', '382'],\n    ],\n    \"Самарский\": [\n        ['Российская', '261'],\n        ['Московская', '262'],\n        ['Гагаринская', '263'],\n        ['Спортивная', '264'],\n        ['Советская', '265'],\n        ['Победа', '266'],\n        ['Безымянка', '267'],\n        ['Кировская', '268'],\n        ['Юнгородок', '269'],\n        ['Победа', '270'],\n        ['Алабинская', '312'],\n    ],\n    \"Екатеринбургский\": [\n        ['Проспект Космонавтов', '340'],\n        ['Уралмаш', '341'],\n        ['Машиностроителей', '342'],\n        ['Уральская', '343'],\n        ['Динамо', '343'],\n        ['Площадь 1905 года', '345'],\n        ['Геологическая', '346'],\n        ['Чкаловская', '347'],\n        ['Ботаническая', '348'],\n    ],\n    \"Новосибирский\": [\n        ['Заельцовская', '248'],\n        ['Гагаринская', '249'],\n        ['Красный Проспект', '250'],\n        ['Сибирская', '251'],\n        ['Площадь Ленина', '252'],\n        ['Октябрьская', '253'],\n        ['Речной Вокзал', '254'],\n        ['Студенческая', '255'],\n        ['Площадь Маркса', 
'256'],\n        ['Площадь Гарина-Михайловского', '257'],\n        ['Маршала Покрышкина', '258'],\n        ['Березовая Роща', '259'],\n        ['Золотая Нива', '260'],\n    ],\n    \"Нижегородский\": [\n        ['Горьковская', '323'],\n        ['Московская', '324'],\n        ['Чкаловская', '325'],\n        ['Ленинская', '326'],\n        ['Заречная', '327'],\n        ['Двигатель Революции', '328'],\n        ['Пролетарская', '329'],\n        ['Автозаводская', '330'],\n        ['Комсомольская', '331'],\n        ['Кировская', '332'],\n        ['Парк культуры', '333'],\n        ['Канавинская', '334'],\n        ['Бурнаковская', '335'],\n        ['Буревестник', '335'],\n        ['Стрелка', '360']\n    ],\n}\n"
  },
  {
    "path": "cianparser/definers/__init__.py",
    "content": ""
  },
  {
    "path": "cianparser/definers/definer_cities_id.py",
    "content": "import time\nimport requests\nfrom bs4 import BeautifulSoup\nimport pymorphy2\nimport collections\nimport csv\nimport cloudscraper\n\nParseCityNames = collections.namedtuple(\n    'ParseResults',\n    {\n        'location_name',\n        'city_id',\n    }\n)\n\n\nclass Client:\n    def __init__(self, start_location_id=1, end_location_id=20):\n        self.session = cloudscraper.create_scraper()\n        self.session.headers = {'Accept-Language': 'en'}\n\n        self.cities = []\n        self.cities_set = set()\n\n        self.start_location_id = start_location_id\n        self.end_location_id = end_location_id\n\n    def define_city(self, html, location_id: int):\n        soup = BeautifulSoup(html, 'html.parser')\n        offers = soup.select(\"div[data-name='HeaderDefault']\")\n\n        if len(offers) == 0:\n            print(\"_\" + \"  \" + \"***\")\n            return self.cities\n\n        title = offers[0].text\n        city = title.lower()[title.lower().find(\"снять квартиру в \") + len(\"снять квартиру в \"):title.lower().find(\n            \" на длительный срок\")]\n\n        if (\"в России\" in title or \"АрендаСнять\" not in title or\n                (\"области\" in city or \"крае\" in city or \"республике\" in city or\n                 \"округе\" in city or \"россии\" in city or\n                 \"кабардино\" in city or \"карачаево\" in city or\n                 \"дагестан\" in city or \"осетии\" in city or\n                 \"ненецком ао\" in city or \"ямало-ненецком ао\" in city or\n                 \"чукотском ао\" in city or \"ханты-мансийском ао\" in city or\n                 \"чувашии\" in city)\n        ):\n            print(\"_\" + \"  \" + str(location_id))\n            return self.cities\n\n        morph = pymorphy2.MorphAnalyzer()\n        city = morph.parse(city)[0].normal_form.title()\n        print(city + \" \" + str(location_id))\n\n        if city not in self.cities_set:\n            self.cities_set.add(city)\n         
   self.cities.append((city, location_id))\n            self.save_results()\n\n        return self.cities\n\n    def define_all_cities(self):\n        for location_id in range(self.start_location_id, self.end_location_id+1):\n            path = f'https://www.cian.ru/cat.php?deal_type=rent&engine_version=2&offer_type=flat&p=1&region={location_id}&type=4'\n            response = requests.get(path)\n            html = response.text\n            self.define_city(html, location_id)\n            time.sleep(2)\n\n        self.cities = sorted(self.cities, key=lambda x: x[0])\n\n    def save_results(self):\n        cities_result = []\n        cities_result.append(ParseCityNames(\n            location_name='location_name',\n            city_id='city_id',\n        ))\n\n        for city_couple in self.cities:\n            cities_result.append(ParseCityNames(\n                location_name=city_couple[0],\n                city_id=city_couple[1],\n            ))\n\n        path = f\"cities_{self.start_location_id}_{self.end_location_id}.csv\"\n        with open(path, \"w\", newline=\"\", encoding=\"utf-8\") as f:\n            writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)\n            for item in self.cities:\n                writer.writerow(item)\n\n\nif __name__ == '__main__':\n    definer = Client(start_location_id=6000, end_location_id=7000)\n    definer.define_all_cities()\n"
  },
  {
    "path": "cianparser/definers/definer_metro_id.py",
    "content": "import time\nimport requests\nfrom bs4 import BeautifulSoup\nimport collections\nimport csv\nimport cloudscraper\n\nParseMetroNames = collections.namedtuple(\n    'ParseResults',\n    {\n        'city',\n        'metro_name',\n        'metro_id',\n    }\n)\n\n\nclass Client:\n    def __init__(self, start_metro_id=1, end_metro_id=20):\n        self.session = cloudscraper.create_scraper()\n        self.session.headers = {'Accept-Language': 'en'}\n\n        self.metro_stations = []\n        self.metro_set = set()\n\n        self.start_metro_id = start_metro_id\n        self.end_metro_id = end_metro_id\n\n    def define_metro(self, html, metro_id: int):\n        soup = BeautifulSoup(html, 'html.parser')\n        offers = soup.select(\"div[data-name='GeneralInfoSectionRowComponent']\")\n\n        if len(offers) == 0:\n            print(\"_\" + \"  \" + \"***\")\n            return self.metro_stations\n\n        address = offers[1].text\n\n        if \", м.\" not in address:\n            for offer in offers:\n                if \", м.\" in offer.text:\n                    address = offer.text\n\n        if address.find(\", м.\") == 0:\n            print(\"_\" + \"  \" + \"***\" + \"somethins wrong\")\n\n        city = \"Unknown\"\n        if \"Москва\" in address:\n            city = \"Москва\"\n        if \"Казань\" in address:\n            city = \"Казань\"\n        if \"Санкт-Петербург\" in address:\n            city = \"Санкт-Петербург\"\n        if \"Самара\" in address:\n            city = \"Самара\"\n        if \"Екатеринбург\" in address:\n            city = \"Екатеринбург\"\n        if \"Новосибирск\" in address:\n            city = \"Новосибирск\"\n        if \"Нижний Новгород\" in address:\n            city = \"Нижний Новгород\"\n\n        metro = address[address.find(\", м.\") + len(\", м. 
\"):].split(\", \")[0]\n        print(f\"{city}, {metro}, {str(metro_id)}\")\n\n        if metro not in self.metro_set:\n            self.metro_set.add(metro)\n            self.metro_stations.append((city, metro, metro_id))\n            self.save_results()\n\n        return self.metro_stations\n\n    def define_all_metro_stations(self):\n        for metro_id in range(self.start_metro_id, self.end_metro_id+1):\n            path = f'https://www.cian.ru/cat.php?deal_type=rent&engine_version=2&offer_type=flat&p=1&region=1&type=4&metro[0]={metro_id}'\n            response = requests.get(path)\n            html = response.text\n            self.define_metro(html, metro_id)\n            time.sleep(2)\n\n        self.metro_stations = sorted(self.metro_stations, key=lambda x: x[0])\n\n    def save_results(self):\n        metro_stations_result = [ParseMetroNames(\n            city='city',\n            metro_name='metro_name',\n            metro_id='metro_id',\n        )]\n\n        for metro_couple in self.metro_stations:\n            metro_stations_result.append(ParseMetroNames(\n                city=metro_couple[0],\n                metro_name=metro_couple[1],\n                metro_id=metro_couple[2],\n            ))\n\n        path = f\"metro_stations_{self.start_metro_id}_{self.end_metro_id}.csv\"\n        with open(path, \"w\") as f:\n            writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)\n            for item in self.metro_stations:\n                writer.writerow(item)\n\n\nif __name__ == '__main__':\n    definer = Client(start_metro_id=1, end_metro_id=10)\n    definer.define_all_metro_stations()\n"
  },
  {
    "path": "cianparser/flat/list.py",
    "content": "import bs4\nimport time\nimport pathlib\nfrom datetime import datetime\nfrom transliterate import translit\n\nfrom cianparser.constants import FILE_NAME_FLAT_FORMAT\nfrom cianparser.helpers import union_dicts, define_author, define_location_data, define_specification_data, define_deal_url_id, define_price_data\nfrom cianparser.flat.page import FlatPageParser\nfrom cianparser.base_list import BaseListPageParser\n\n\nclass FlatListPageParser(BaseListPageParser):\n    def build_file_path(self):\n        now_time = datetime.now().strftime(\"%d_%b_%Y_%H_%M_%S_%f\")\n        file_name = FILE_NAME_FLAT_FORMAT.format(self.accommodation_type, self.deal_type, self.start_page, self.end_page, translit(self.location_name.lower(), reversed=True), now_time)\n        return pathlib.Path(pathlib.Path.cwd(), file_name.replace(\"'\", \"\"))\n\n    def parse_list_offers_page(self, html, page_number: int, count_of_pages: int, attempt_number: int):\n        list_soup = bs4.BeautifulSoup(html, 'html.parser')\n\n        if list_soup.text.find(\"Captcha\") > 0:\n            print(f\"\\r{page_number} page: there is CAPTCHA... 
failed to parse page...\")\n            return False, attempt_number + 1, True\n\n        header = list_soup.select(\"div[data-name='HeaderDefault']\")\n        if len(header) == 0:\n            return False, attempt_number + 1, False\n\n        offers = list_soup.select(\"article[data-name='CardComponent']\")\n        print(\"\")\n        print(f\"\\r {page_number} page: {len(offers)} offers\", end=\"\\r\", flush=True)\n\n        if page_number == self.start_page and attempt_number == 0:\n            print(f\"Collecting information from pages with list of offers\", end=\"\\n\")\n\n        for ind, offer in enumerate(offers):\n            self.parse_offer(offer=offer)\n            self.print_parse_progress(page_number=page_number, count_of_pages=count_of_pages, offers=offers, ind=ind)\n\n        time.sleep(2)\n\n        return True, 0, False\n\n    def parse_offer(self, offer):\n        common_data = dict()\n        common_data[\"url\"] = offer.select(\"div[data-name='LinkArea']\")[0].select(\"a\")[0].get('href')\n        common_data[\"location\"] = self.location_name\n        common_data[\"deal_type\"] = self.deal_type\n        common_data[\"accommodation_type\"] = self.accommodation_type\n\n        author_data = define_author(block=offer)\n        location_data = define_location_data(block=offer, is_sale=self.is_sale())\n        price_data = define_price_data(block=offer)\n        specification_data = define_specification_data(block=offer)\n\n        if define_deal_url_id(common_data[\"url\"]) in self.result_set:\n            return\n\n        page_data = dict()\n        if self.with_extra_data:\n            flat_parser = FlatPageParser(session=self.session, url=common_data[\"url\"])\n            page_data = flat_parser.parse_page()\n            time.sleep(4)\n\n        self.count_parsed_offers += 1\n        self.define_average_price(price_data=price_data)\n        self.result_set.add(define_deal_url_id(common_data[\"url\"]))\n        
self.result.append(union_dicts(author_data, common_data, specification_data, price_data, page_data, location_data))\n\n        if self.with_saving_csv:\n            self.save_results()\n"
  },
  {
    "path": "cianparser/flat/page.py",
    "content": "import bs4\nimport re\nimport time\n\n\nclass FlatPageParser:\n    def __init__(self, session, url):\n        self.session = session\n        self.url = url\n\n    def __load_page__(self):\n        res = self.session.get(self.url)\n        if res.status_code == 429:\n            time.sleep(10)\n        res.raise_for_status()\n        self.offer_page_html = res.text\n        self.offer_page_soup = bs4.BeautifulSoup(self.offer_page_html, 'html.parser')\n\n    def __parse_flat_offer_page_json__(self):\n        page_data = {\n            \"year_of_construction\": -1,\n            \"object_type\": -1,\n            \"house_material_type\": -1,\n            \"heating_type\": -1,\n            \"finish_type\": -1,\n            \"living_meters\": -1,\n            \"kitchen_meters\": -1,\n            \"floor\": -1,\n            \"floors_count\": -1,\n            \"phone\": \"\",\n        }\n\n        spans = self.offer_page_soup.select(\"span\")\n        for index, span in enumerate(spans):\n            if \"Тип жилья\" == span.text:\n                page_data[\"object_type\"] = spans[index + 1].text\n\n            if \"Тип дома\" == span.text:\n                page_data[\"house_material_type\"] = spans[index + 1].text\n\n            if \"Отопление\" == span.text:\n                page_data[\"heating_type\"] = spans[index + 1].text\n\n            if \"Отделка\" == span.text:\n                page_data[\"finish_type\"] = spans[index + 1].text\n\n            if \"Площадь кухни\" == span.text:\n                page_data[\"kitchen_meters\"] = spans[index + 1].text\n\n            if \"Жилая площадь\" == span.text:\n                page_data[\"living_meters\"] = spans[index + 1].text\n\n            if \"Год постройки\" in span.text:\n                page_data[\"year_of_construction\"] = spans[index + 1].text\n\n            if \"Год сдачи\" in span.text:\n                page_data[\"year_of_construction\"] = spans[index + 1].text\n\n            if \"Этаж\" == 
span.text:\n                ints = re.findall(r'\\d+', spans[index + 1].text)\n                if len(ints) == 2:\n                    page_data[\"floor\"] = int(ints[0])\n                    page_data[\"floors_count\"] = int(ints[1])\n\n        if \"+7\" in self.offer_page_html:\n            page_data[\"phone\"] = self.offer_page_html[self.offer_page_html.find(\"+7\"): self.offer_page_html.find(\"+7\") + 16].split('\"')[0]. \\\n                replace(\" \", \"\"). \\\n                replace(\"-\", \"\")\n\n        return page_data\n\n    def parse_page(self):\n        self.__load_page__()\n        return self.__parse_flat_offer_page_json__()\n"
  },
  {
    "path": "cianparser/helpers.py",
    "content": "import re\nimport itertools\nfrom cianparser.constants import STREET_TYPES, NOT_STREET_ADDRESS_ELEMENTS, FLOATS_NUMBERS_REG_EXPRESSION\n\n\ndef union_dicts(*dicts):\n    return dict(itertools.chain.from_iterable(dct.items() for dct in dicts))\n\n\ndef define_rooms_count(description):\n    if \"1-комн\" in description or \"Студия\" in description:\n        rooms_count = 1\n    elif \"2-комн\" in description:\n        rooms_count = 2\n    elif \"3-комн\" in description:\n        rooms_count = 3\n    elif \"4-комн\" in description:\n        rooms_count = 4\n    elif \"5-комн\" in description:\n        rooms_count = 5\n    else:\n        rooms_count = -1\n\n    return rooms_count\n\n\ndef define_deal_url_id(url: str):\n    url_path_elements = url.split(\"/\")\n    if len(url_path_elements[-1]) > 3:\n        return url_path_elements[-1]\n    if len(url_path_elements[-2]) > 3:\n        return url_path_elements[-2]\n\n    return \"-1\"\n\n\ndef define_author(block):\n    spans = block.select(\"div\")[0].select(\"span\")\n\n    author_data = {\n        \"author\": \"\",\n        \"author_type\": \"\",\n    }\n\n    for index, span in enumerate(spans):\n        if \"Агентство недвижимости\" in span:\n            author_data[\"author\"] = spans[index + 1].text.replace(\",\", \".\").strip()\n            author_data[\"author_type\"] = \"real_estate_agent\"\n            return author_data\n\n    for index, span in enumerate(spans):\n        if \"Собственник\" in span:\n            author_data[\"author\"] = spans[index + 1].text\n            author_data[\"author_type\"] = \"homeowner\"\n            return author_data\n\n    for index, span in enumerate(spans):\n        if \"Риелтор\" in span:\n            author_data[\"author\"] = spans[index + 1].text\n            author_data[\"author_type\"] = \"realtor\"\n            return author_data\n\n    for index, span in enumerate(spans):\n        if \"Ук・оф.Представитель\" in span:\n            author_data[\"author\"] 
= spans[index + 1].text\n            author_data[\"author_type\"] = \"official_representative\"\n            return author_data\n\n    for index, span in enumerate(spans):\n        if \"Представитель застройщика\" in span:\n            author_data[\"author\"] = spans[index + 1].text\n            author_data[\"author_type\"] = \"representative_developer\"\n            return author_data\n\n    for index, span in enumerate(spans):\n        if \"Застройщик\" in span:\n            author_data[\"author\"] = spans[index + 1].text\n            author_data[\"author_type\"] = \"developer\"\n            return author_data\n\n    for index, span in enumerate(spans):\n        if \"ID\" in span.text:\n            author_data[\"author\"] = span.text\n            author_data[\"author_type\"] = \"unknown\"\n            return author_data\n\n    return author_data\n\n\ndef parse_location_data(block):\n    general_info_sections = block.select_one(\"div[data-name='LinkArea']\").select(\"div[data-name='GeneralInfoSectionRowComponent']\")\n\n    location_data = dict()\n    location_data[\"district\"] = \"\"\n    location_data[\"underground\"] = \"\"\n    location_data[\"street\"] = \"\"\n    location_data[\"house_number\"] = \"\"\n\n    for section in general_info_sections:\n        geo_labels = section.select(\"a[data-name='GeoLabel']\")\n\n        # if len(geo_labels) > 1:\n            # print(\"\\n\\n\", location_data[\"street\"] == \"\",geo_labels[-2].text, \"|||\", geo_labels[-1].text)\n\n        for index, label in enumerate(geo_labels):\n            if \"м. 
\" in label.text:\n                location_data[\"underground\"] = label.text\n\n            if \"р-н\" in label.text or \"поселение\" in label.text:\n                location_data[\"district\"] = label.text\n\n            if any(street_type in label.text.lower() for street_type in STREET_TYPES):\n                location_data[\"street\"] = label.text\n\n                if len(geo_labels) > index + 1 and any(chr.isdigit() for chr in geo_labels[index + 1].text):\n                    location_data[\"house_number\"] = geo_labels[index + 1].text\n\n    return location_data\n\n\ndef define_location_data(block, is_sale):\n    elements = block.select_one(\"div[data-name='LinkArea']\").select(\"div[data-name='GeneralInfoSectionRowComponent']\")\n\n    location_data = dict()\n    location_data[\"district\"] = \"\"\n    location_data[\"street\"] = \"\"\n    location_data[\"house_number\"] = \"\"\n    location_data[\"underground\"] = \"\"\n\n    if is_sale:\n        location_data[\"residential_complex\"] = \"\"\n\n    for index, element in enumerate(elements):\n        if (\"ЖК\" in element.text) and (\"«\" in element.text) and (\"»\" in element.text):\n            location_data[\"residential_complex\"] = element.text.split(\"«\")[1].split(\"»\")[0]\n\n        if \"р-н\" in element.text and len(element.text) < 250:\n            address_elements = element.text.split(\",\")\n            if len(address_elements) < 2:\n                continue\n\n            if \"ЖК\" in address_elements[0] and \"«\" in address_elements[0] and \"»\" in address_elements[0]:\n                location_data[\"residential_complex\"] = address_elements[0].split(\"«\")[1].split(\"»\")[0]\n\n            if \", м. \" in element.text:\n                location_data[\"underground\"] = element.text.split(\", м. 
\")[1]\n                if \",\" in location_data[\"underground\"]:\n                    location_data[\"underground\"] = location_data[\"underground\"].split(\",\")[0]\n\n            if (any(chr.isdigit() for chr in address_elements[-1]) and \"жк\" not in address_elements[-1].lower() and\n                not any(street_type in address_elements[-1].lower() for street_type in STREET_TYPES)) and len(\n                address_elements[-1]) < 10:\n                location_data[\"house_number\"] = address_elements[-1].strip()\n\n            for ind, elem in enumerate(address_elements):\n                if \"р-н\" in elem:\n                    district = elem.replace(\"р-н\", \"\").strip()\n\n                    location_data[\"district\"] = district\n\n                    if \"ЖК\" in address_elements[-1]:\n                        location_data[\"residential_complex\"] = address_elements[-1].strip()\n\n                    if \"ЖК\" in address_elements[-2]:\n                        location_data[\"residential_complex\"] = address_elements[-2].strip()\n\n                    for street_type in STREET_TYPES:\n                        if street_type in address_elements[-1]:\n                            location_data[\"street\"] = address_elements[-1].strip()\n                            if street_type == \"улица\":\n                                location_data[\"street\"] = location_data[\"street\"].replace(\"улица\", \"\")\n                            return location_data\n\n                        if street_type in address_elements[-2]:\n                            location_data[\"street\"] = address_elements[-2].strip()\n                            if street_type == \"улица\":\n                                location_data[\"street\"] = location_data[\"street\"].replace(\"улица\", \"\")\n\n                            return location_data\n\n                    for k, after_district_address_element in enumerate(address_elements[ind + 1:]):\n                        if 
len(list(set(after_district_address_element.split(\" \")).intersection(\n                                NOT_STREET_ADDRESS_ELEMENTS))) != 0:\n                            continue\n\n                        if len(after_district_address_element.strip().replace(\" \", \"\")) < 4:\n                            continue\n\n                        location_data[\"street\"] = after_district_address_element.strip()\n\n                        return location_data\n\n            return location_data\n\n    if location_data[\"district\"] == \"\":\n        for index, element in enumerate(elements):\n            if \", м. \" in element.text and len(element.text) < 250:\n                location_data[\"underground\"] = element.text.split(\", м. \")[1]\n                if \",\" in location_data[\"underground\"]:\n                    location_data[\"underground\"] = location_data[\"underground\"].split(\",\")[0]\n\n                address_elements = element.text.split(\",\")\n\n                if len(address_elements) < 2:\n                    continue\n\n                if \"ЖК\" in address_elements[-1]:\n                    location_data[\"residential_complex\"] = address_elements[-1].strip()\n\n                if \"ЖК\" in address_elements[-2]:\n                    location_data[\"residential_complex\"] = address_elements[-2].strip()\n\n                if (any(chr.isdigit() for chr in address_elements[-1]) and \"жк\" not in address_elements[\n                    -1].lower() and\n                    not any(\n                        street_type in address_elements[-1].lower() for street_type in STREET_TYPES)) and len(\n                    address_elements[-1]) < 10:\n                    location_data[\"house_number\"] = address_elements[-1].strip()\n\n                for street_type in STREET_TYPES:\n                    if street_type in address_elements[-1]:\n                        location_data[\"street\"] = address_elements[-1].strip()\n                        if 
street_type == \"улица\":\n                            location_data[\"street\"] = location_data[\"street\"].replace(\"улица\", \"\")\n                        return location_data\n\n                    if street_type in address_elements[-2]:\n                        location_data[\"street\"] = address_elements[-2].strip()\n                        if street_type == \"улица\":\n                            location_data[\"street\"] = location_data[\"street\"].replace(\"улица\", \"\")\n                        return location_data\n\n            for street_type in STREET_TYPES:\n                if (\", \" + street_type + \" \" in element.text) or (\" \" + street_type + \", \" in element.text):\n                    address_elements = element.text.split(\",\")\n\n                    if len(address_elements) < 3:\n                        continue\n\n                    if (any(chr.isdigit() for chr in address_elements[-1]) and \"жк\" not in address_elements[\n                        -1].lower() and\n                        not any(\n                            street_type in address_elements[-1].lower() for street_type in STREET_TYPES)) and len(\n                        address_elements[-1]) < 10:\n                        location_data[\"house_number\"] = address_elements[-1].strip()\n\n                    if street_type in address_elements[-1]:\n                        location_data[\"street\"] = address_elements[-1].strip()\n                        if street_type == \"улица\":\n                            location_data[\"street\"] = location_data[\"street\"].replace(\"улица\", \"\")\n\n                        location_data[\"district\"] = address_elements[-2].strip()\n\n                        return location_data\n\n                    if street_type in address_elements[-2]:\n                        location_data[\"street\"] = address_elements[-2].strip()\n                        if street_type == \"улица\":\n                            location_data[\"street\"] = 
location_data[\"street\"].replace(\"улица\", \"\")\n\n                        location_data[\"district\"] = address_elements[-3].strip()\n\n                        return location_data\n\n    return location_data\n\n\ndef define_price_data(block):\n    elements = block.select(\"div[data-name='LinkArea']\")[0]. \\\n        select(\"span[data-mark='MainPrice']\")\n\n    price_data = {\n        \"price_per_month\": -1,\n        \"commissions\": 0,\n    }\n\n    for element in elements:\n        if \"₽/мес\" in element.text:\n            price_description = element.text\n            price_data[\"price_per_month\"] = int(\n                \"\".join(price_description[:price_description.find(\"₽/мес\") - 1].split()))\n\n            if \"%\" in price_description:\n                price_data[\"commissions\"] = int(\n                    price_description[price_description.find(\"%\") - 2:price_description.find(\"%\")].replace(\" \", \"\"))\n\n            return price_data\n\n        if \"₽\" in element.text and \"млн\" not in element.text:\n            price_description = element.text\n            price_data[\"price\"] = int(\"\".join(price_description[:price_description.find(\"₽\") - 1].split()))\n\n            return price_data\n\n    return price_data\n\n\ndef define_specification_data(block):\n    specification_data = dict()\n    specification_data[\"floor\"] = -1\n    specification_data[\"floors_count\"] = -1\n    specification_data[\"rooms_count\"] = -1\n    specification_data[\"total_meters\"] = -1\n\n    title = block.select(\"div[data-name='LinkArea']\")[0].select(\"div[data-name='GeneralInfoSectionRowComponent']\")[\n        0].text\n\n    common_properties = block.select(\"div[data-name='LinkArea']\")[0]. 
\\\n        select(\"div[data-name='GeneralInfoSectionRowComponent']\")[0].text\n\n    # str.find returns -1 (never None) when the substring is absent\n    if common_properties.find(\"м²\") != -1:\n        total_meters = title[: common_properties.find(\"м²\")].replace(\",\", \".\")\n        if len(re.findall(FLOATS_NUMBERS_REG_EXPRESSION, total_meters)) != 0:\n            specification_data[\"total_meters\"] = float(\n                re.findall(FLOATS_NUMBERS_REG_EXPRESSION, total_meters)[-1].replace(\" \", \"\").replace(\"-\", \"\"))\n\n    if \"этаж\" in common_properties:\n        floor_per = common_properties[common_properties.rfind(\"этаж\") - 7: common_properties.rfind(\"этаж\")]\n        floor_properties = floor_per.split(\"/\")\n\n        if len(floor_properties) == 2:\n            ints = re.findall(r'\\d+', floor_properties[0])\n            if len(ints) != 0:\n                specification_data[\"floor\"] = int(ints[-1])\n\n            ints = re.findall(r'\\d+', floor_properties[1])\n            if len(ints) != 0:\n                specification_data[\"floors_count\"] = int(ints[-1])\n\n    specification_data[\"rooms_count\"] = define_rooms_count(common_properties)\n\n    return specification_data\n"
  },
  {
    "path": "cianparser/newobject/list.py",
    "content": "import bs4\nimport time\nimport math\nimport csv\nimport pathlib\nfrom datetime import datetime\nfrom transliterate import translit\nimport urllib.parse\n\nfrom cianparser.constants import FILE_NAME_NEWOBJECT_FORMAT\nfrom cianparser.helpers import union_dicts\nfrom cianparser.newobject.page import NewObjectPageParser\n\n\nclass NewObjectListParser:\n    def __init__(self, session, location_name: str, with_saving_csv=False):\n        self.accommodation_type = \"newobject\"\n        self.deal_type = \"sale\"\n        self.session = session\n        self.location_name = location_name\n        self.with_saving_csv = with_saving_csv\n\n        self.result = []\n        self.result_set = set()\n        self.average_price = 0\n        self.count_parsed_offers = 0\n        self.start_page = 1\n        self.end_page = 50\n        self.file_path = self.build_file_path()\n\n    def build_file_path(self):\n        now_time = datetime.now().strftime(\"%d_%b_%Y_%H_%M_%S_%f\")\n        file_name = FILE_NAME_NEWOBJECT_FORMAT.format(self.accommodation_type, translit(self.location_name.lower(), reversed=True), now_time)\n        return pathlib.Path(pathlib.Path.cwd(), file_name.replace(\"'\", \"\"))\n\n    def print_parse_progress(self, page_number, count_of_pages, offers, ind):\n        total_planed_offers = len(offers) * count_of_pages\n        print(f\"\\r {page_number - self.start_page + 1}\"\n              f\" | {page_number} page with list: [\" + \"=>\" * (ind + 1) + \"  \" * (len(offers) - ind - 1) + \"]\" + f\" {math.ceil((ind + 1) * 100 / len(offers))}\" + \"%\" +\n              f\" | Count of all parsed: {self.count_parsed_offers}.\"\n              f\" Progress ratio: {math.ceil(self.count_parsed_offers * 100 / total_planed_offers)} %.\",\n              end=\"\\r\", flush=True)\n\n    def parse_list_offers_page(self, html, page_number: int, count_of_pages: int, attempt_number: int):\n        list_soup = bs4.BeautifulSoup(html, 'html.parser')\n\n        if 
list_soup.text.find(\"Captcha\") > 0:\n            print(f\"\\r{page_number} page: there is CAPTCHA... failed to parse page...\")\n            return False, attempt_number + 1, True\n\n        offers = list_soup.select(\"div[data-mark='GKCard']\")\n        print(\"\")\n        print(f\"\\r {page_number} page: {len(offers)} offers\", end=\"\\r\", flush=True)\n\n        if page_number == self.start_page and attempt_number == 0:\n            print(f\"Collecting information from pages with list of offers\", end=\"\\n\")\n\n        for ind, offer in enumerate(offers):\n            self.parse_offer(offer=offer)\n            self.print_parse_progress(page_number=page_number, count_of_pages=count_of_pages, offers=offers, ind=ind)\n\n        time.sleep(2)\n\n        return True, 0, False\n\n    def parse_offer(self, offer):\n        common_data = dict()\n        common_data[\"name\"] = offer.select_one(\"span[data-mark='Text']\").text\n        common_data[\"location\"] = self.location_name\n        common_data[\"accommodation_type\"] = self.accommodation_type\n        common_data[\"url\"] = \"https://\" + urllib.parse.urlparse(offer.select_one(\"a[data-mark='Link']\").get('href')).netloc\n        common_data[\"full_full_location_address\"] = offer.select_one(\"div[data-mark='CellAddressBlock']\").text\n\n        if common_data[\"url\"] in self.result_set:\n            return\n\n        flat_parser = NewObjectPageParser(session=self.session, url=common_data[\"url\"])\n        page_data = flat_parser.parse_page()\n        time.sleep(4)\n\n        self.count_parsed_offers += 1\n        self.result_set.add(common_data[\"url\"])\n        self.result.append(union_dicts(common_data, page_data))\n\n        if self.with_saving_csv:\n            self.save_results()\n\n    def save_results(self):\n        keys = self.result[0].keys()\n\n        with open(self.file_path, 'w', newline='', encoding='utf-8') as output_file:\n            dict_writer = csv.DictWriter(output_file, keys, 
delimiter=';')\n            dict_writer.writeheader()\n            dict_writer.writerows(self.result)\n"
  },
  {
    "path": "cianparser/newobject/page.py",
    "content": "import bs4\nimport re\nimport time\n\n\nclass NewObjectPageParser:\n    def __init__(self, session, url):\n        self.session = session\n        self.url = url\n\n    def __load_page__(self):\n        res = self.session.get(self.url)\n        if res.status_code == 429:\n            time.sleep(10)\n        res.raise_for_status()\n        self.offer_page_html = res.text\n        self.offer_page_soup = bs4.BeautifulSoup(self.offer_page_html, 'html.parser')\n\n    def parse_page(self):\n        self.__load_page__()\n\n        page_data = {\n            \"year_of_construction\": -1,\n            \"house_material_type\": -1,\n            \"finish_type\": -1,\n            \"ceiling_height\":-1,\n            \"class\": -1,\n            \"parking_type\": -1,\n            \"floors_from\": -1,\n            \"floors_to\": -1,\n        }\n\n        spans = self.offer_page_soup.select(\"span\")\n        for index, span in enumerate(spans):\n            if \"Срок сдачи\" in span.text:\n                page_data[\"year_of_construction\"] = spans[index + 1].text\n\n            if \"Тип дома\" == span.text:\n                page_data[\"house_material_type\"] = spans[index + 1].text\n\n            if \"Отделка\" == span.text:\n                page_data[\"finish_type\"] = spans[index + 1].text\n\n            if \"Высота потолков\" == span.text:\n                page_data[\"ceiling_height\"] = spans[index + 1].text\n\n            if \"Класс\" == span.text:\n                page_data[\"class\"] = spans[index + 1].text\n\n            if \"Застройщик\" in span.text and \"Проектная декларация\" in span.text:\n                page_data[\"builder\"] = span.text.split(\".\")[0]\n\n            if \"Парковка\" == span.text:\n                page_data[\"parking_type\"] = spans[index + 1].text\n\n            if \"Этажность\" == span.text:\n                ints = re.findall(r'\\d+', spans[index + 1].text)\n                if len(ints) == 2:\n                    
page_data[\"floors_from\"] = int(ints[0])\n                    page_data[\"floors_to\"] = int(ints[1])\n                if len(ints) == 1:\n                    page_data[\"floors_from\"] = int(ints[0])\n                    page_data[\"floors_to\"] = int(ints[0])\n\n        return page_data\n\n\n"
  },
  {
    "path": "cianparser/proxy_pool.py",
    "content": "import time\nimport urllib.request\nimport urllib.error\nimport bs4\nimport random\nimport socket\n\n\nclass ProxyPool:\n    def __init__(self, proxies):\n        self.__proxy_pool__ = [] if proxies is None else proxies\n        self.__current_proxy__ = None\n        self.__page_html__ = None\n\n    def __is_captcha__(self):\n        page_soup = bs4.BeautifulSoup(self.__page_html__, 'html.parser')\n        return page_soup.text.find(\"Captcha\") > 0\n\n    def __is_available_proxy__(self, url, proxy):\n        opener = urllib.request.build_opener(urllib.request.ProxyHandler({'https': proxy}))\n        opener.addheaders = [('User-agent', 'Mozilla/5.0')]\n        urllib.request.install_opener(opener)\n\n        try:\n            self.__page_html__ = urllib.request.urlopen(urllib.request.Request(url))\n        except Exception as detail:\n            print(f\"atas: {detail}..\")\n            return False\n\n        return True\n\n    def is_empty(self):\n        return len(self.__proxy_pool__) == 0\n\n    def get_available_proxy(self, url):\n        print(\"The process of checking the proxies... Search an available one among them...\")\n\n        socket.setdefaulttimeout(5)\n        found_proxy = False\n        while len(self.__proxy_pool__) > 0 and found_proxy is False:\n            proxy = random.choice(self.__proxy_pool__)\n\n            is_available = self.__is_available_proxy__(url, proxy)\n            is_captcha = self.__is_captcha__() if is_available else None\n\n            if not is_available or is_captcha:\n                if is_captcha:\n                    print(f\"proxy {proxy}: there is captcha.. trying another\")\n                else:\n                    print(f\"proxy {proxy}: unavailable.. trying another..\")\n                self.__proxy_pool__.remove(proxy)\n                time.sleep(4)\n                continue\n\n            print(f\"proxy {proxy}: available.. 
stop searching\")\n            self.__current_proxy__, found_proxy = proxy, True\n\n        if self.__current_proxy__ is None:\n            print(\"there are no available proxies..\", end=\"\\n\\n\")\n\n        return self.__current_proxy__\n"
  },
  {
    "path": "cianparser/suburban/list.py",
    "content": "import bs4\nimport time\nimport pathlib\nfrom datetime import datetime\nfrom transliterate import translit\n\nfrom cianparser.constants import FILE_NAME_SUBURBAN_FORMAT\nfrom cianparser.helpers import union_dicts, define_author, parse_location_data, define_price_data, define_deal_url_id\nfrom cianparser.suburban.page import SuburbanPageParser\nfrom cianparser.base_list import BaseListPageParser\n\n\nclass SuburbanListPageParser(BaseListPageParser):\n    def build_file_path(self):\n        now_time = datetime.now().strftime(\"%d_%b_%Y_%H_%M_%S_%f\")\n        file_name = FILE_NAME_SUBURBAN_FORMAT.format(self.accommodation_type, self.object_type, self.deal_type, self.start_page, self.end_page, translit(self.location_name.lower(), reversed=True), now_time)\n        return pathlib.Path(pathlib.Path.cwd(), file_name.replace(\"'\", \"\"))\n\n    def parse_list_offers_page(self, html, page_number: int, count_of_pages: int, attempt_number: int):\n        list_soup = bs4.BeautifulSoup(html, 'html.parser')\n\n        if list_soup.text.find(\"Captcha\") > 0:\n            print(f\"\\r{page_number} page: there is CAPTCHA... 
failed to parse page...\")\n            return False, attempt_number + 1, True\n\n        header = list_soup.select(\"div[data-name='HeaderDefault']\")\n        if len(header) == 0:\n            return False, attempt_number + 1, False\n\n        offers = list_soup.select(\"article[data-name='CardComponent']\")\n        print(\"\")\n        print(f\"\\r {page_number} page: {len(offers)} offers\", end=\"\\r\", flush=True)\n\n        if page_number == self.start_page and attempt_number == 0:\n            print(f\"Collecting information from pages with list of offers\", end=\"\\n\")\n\n        for ind, offer in enumerate(offers):\n            self.parse_offer(offer=offer)\n            self.print_parse_progress(page_number=page_number, count_of_pages=count_of_pages, offers=offers, ind=ind)\n\n        time.sleep(2)\n\n        return True, 0, False\n\n    def parse_offer(self, offer):\n        common_data = dict()\n        common_data[\"url\"] = offer.select(\"div[data-name='LinkArea']\")[0].select(\"a\")[0].get('href')\n        common_data[\"location\"] = self.location_name\n        common_data[\"deal_type\"] = self.deal_type\n        common_data[\"accommodation_type\"] = self.accommodation_type\n        common_data[\"suburban_type\"] = self.object_type\n\n        author_data = define_author(block=offer)\n        location_data = parse_location_data(block=offer)\n        price_data = define_price_data(block=offer)\n\n        if define_deal_url_id(common_data[\"url\"]) in self.result_set:\n            return\n\n        page_data = dict()\n        if self.with_extra_data:\n            suburban_parser = SuburbanPageParser(session=self.session, url=common_data[\"url\"])\n            page_data = suburban_parser.parse_page()\n            time.sleep(4)\n\n        self.count_parsed_offers += 1\n        self.define_average_price(price_data=price_data)\n        self.result_set.add(define_deal_url_id(common_data[\"url\"]))\n        self.result.append(union_dicts(author_data, 
common_data, price_data, page_data, location_data))\n\n        if self.with_saving_csv:\n            self.save_results()\n\n\n"
  },
  {
    "path": "cianparser/suburban/page.py",
    "content": "import time\n\nimport bs4\n\n\nclass SuburbanPageParser:\n    def __init__(self, session, url):\n        self.session = session\n        self.url = url\n\n    def __load_page__(self):\n        res = self.session.get(self.url)\n        if res.status_code == 429:\n            time.sleep(10)\n        res.raise_for_status()\n        self.offer_page_html = res.text\n        self.offer_page_soup = bs4.BeautifulSoup(self.offer_page_html, 'html.parser')\n\n    def parse_page(self):\n        self.__load_page__()\n\n        page_data = {\n            \"year_of_construction\": -1,\n            \"house_material_type\": -1,\n            \"land_plot\":-1,\n            \"land_plot_status\": -1,\n            \"heating_type\": -1,\n            \"gas_type\":-1,\n            \"water_supply_type\":-1,\n            \"sewage_system\":-1,\n            \"bathroom\":-1,\n            \"living_meters\": -1,\n            \"floors_count\": -1,\n            \"phone\": \"\",\n        }\n\n        spans = self.offer_page_soup.select(\"span\")\n        for index, span in enumerate(spans):\n            if \"Материал дома\" == span.text:\n                page_data[\"house_material_type\"] = spans[index + 1].text\n\n            if \"Участок\" == span.text:\n                page_data[\"land_plot\"] = spans[index + 1].text\n\n            if \"Статус участка\" == span.text:\n                page_data[\"land_plot_status\"] = spans[index + 1].text\n\n            if \"Отопление\" == span.text:\n                page_data[\"heating_type\"] = spans[index + 1].text\n\n            if \"Газ\" == span.text:\n                page_data[\"gas_type\"] = spans[index + 1].text\n\n            if \"Водоснабжение\" == span.text:\n                page_data[\"water_supply_type\"] = spans[index + 1].text\n\n            if \"Канализация\" == span.text:\n                page_data[\"sewage_system\"] = spans[index + 1].text\n\n            if \"Санузел\" == span.text:\n                page_data[\"bathroom\"] = 
spans[index + 1].text\n\n            if \"Площадь кухни\" == span.text:\n                page_data[\"kitchen_meters\"] = spans[index + 1].text\n\n            if \"Общая площадь\" == span.text:\n                page_data[\"living_meters\"] = spans[index + 1].text\n\n            if \"Год постройки\" in span.text:\n                page_data[\"year_of_construction\"] = spans[index + 1].text\n\n            if \"Год сдачи\" in span.text:\n                page_data[\"year_of_construction\"] = spans[index + 1].text\n\n            if \"Этажей в доме\" == span.text:\n                page_data[\"floors_count\"] = spans[index + 1].text\n\n        if \"+7\" in self.offer_page_html:\n            page_data[\"phone\"] = self.offer_page_html[self.offer_page_html.find(\"+7\"): self.offer_page_html.find(\"+7\") + 16].split('\"')[0]. \\\n                replace(\" \", \"\"). \\\n                replace(\"-\", \"\")\n\n        return page_data\n"
  },
  {
    "path": "cianparser/url_builder.py",
    "content": "from cianparser.constants import *\n\n\nclass URLBuilder:\n    def __init__(self, is_newobject):\n        self.url = BASE_URL\n        self.add_newobject_postfix() if is_newobject else self.add_default_postfix()\n        self.url += DEFAULT_PATH\n\n    def add_default_postfix(self):\n        self.url += DEFAULT_POSTFIX_PATH\n\n    def add_newobject_postfix(self):\n        self.url += NEWOBJECT_POSTFIX_PATH\n\n    def get_url(self):\n        return self.url\n\n    def add_accommodation_type(self, accommodation_type):\n        self.url += OFFER_TYPE_PATH.format(accommodation_type)\n        \n    def add_deal_type(self, deal_type):\n        self.url += DEAL_TYPE_PATH.format(deal_type)\n\n    def add_location(self, location_id):\n        self.url += REGION_PATH.format(location_id)\n\n    def add_room(self, rooms):\n        rooms_path = \"\"\n        if type(rooms) is tuple:\n            for count_of_room in rooms:\n                if type(count_of_room) is int:\n                    if 0 < count_of_room < 6:\n                        rooms_path += ROOM_PATH.format(count_of_room)\n                elif type(count_of_room) is str:\n                    if count_of_room == \"studio\":\n                        rooms_path += STUDIO_PATH\n        elif type(rooms) is int:\n            if 0 < rooms < 6:\n                rooms_path += ROOM_PATH.format(rooms)\n        elif type(rooms) is str:\n            if rooms == \"studio\":\n                rooms_path += STUDIO_PATH\n            elif rooms == \"all\":\n                rooms_path = \"\"\n\n        self.url += rooms_path\n\n    def add_rent_period_type(self, rent_period_type):\n        self.url += RENT_PERIOD_TYPE_PATH.format(rent_period_type)\n\n    def add_object_suburban_type(self, object_type):\n        self.url += OBJECT_TYPE_PATH.format(OBJECT_SUBURBAN_TYPES[object_type])\n\n    def add_additional_settings(self, additional_settings):\n        if \"object_type\" in additional_settings.keys():\n            
self.url += OBJECT_TYPE_PATH.format(OBJECT_TYPES[additional_settings[\"object_type\"]])\n\n        if \"is_by_homeowner\" in additional_settings.keys() and additional_settings[\"is_by_homeowner\"]:\n            self.url += IS_ONLY_HOMEOWNER_PATH\n        if \"min_balconies\" in additional_settings.keys():\n            self.url += MIN_BALCONIES_PATH.format(additional_settings[\"min_balconies\"])\n        if \"have_loggia\" in additional_settings.keys() and additional_settings[\"have_loggia\"]:\n            self.url += HAVE_LOGGIA_PATH\n\n        if \"min_house_year\" in additional_settings.keys():\n            self.url += MIN_HOUSE_YEAR_PATH.format(additional_settings[\"min_house_year\"])\n        if \"max_house_year\" in additional_settings.keys():\n            self.url += MAX_HOUSE_YEAR_PATH.format(additional_settings[\"max_house_year\"])\n\n        if \"min_price\" in additional_settings.keys():\n            self.url += MIN_PRICE_PATH.format(additional_settings[\"min_price\"])\n        if \"max_price\" in additional_settings.keys():\n            self.url += MAX_PRICE_PATH.format(additional_settings[\"max_price\"])\n\n        if \"min_floor\" in additional_settings.keys():\n            self.url += MIN_FLOOR_PATH.format(additional_settings[\"min_floor\"])\n        if \"max_floor\" in additional_settings.keys():\n            self.url += MAX_FLOOR_PATH.format(additional_settings[\"max_floor\"])\n\n        if \"min_total_floor\" in additional_settings.keys():\n            self.url += MIN_TOTAL_FLOOR_PATH.format(additional_settings[\"min_total_floor\"])\n        if \"max_total_floor\" in additional_settings.keys():\n            self.url += MAX_TOTAL_FLOOR_PATH.format(additional_settings[\"max_total_floor\"])\n\n        if \"house_material_type\" in additional_settings.keys():\n            self.url += HOUSE_MATERIAL_TYPE_PATH.format(additional_settings[\"house_material_type\"])\n\n        if \"metro\" in additional_settings.keys():\n            if \"metro_station\" in 
additional_settings.keys():\n                if additional_settings[\"metro\"] in METRO_STATIONS.keys():\n                    for metro_station, metro_id in METRO_STATIONS[additional_settings[\"metro\"]]:\n                        if additional_settings[\"metro_station\"] == metro_station:\n                            self.url += METRO_ID_PATH.format(metro_id)\n\n        if \"metro_foot_minute\" in additional_settings.keys():\n            self.url += METRO_FOOT_MINUTE_PATH.format(additional_settings[\"metro_foot_minute\"])\n\n        if \"flat_share\" in additional_settings.keys():\n            self.url += FLAT_SHARE_PATH.format(additional_settings[\"flat_share\"])\n\n        if \"only_flat\" in additional_settings.keys():\n            if additional_settings[\"only_flat\"]:\n                self.url += ONLY_FLAT_PATH.format(1)\n\n        if \"only_apartment\" in additional_settings.keys():\n            if additional_settings[\"only_apartment\"]:\n                self.url += APARTMENT_PATH.format(1)\n\n        if \"sort_by\" in additional_settings.keys():\n            if additional_settings[\"sort_by\"] == IS_SORT_BY_PRICE_FROM_MIN_TO_MAX_PATH:\n                self.url += SORT_BY_PRICE_FROM_MIN_TO_MAX_PATH\n            if additional_settings[\"sort_by\"] == IS_SORT_BY_PRICE_FROM_MAX_TO_MIN_PATH:\n                self.url += SORT_BY_PRICE_FROM_MAX_TO_MIN_PATH\n            if additional_settings[\"sort_by\"] == IS_SORT_BY_TOTAL_METERS_FROM_MAX_TO_MIN_PATH:\n                self.url += SORT_BY_TOTAL_METERS_FROM_MAX_TO_MIN_PATH\n            if additional_settings[\"sort_by\"] == IS_SORT_BY_CREATION_DATA_FROM_NEWER_TO_OLDER_PATH:\n                self.url += SORT_BY_CREATION_DATA_FROM_NEWER_TO_OLDER_PATH\n            if additional_settings[\"sort_by\"] == IS_SORT_BY_CREATION_DATA_FROM_OLDER_TO_NEWER_PATH:\n                self.url += SORT_BY_CREATION_DATA_FROM_OLDER_TO_NEWER_PATH\n"
  },
  {
    "path": "setup.cfg",
    "content": "[metadata]\nname = cianparser\nversion = 1.0.4\ndescription = Parser information from Cian website\nurl = https://github.com/lenarsaitov/cianparser\nauthor = Lenar Saitov\nauthor_email = lenarsaitov1@yandex.ru\nlong_description = file: README.md\nlicense_file = MIT\nkeywords = python parser requests cloudscraper beautifulsoup cian realstate"
  },
  {
    "path": "setup.py",
    "content": "from setuptools import setup\n\nwith open(\"README.md\", encoding=\"utf8\") as file:\n    read_me_description = file.read()\n\n\nsetup(\n    name='cianparser',\n    version='1.0.4',\n    description='Parser information from Cian website',\n    url='https://github.com/lenarsaitov/cianparser',\n    author='Lenar Saitov',\n    author_email='lenarsaitov1@yandex.ru',\n    license='MIT',\n    packages=['cianparser', 'cianparser.flat', 'cianparser.newobject', 'cianparser.suburban'],\n    long_description=read_me_description,\n    long_description_content_type=\"text/markdown\",\n    classifiers=[\n        \"Programming Language :: Python :: 3\",\n        \"License :: OSI Approved :: MIT License\",\n        \"Operating System :: OS Independent\",\n    ],\n    keywords='python parser requests cloudscraper beautifulsoup cian realstate',\n    install_requires=['cloudscraper', 'beautifulsoup4', 'transliterate', 'lxml', 'datetime'],\n)\n"
  }
]