Repository: bregman-arie/devops-exercises Branch: master Commit: 9e49ef97f163 Files: 352 Total size: 1.5 MB Directory structure: gitextract_sezxo2ro/ ├── .github/ │ └── workflows/ │ └── ci_workflow.yml ├── .gitignore ├── .travis.yml ├── CONTRIBUTING-pt-BR.md ├── CONTRIBUTING.md ├── LICENSE ├── README-pt-BR.md ├── README-zh_CN.md ├── README.md ├── certificates/ │ ├── aws-certification-paths.md │ ├── aws-cloud-practitioner.md │ ├── aws-cloud-sysops-associate.md │ ├── aws-solutions-architect-associate.md │ ├── azure-fundamentals-az-900.md │ ├── cka.md │ └── ckad.md ├── coding/ │ └── python/ │ ├── binary_search.py │ └── merge_sort.py ├── credits-pt-BR.md ├── credits.md ├── exercises/ │ ├── docker/ │ │ └── docker-debugging.md │ └── shell/ │ └── solutions/ │ ├── directories_comparision.md │ └── directories_comparison.md ├── faq-pt-BR.md ├── faq.md ├── prepare_for_interview-pt-BR.md ├── prepare_for_interview.md ├── scripts/ │ ├── aws s3 event triggering/ │ │ ├── README.md │ │ ├── aws_s3_event_trigger.sh │ │ └── s3-lambda/ │ │ ├── requirements.txt │ │ └── s3-lambda.py │ ├── count_questions.sh │ ├── question_utils.py │ ├── random_question.py │ ├── run_ci.sh │ └── update_question_number.py ├── tests/ │ ├── scripts_question_utils_unittest.py │ ├── syntax_checker_unittest.py │ ├── syntax_lint.py │ └── testcases/ │ ├── testcase1.md │ ├── testcase2.md │ └── testcase3.md └── topics/ ├── ansible/ │ ├── README.md │ ├── my_first_playbook.md │ ├── my_first_task.md │ ├── solutions/ │ │ ├── my_first_playbook.md │ │ ├── my_first_task.md │ │ └── update_upgrade_task.md │ └── update_upgrade_task.md ├── argo/ │ ├── README.md │ └── exercises/ │ ├── app_creation/ │ │ ├── exercise.md │ │ └── solution.md │ ├── argocd_helm_app/ │ │ ├── exercise.md │ │ └── solution.md │ ├── blue_green_rollout/ │ │ ├── exercise.md │ │ └── solution.md │ ├── canary_rollout/ │ │ ├── exercise.md │ │ └── solution.md │ ├── secrets_101/ │ │ ├── exercise.md │ │ └── solution.md │ ├── sync_app_cluster/ │ │ ├── exercise.md │ 
│ └── solution.md │ └── sync_app_git/ │ ├── exercise.md │ └── solution.md ├── aws/ │ ├── README.md │ └── exercises/ │ ├── access_advisor/ │ │ ├── exercise.md │ │ └── solution.md │ ├── alb_multiple_target_groups/ │ │ ├── exercise.md │ │ └── solution.md │ ├── app_load_balancer/ │ │ ├── exercise.md │ │ └── solution.md │ ├── asg_dynamic_scaling_policy/ │ │ ├── exercise.md │ │ └── solution.md │ ├── aurora_db/ │ │ ├── exercise.md │ │ └── solution.md │ ├── auto_scaling_groups_basics/ │ │ ├── exercise.md │ │ └── solution.md │ ├── basic_s3_ci/ │ │ ├── exercise.md │ │ └── solution.md │ ├── budget_setup/ │ │ ├── exercise.md │ │ └── solution.md │ ├── create_ami/ │ │ ├── exercise.md │ │ └── solution.md │ ├── create_efs/ │ │ ├── exercise.md │ │ └── solution.md │ ├── create_role/ │ │ ├── exercise.md │ │ └── solution.md │ ├── create_spot_instances/ │ │ ├── exercise.md │ │ └── solution.md │ ├── create_user/ │ │ ├── exercise.md │ │ └── solution.md │ ├── creating_records/ │ │ ├── exercise.md │ │ └── solution.md │ ├── credential_report/ │ │ ├── exercise.md │ │ └── solution.md │ ├── ebs_volume_creation/ │ │ ├── exercise.md │ │ └── solution.md │ ├── ec2_iam_roles/ │ │ ├── exercise.md │ │ └── solution.md │ ├── ecs_task/ │ │ ├── exercise.md │ │ └── solution.md │ ├── elastic_beanstalk_simple/ │ │ ├── exercise.md │ │ └── solution.md │ ├── elastic_ip/ │ │ ├── exercise.md │ │ └── solution.md │ ├── elastic_network_interfaces/ │ │ ├── exercise.md │ │ └── solution.md │ ├── elasticache/ │ │ ├── exercise.md │ │ └── solution.md │ ├── health_checks/ │ │ ├── exercise.md │ │ └── solution.md │ ├── hello_function/ │ │ ├── exercise.md │ │ └── solution.md │ ├── hibernate_instance/ │ │ ├── exercise.md │ │ └── solution.md │ ├── launch_ec2_web_instance/ │ │ ├── exercise.md │ │ └── solution.md │ ├── mysql_db/ │ │ ├── exercise.md │ │ └── solution.md │ ├── network_load_balancer/ │ │ ├── exercise.md │ │ └── solution.md │ ├── new_vpc/ │ │ ├── exercise.md │ │ ├── main.tf │ │ ├── pulumi/ │ │ │ └── __main__.py │ │ 
├── solution.md │ │ └── terraform/ │ │ └── main.tf │ ├── no_application/ │ │ ├── exercise.md │ │ └── solution.md │ ├── password_policy_and_mfa/ │ │ ├── exercise.md │ │ └── solution.md │ ├── placement_groups/ │ │ ├── exercise.md │ │ └── solution.md │ ├── register_domain/ │ │ ├── exercise.md │ │ └── solution.md │ ├── route_53_failover/ │ │ ├── exercise.md │ │ └── solution.md │ ├── s3/ │ │ └── new_bucket/ │ │ ├── exercise.md │ │ ├── pulumi/ │ │ │ └── __main__.py │ │ ├── solution.md │ │ └── terraform/ │ │ └── main.tf │ ├── sample_cdk/ │ │ ├── exercise.md │ │ └── solution.md │ ├── security_groups/ │ │ ├── exercise.md │ │ └── solution.md │ ├── snapshots/ │ │ ├── exercise.md │ │ └── solution.md │ ├── subnets/ │ │ ├── exercise.md │ │ ├── pulumi/ │ │ │ └── __main__.py │ │ ├── solution.md │ │ └── terraform/ │ │ └── main.tf │ ├── url_function/ │ │ ├── exercise.md │ │ └── solution.md │ └── web_app_lambda_dynamodb/ │ ├── exercise.md │ └── terraform/ │ └── main.tf ├── azure/ │ └── README.md ├── chaos_engineering/ │ └── README.md ├── cicd/ │ ├── README.md │ ├── ci_for_open_source_project.md │ ├── deploy_to_kubernetes.md │ ├── remove_builds.md │ ├── remove_jobs.md │ └── solutions/ │ ├── deploy_to_kubernetes/ │ │ ├── Jenkinsfile │ │ ├── README.md │ │ ├── deploy.yml │ │ ├── helloworld.yml │ │ ├── html/ │ │ │ ├── css/ │ │ │ │ ├── normalize.css │ │ │ │ └── skeleton.css │ │ │ └── index.html │ │ └── inventory │ ├── remove_builds_solution.groovy │ └── remove_jobs_solution.groovy ├── circleci/ │ └── README.md ├── cloud/ │ └── README.md ├── cloud_slack_bot.md ├── containers/ │ ├── README.md │ ├── commit_image.md │ ├── containerized_db.md │ ├── containerized_db_persistent_storage.md │ ├── containerized_web_server.md │ ├── image_layers.md │ ├── multi_stage_builds.md │ ├── run_forest_run.md │ ├── running_containers.md │ ├── sharing_images.md │ ├── solutions/ │ │ ├── commit_image.md │ │ ├── containerized_db.md │ │ ├── containerized_db_persistent_storage.md │ │ ├── containerized_web_server.md │ 
│ ├── image_layers.md │ │ ├── multi_stage_builds.md │ │ ├── run_forest_run.md │ │ ├── running_containers.md │ │ ├── sharing_images.md │ │ └── working_with_images.md │ ├── working_with_images.md │ └── write_containerfile_run_container.md ├── databases/ │ ├── README.md │ ├── solutions/ │ │ └── table_for_message_board_system.md │ └── table_for_message_board_system.md ├── datadog/ │ └── README.md ├── devops/ │ ├── README.md │ ├── containerize_app.md │ ├── ha_hello_world.md │ └── solutions/ │ ├── containerize_app.md │ └── ha_hello_world.md ├── dns/ │ └── README.md ├── eflk.md ├── flask_container_ci/ │ ├── README.md │ ├── app/ │ │ ├── __init__.py │ │ ├── config.py │ │ ├── main.py │ │ └── tests.py │ ├── requirements.txt │ ├── tests.py │ └── users.json ├── flask_container_ci2/ │ ├── README.md │ ├── app/ │ │ ├── __init__.py │ │ ├── config.py │ │ ├── main.py │ │ └── tests.py │ ├── requirements.txt │ └── tests.py ├── gcp/ │ ├── README.md │ └── exercises/ │ ├── assign_roles/ │ │ ├── exercise.md │ │ ├── main.tf │ │ ├── solution.md │ │ ├── vars.tf │ │ └── versions.tf │ ├── create_project/ │ │ ├── exercise.md │ │ ├── main.tf │ │ ├── solution.md │ │ └── versions.tf │ └── instance_101/ │ ├── exercise.md │ ├── main.tf │ ├── solution.md │ └── versions.tf ├── git/ │ ├── README.md │ ├── branch_01.md │ ├── commit_01.md │ ├── solutions/ │ │ ├── branch_01_solution.md │ │ ├── commit_01_solution.md │ │ └── squashing_commits.md │ └── squashing_commits.md ├── grafana/ │ └── README.md ├── jenkins_pipelines.md ├── jenkins_scripts.md ├── kafka/ │ └── README.md ├── kubernetes/ │ ├── CKA.md │ ├── README.md │ ├── exercises/ │ │ ├── kustomize_common_labels/ │ │ │ ├── exercise.md │ │ │ ├── solution.md │ │ │ └── someApp/ │ │ │ ├── deployment.yml │ │ │ └── service.yml │ │ ├── labels_and_selectors/ │ │ │ ├── exercise.md │ │ │ └── solution.md │ │ ├── node_selectors/ │ │ │ ├── exercise.md │ │ │ └── solution.md │ │ └── taints_101/ │ │ ├── exercise.md │ │ └── solution.md │ ├── killing_containers.md │ ├── 
pods_01.md │ ├── replicaset_01.md │ ├── replicaset_02.md │ ├── replicaset_03.md │ ├── services_01.md │ └── solutions/ │ ├── killing_containers.md │ ├── pods_01_solution.md │ ├── replicaset_01_solution.md │ ├── replicaset_02_solution.md │ ├── replicaset_03_solution.md │ └── services_01_solution.md ├── linux/ │ ├── README.md │ └── exercises/ │ ├── copy/ │ │ ├── README.md │ │ └── solution.md │ ├── create_remove/ │ │ ├── README.md │ │ └── solution.md │ ├── navigation/ │ │ ├── README.md │ │ └── solution.md │ └── uniqe_count/ │ ├── README.md │ ├── ip_list │ └── solution.md ├── misc/ │ └── elk_kibana_aws.md ├── node/ │ ├── node_questions_basic.md │ └── solutions/ │ └── node_questions_basic_ans.md ├── observability/ │ └── README.md ├── openshift/ │ ├── README.md │ ├── projects_101.md │ └── solutions/ │ ├── my_first_app.md │ └── projects_101.md ├── os/ │ ├── fork_101.md │ ├── fork_102.md │ └── solutions/ │ ├── fork_101_solution.md │ └── fork_102_solution.md ├── perl/ │ └── README.md ├── pipeline_deploy_image_to_k8.md ├── programming/ │ ├── grep_berfore_and_after.md │ └── web_scraper.md ├── python/ │ ├── advanced_data_types.md │ ├── class_0x00.md │ ├── compress_string.md │ ├── data_types.md │ ├── reverse_string.md │ ├── solutions/ │ │ ├── advanced_data_types_solution.md │ │ ├── class_0x00_solution.md │ │ ├── compress_string_solution.md │ │ ├── data_types_solution.md │ │ ├── reverse_string.md │ │ └── sort_solution.md │ └── sort.md ├── security/ │ └── README.md ├── shell/ │ ├── README.md │ ├── argument_check.md │ ├── basic_date.md │ ├── count_chars.md │ ├── directories_comparison.md │ ├── empty_files.md │ ├── factors.md │ ├── files_size.md │ ├── great_day.md │ ├── hello_world.md │ ├── host_status.md │ ├── num_of_args.md │ ├── print_arguments.md │ ├── solutions/ │ │ ├── argument_check.md │ │ ├── basic_date.md │ │ ├── count_chars.md │ │ ├── directories_comparison.md │ │ ├── empty_files.md │ │ ├── factors.md │ │ ├── files_size.md │ │ ├── great_day.md │ │ ├── hello_world.md │ │ 
├── host_status.md │ │ ├── num_of_args.md │ │ └── sum.md │ └── sum.md ├── soft_skills/ │ └── README.md ├── software_development/ │ └── README.md ├── sql/ │ ├── improve_query.md │ └── solutions/ │ └── improve_query.md ├── sre/ │ └── README.md ├── terraform/ │ ├── README.md │ └── exercises/ │ ├── launch_ec2_instance/ │ │ ├── exercise.md │ │ └── solution.md │ ├── launch_ec2_web_instance/ │ │ └── exercise.md │ ├── s3_bucket_rename/ │ │ ├── exercise.md │ │ └── solution.md │ ├── terraform_local_provider/ │ │ ├── exercise.md │ │ └── solution.md │ └── vpc_subnet_creation/ │ ├── exercise.md │ └── solution.md └── zuul/ └── README.md

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/workflows/ci_workflow.yml
================================================

name: CI
on:
  pull_request:
    branches: [ master ]

jobs:
  ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install flake8
        run: pip install flake8
      - name: Give executable permissions to run_ci.sh inside the scripts directory
        run: chmod a+x scripts/run_ci.sh
      - name: Run the ci script inside the scripts folder
        run: bash scripts/run_ci.sh
        shell: bash

================================================
FILE: .gitignore
================================================

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
*.pyc

# JetBrains IDEs
.idea

================================================
FILE: .travis.yml
================================================

language: "python"
python:
  - "3.8"
install:
  - pip install flake8
script:
  - flake8 --max-line-length=100 .
  - python tests/syntax_lint.py

================================================
FILE: CONTRIBUTING-pt-BR.md
================================================

## How to contribute

Use pull requests to contribute to the project. Stick to the following format: \
[Question]
[Answer] \
* If you added several questions and would like to know how many questions there are, you can use the script "count_questions.sh" in the scripts directory.

## What to avoid

* Avoid adding installation questions. Those are the worst type of questions...
* Don't copy questions and answers from other sources. They probably worked hard to add them.
* If you add new images, make sure they are free and can be used.

## Before submitting the pull request

You can test your changes locally with the script `run_ci.sh` in the scripts directory.

================================================
FILE: CONTRIBUTING.md
================================================

## How to contribute

Use pull requests to contribute to the project. Stick to the following format: \
[Question]
[Answer] \
* If you added several questions and would like to know how many questions there are, you can use the script "count_questions.sh" in the scripts directory.

## What to avoid

* Avoid adding installation questions. Those are the worst type of questions...
* Don't copy questions and answers from other sources. They probably worked hard to add them.
* If you add new images, make sure they are free and can be used.

## Before submitting the pull request

You can test your changes locally with the script `run_ci.sh` in the scripts directory.

================================================
FILE: LICENSE
================================================

THE WORK (AS DEFINED BELOW) IS PROVIDED UNDER THE TERMS OF THIS CREATIVE COMMONS PUBLIC LICENSE ("CCPL" OR "LICENSE"). THE WORK IS PROTECTED BY COPYRIGHT AND/OR OTHER APPLICABLE LAW. ANY USE OF THE WORK OTHER THAN AS AUTHORIZED UNDER THIS LICENSE OR COPYRIGHT LAW IS PROHIBITED. BY EXERCISING ANY RIGHTS TO THE WORK PROVIDED HERE, YOU ACCEPT AND AGREE TO BE BOUND BY THE TERMS OF THIS LICENSE. TO THE EXTENT THIS LICENSE MAY BE CONSIDERED TO BE A CONTRACT, THE LICENSOR GRANTS YOU THE RIGHTS CONTAINED HERE IN CONSIDERATION OF YOUR ACCEPTANCE OF SUCH TERMS AND CONDITIONS. 1. Definitions "Adaptation" means a work based upon the Work, or upon the Work and other pre-existing works, such as a translation, adaptation, derivative work, arrangement of music or other alterations of a literary or artistic work, or phonogram or performance and includes cinematographic adaptations or any other form in which the Work may be recast, transformed, or adapted including in any form recognizably derived from the original, except that a work that constitutes a Collection will not be considered an Adaptation for the purpose of this License.
For the avoidance of doubt, where the Work is a musical work, performance or phonogram, the synchronization of the Work in timed-relation with a moving image ("syncing") will be considered an Adaptation for the purpose of this License. "Collection" means a collection of literary or artistic works, such as encyclopedias and anthologies, or performances, phonograms or broadcasts, or other works or subject matter other than works listed in Section 1(f) below, which, by reason of the selection and arrangement of their contents, constitute intellectual creations, in which the Work is included in its entirety in unmodified form along with one or more other contributions, each constituting separate and independent works in themselves, which together are assembled into a collective whole. A work that constitutes a Collection will not be considered an Adaptation (as defined above) for the purposes of this License. "Distribute" means to make available to the public the original and copies of the Work through sale or other transfer of ownership. "Licensor" means the individual, individuals, entity or entities that offer(s) the Work under the terms of this License. "Original Author" means, in the case of a literary or artistic work, the individual, individuals, entity or entities who created the Work or if no individual or entity can be identified, the publisher; and in addition (i) in the case of a performance the actors, singers, musicians, dancers, and other persons who act, sing, deliver, declaim, play in, interpret or otherwise perform literary or artistic works or expressions of folklore; (ii) in the case of a phonogram the producer being the person or legal entity who first fixes the sounds of a performance or other sounds; and, (iii) in the case of broadcasts, the organization that transmits the broadcast. 
"Work" means the literary and/or artistic work offered under the terms of this License including without limitation any production in the literary, scientific and artistic domain, whatever may be the mode or form of its expression including digital form, such as a book, pamphlet and other writing; a lecture, address, sermon or other work of the same nature; a dramatic or dramatico-musical work; a choreographic work or entertainment in dumb show; a musical composition with or without words; a cinematographic work to which are assimilated works expressed by a process analogous to cinematography; a work of drawing, painting, architecture, sculpture, engraving or lithography; a photographic work to which are assimilated works expressed by a process analogous to photography; a work of applied art; an illustration, map, plan, sketch or three-dimensional work relative to geography, topography, architecture or science; a performance; a broadcast; a phonogram; a compilation of data to the extent it is protected as a copyrightable work; or a work performed by a variety or circus performer to the extent it is not otherwise considered a literary or artistic work. "You" means an individual or entity exercising rights under this License who has not previously violated the terms of this License with respect to the Work, or who has received express permission from the Licensor to exercise rights under this License despite a previous violation. 
"Publicly Perform" means to perform public recitations of the Work and to communicate to the public those public recitations, by any means or process, including by wire or wireless means or public digital performances; to make available to the public Works in such a way that members of the public may access these Works from a place and at a place individually chosen by them; to perform the Work to the public by any means or process and the communication to the public of the performances of the Work, including by public digital performance; to broadcast and rebroadcast the Work by any means including signs, sounds or images. "Reproduce" means to make copies of the Work by any means including without limitation by sound or visual recordings and the right of fixation and reproducing fixations of the Work, including storage of a protected performance or phonogram in digital form or other electronic medium. 2. Fair Dealing Rights. Nothing in this License is intended to reduce, limit, or restrict any uses free from copyright or rights arising from limitations or exceptions that are provided for in connection with the copyright protection under copyright law or other applicable laws. 3. License Grant. Subject to the terms and conditions of this License, Licensor hereby grants You a worldwide, royalty-free, non-exclusive, perpetual (for the duration of the applicable copyright) license to exercise the rights in the Work as stated below: to Reproduce the Work, to incorporate the Work into one or more Collections, and to Reproduce the Work as incorporated in the Collections; and, to Distribute and Publicly Perform the Work including as incorporated in Collections. The above rights may be exercised in all media and formats whether now known or hereafter devised. The above rights include the right to make such modifications as are technically necessary to exercise the rights in other media and formats, but otherwise you have no rights to make Adaptations. 
Subject to 8(f), all rights not expressly granted by Licensor are hereby reserved, including but not limited to the rights set forth in Section 4(d). 4. Restrictions. The license granted in Section 3 above is expressly made subject to and limited by the following restrictions: You may Distribute or Publicly Perform the Work only under the terms of this License. You must include a copy of, or the Uniform Resource Identifier (URI) for, this License with every copy of the Work You Distribute or Publicly Perform. You may not offer or impose any terms on the Work that restrict the terms of this License or the ability of the recipient of the Work to exercise the rights granted to that recipient under the terms of the License. You may not sublicense the Work. You must keep intact all notices that refer to this License and to the disclaimer of warranties with every copy of the Work You Distribute or Publicly Perform. When You Distribute or Publicly Perform the Work, You may not impose any effective technological measures on the Work that restrict the ability of a recipient of the Work from You to exercise the rights granted to that recipient under the terms of the License. This Section 4(a) applies to the Work as incorporated in a Collection, but this does not require the Collection apart from the Work itself to be made subject to the terms of this License. If You create a Collection, upon notice from any Licensor You must, to the extent practicable, remove from the Collection any credit as required by Section 4(c), as requested. You may not exercise any of the rights granted to You in Section 3 above in any manner that is primarily intended for or directed toward commercial advantage or private monetary compensation. 
The exchange of the Work for other copyrighted works by means of digital file-sharing or otherwise shall not be considered to be intended for or directed toward commercial advantage or private monetary compensation, provided there is no payment of any monetary compensation in connection with the exchange of copyrighted works. If You Distribute, or Publicly Perform the Work or Collections, You must, unless a request has been made pursuant to Section 4(a), keep intact all copyright notices for the Work and provide, reasonable to the medium or means You are utilizing: (i) the name of the Original Author (or pseudonym, if applicable) if supplied, and/or if the Original Author and/or Licensor designate another party or parties (e.g., a sponsor institute, publishing entity, journal) for attribution ("Attribution Parties") in Licensor's copyright notice, terms of service or by other reasonable means, the name of such party or parties; (ii) the title of the Work if supplied; (iii) to the extent reasonably practicable, the URI, if any, that Licensor specifies to be associated with the Work, unless such URI does not refer to the copyright notice or licensing information for the Work. The credit required by this Section 4(c) may be implemented in any reasonable manner; provided, however, that in the case of a Collection, at a minimum such credit will appear, if a credit for all contributing authors of Collection appears, then as part of these credits and in a manner at least as prominent as the credits for the other contributing authors. 
For the avoidance of doubt, You may only use the credit required by this Section for the purpose of attribution in the manner set out above and, by exercising Your rights under this License, You may not implicitly or explicitly assert or imply any connection with, sponsorship or endorsement by the Original Author, Licensor and/or Attribution Parties, as appropriate, of You or Your use of the Work, without the separate, express prior written permission of the Original Author, Licensor and/or Attribution Parties. For the avoidance of doubt: Non-waivable Compulsory License Schemes. In those jurisdictions in which the right to collect royalties through any statutory or compulsory licensing scheme cannot be waived, the Licensor reserves the exclusive right to collect such royalties for any exercise by You of the rights granted under this License; Waivable Compulsory License Schemes. In those jurisdictions in which the right to collect royalties through any statutory or compulsory licensing scheme can be waived, the Licensor reserves the exclusive right to collect such royalties for any exercise by You of the rights granted under this License if Your exercise of such rights is for a purpose or use which is otherwise than noncommercial as permitted under Section 4(b) and otherwise waives the right to collect royalties through any statutory or compulsory licensing scheme; and, Voluntary License Schemes. The Licensor reserves the right to collect royalties, whether individually or, in the event that the Licensor is a member of a collecting society that administers voluntary licensing schemes, via that society, from any exercise by You of the rights granted under this License that is for a purpose or use which is otherwise than noncommercial as permitted under Section 4(b). 
Except as otherwise agreed in writing by the Licensor or as may be otherwise permitted by applicable law, if You Reproduce, Distribute or Publicly Perform the Work either by itself or as part of any Collections, You must not distort, mutilate, modify or take other derogatory action in relation to the Work which would be prejudicial to the Original Author's honor or reputation. 5. Representations, Warranties and Disclaimer UNLESS OTHERWISE MUTUALLY AGREED BY THE PARTIES IN WRITING, LICENSOR OFFERS THE WORK AS-IS AND MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE WORK, EXPRESS, IMPLIED, STATUTORY OR OTHERWISE, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, ACCURACY, OR THE PRESENCE OF ABSENCE OF ERRORS, WHETHER OR NOT DISCOVERABLE. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OF IMPLIED WARRANTIES, SO SUCH EXCLUSION MAY NOT APPLY TO YOU. 6. Limitation on Liability. EXCEPT TO THE EXTENT REQUIRED BY APPLICABLE LAW, IN NO EVENT WILL LICENSOR BE LIABLE TO YOU ON ANY LEGAL THEORY FOR ANY SPECIAL, INCIDENTAL, CONSEQUENTIAL, PUNITIVE OR EXEMPLARY DAMAGES ARISING OUT OF THIS LICENSE OR THE USE OF THE WORK, EVEN IF LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 7. Termination This License and the rights granted hereunder will terminate automatically upon any breach by You of the terms of this License. Individuals or entities who have received Collections from You under this License, however, will not have their licenses terminated provided such individuals or entities remain in full compliance with those licenses. Sections 1, 2, 5, 6, 7, and 8 will survive any termination of this License. Subject to the above terms and conditions, the license granted here is perpetual (for the duration of the applicable copyright in the Work). 
Notwithstanding the above, Licensor reserves the right to release the Work under different license terms or to stop distributing the Work at any time; provided, however that any such election will not serve to withdraw this License (or any other license that has been, or is required to be, granted under the terms of this License), and this License will continue in full force and effect unless terminated as stated above. 8. Miscellaneous Each time You Distribute or Publicly Perform the Work or a Collection, the Licensor offers to the recipient a license to the Work on the same terms and conditions as the license granted to You under this License. If any provision of this License is invalid or unenforceable under applicable law, it shall not affect the validity or enforceability of the remainder of the terms of this License, and without further action by the parties to this agreement, such provision shall be reformed to the minimum extent necessary to make such provision valid and enforceable. No term or provision of this License shall be deemed waived and no breach consented to unless such waiver or consent shall be in writing and signed by the party to be charged with such waiver or consent. This License constitutes the entire agreement between the parties with respect to the Work licensed here. There are no understandings, agreements or representations with respect to the Work not specified here. Licensor shall not be bound by any additional provisions that may appear in any communication from You. This License may not be modified without the mutual written agreement of the Licensor and You. 
The rights granted under, and the subject matter referenced, in this License were drafted utilizing the terminology of the Berne Convention for the Protection of Literary and Artistic Works (as amended on September 28, 1979), the Rome Convention of 1961, the WIPO Copyright Treaty of 1996, the WIPO Performances and Phonograms Treaty of 1996 and the Universal Copyright Convention (as revised on July 24, 1971). These rights and subject matter take effect in the relevant jurisdiction in which the License terms are sought to be enforced according to the corresponding provisions of the implementation of those treaty provisions in the applicable national law. If the standard suite of rights granted under applicable copyright law includes additional rights not granted under this License, such additional rights are deemed to be included in the License; this License is not intended to restrict the license of any rights under applicable law. ================================================ FILE: README-pt-BR.md ================================================

:information_source:  This repository contains questions and exercises on various technical topics, sometimes related to DevOps and SRE

:bar_chart:  There are currently **2624** exercises and questions

:warning:  You can use these for preparing for an interview, but most of the questions and exercises don't represent an actual interview. Please read the [FAQ page](faq.md) for more details

:stop_sign:  If you are interested in pursuing a career as a DevOps engineer, learning some of the concepts mentioned here would be useful, but you should know it's not about learning all the topics and technologies mentioned in this repository

:pencil:  You can add more exercises by submitting pull requests :) Read about the contribution guidelines [here](CONTRIBUTING.md)

****
Topics (rendered as a grid of badge links in the original README):

DevOps, Git, Network, Hardware, Kubernetes, Software Development, Python, Go, Perl, Regex, Cloud, AWS, Azure, Google Cloud Platform, OpenStack, Operating System, Linux, Virtualization, DNS, Shell Scripting, Databases, SQL, Mongo, Testing, Big Data, CI/CD, Certificates, Containers, OpenShift, Storage, Terraform, Puppet, Distributed, Questions you can ask, Ansible, Observability, Prometheus, Circle CI, DataDog, Grafana, Argo, Soft Skills, Security, System Design, Chaos Engineering, Misc, Elastic, Kafka, NodeJs
## DevOps Apps

- KubePrep
- Linux Master
- System Design Hero
## Network

In general, what do you need in order to communicate?
- A common language (for the two ends to understand each other)
- A way to address whom you want to communicate with
- A connection (so the content of the communication can reach the recipients)
What is TCP/IP?
A set of protocols that define how two or more devices can communicate with each other. To learn more about TCP/IP, read [here](http://www.penguintutor.com/linux/basic-network-reference)
What is Ethernet?
Ethernet simply refers to the most common type of Local Area Network (LAN) used today. A LAN, in contrast to a WAN (Wide Area Network) which spans a larger geographical area, is a connected network of computers in a small area, such as your office, a college campus, or even your home.
What is a MAC address? What is it used for?
A MAC address is a unique identification number or code used to identify individual devices on the network. Packets sent on the ethernet always come from a MAC address and are sent to a MAC address. If a network adapter is receiving a packet, it compares the packet's destination MAC address to the adapter's own MAC address.
When is this MAC address used?: ff:ff:ff:ff:ff:ff
When a device sends a packet to the broadcast MAC address (FF:FF:FF:FF:FF:FF), it is delivered to all stations on the local network. Ethernet broadcasts are used to resolve IP addresses to MAC addresses (by ARP) at the data link layer.
What is an IP address?
An Internet Protocol address (IP address) is a numerical label assigned to each device connected to a computer network that uses the Internet Protocol for communication. An IP address serves two main functions: host or network interface identification, and location addressing.
Explain the subnet mask and give an example
A subnet mask is a 32-bit number that masks an IP address and divides the IP address into a network address and a host address. The subnet mask is made by setting the network bits to all "1"s and the host bits to all "0"s. Within a given network, out of the total usable host addresses, two are always reserved for specific purposes and cannot be allocated to any host: the first address, which is reserved as the network address (also known as the network ID), and the last address, which is used for network broadcast. [Example](https://github.com/philemonnwanne/projects/tree/main/exercises/exe-09)
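As a quick illustrative sketch (using Python's standard `ipaddress` module), the two reserved addresses and the usable host count of a subnet can be computed directly:

```python
import ipaddress

# A /26 subnet: mask 255.255.255.192, 64 addresses, 62 usable hosts
net = ipaddress.ip_network("192.168.1.64/26")

print(net.netmask)            # 255.255.255.192
print(net.network_address)    # 192.168.1.64  (reserved: network ID)
print(net.broadcast_address)  # 192.168.1.127 (reserved: broadcast)
print(net.num_addresses - 2)  # 62 usable host addresses
```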
What is a private IP address? In which scenarios/system designs should one use it?
Private IP addresses are assigned to hosts in the same network to communicate with one another. As the name "private" suggests, devices with assigned private IP addresses cannot be reached by devices from any external network. For example, if I am living in a hostel and want my hostel mates to join a game server I hosted, I will ask them to join via my server's private IP address, since the network is local to the hostel.
What is a public IP address? In which scenarios/system designs should one use it?
A public IP address is a public-facing IP address. If you are hosting a game server that you want your friends to join, you will give your friends your public IP address to allow their computers to identify and locate your network and server so the connection can take place. You wouldn't need a public-facing IP address if you were playing with friends connected to the same network as you; in that case you would use a private IP address. For someone to connect to your server that is hosted internally, you will have to set up port forwarding to tell your router to allow traffic from the public domain into your network and vice versa.
Explain the OSI model. What layers are there? What is each layer responsible for?
- Application: the user end (HTTP is here)
- Presentation: establishes context between application-layer entities (encryption is here)
- Session: establishes, manages and terminates the connections
- Transport: transfers variable-length data sequences from a source to a destination host (TCP & UDP are here)
- Network: transfers datagrams from one network to another (IP is here)
- Data link: provides a link between two directly connected nodes (MAC is here)
- Physical: the electrical and physical spec of the data connection (bits are here)

You can read more about the OSI model at [penguintutor.com](http://www.penguintutor.com/linux/basic-network-reference)
For each of the following, determine which OSI layer it belongs to:
* Error correction
* Packet routing
* Cables and electrical signals
* MAC address
* IP address
* Terminate connections
* 3-way handshake

* Error correction - Data link
* Packet routing - Network
* Cables and electrical signals - Physical
* MAC address - Data link
* IP address - Network
* Terminate connections - Session
* 3-way handshake - Transport
What delivery schemes are you familiar with?
Unicast: one-to-one communication where there is one sender and one receiver.

Broadcast: sending a message to everyone in the network. The address ff:ff:ff:ff:ff:ff is used for broadcasting. Two common protocols which use broadcast are ARP and DHCP.

Multicast: sending a message to a group of subscribers. It can be one-to-many or many-to-many.
What is CSMA/CD? Is it used in modern ethernet networks?
CSMA/CD stands for Carrier Sense Multiple Access / Collision Detection. Its primary focus is to manage access to a shared medium/bus where only one host can transmit at a given point in time.

CSMA/CD algorithm:
1. Before sending a frame, a host checks whether another host is already transmitting a frame.
2. If no one is transmitting, it starts transmitting the frame.
3. If two hosts transmit at the same time, we have a collision.
4. Both hosts stop sending the frame and each sends everyone a 'jam signal', notifying everyone that a collision occurred.
5. They wait a random amount of time before sending the frame again.
6. Once each host has waited for a random time, they try to send the frame again, and so the cycle starts over.
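The "wait a random time" step is typically implemented as truncated binary exponential backoff: after the n-th collision, a station waits a random number of slot times drawn from 0 to 2^n - 1 (capped at some maximum exponent). A minimal Python sketch of the idea (illustrative only; the cap of 10 mirrors classic Ethernet but is an assumption here):

```python
import random

def backoff_slots(collisions: int, max_exponent: int = 10) -> int:
    """Truncated binary exponential backoff: after the n-th collision,
    pick a random slot count in [0, 2**min(n, max_exponent) - 1]."""
    exponent = min(collisions, max_exponent)
    return random.randrange(2 ** exponent)

# After the 3rd collision the wait is between 0 and 7 slot times
waits = {backoff_slots(3) for _ in range(1000)}
print(min(waits), max(waits))  # values stay within the 0..7 range
```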
Describe the following network devices and the differences between them: * router * switch * hub
A router, switch and hub are all network devices used to connect devices in a local area network (LAN). However, each device operates differently and has its specific use cases. Here is a short description of each device and the differences between them:
1. Router: a network device that connects multiple network segments. It operates at the network layer (Layer 3) of the OSI model and uses routing protocols to direct data between networks. Routers use IP addresses to identify devices and route data packets to the correct destination.
2. Switch: a network device that connects multiple devices on a LAN. It operates at the data link layer (Layer 2) of the OSI model and uses MAC addresses to identify devices and direct data packets to the correct destination. Switches allow devices on the same network to communicate more efficiently and can prevent the data collisions that may occur when multiple devices send data simultaneously.
3. Hub: a network device that connects multiple devices through a single cable and is used to connect multiple devices without segmenting a network. Unlike a switch, it operates at the physical layer (Layer 1) of the OSI model and simply broadcasts data packets to all devices connected to it, whether or not the device is the intended recipient. This means data collisions can occur, and network efficiency can suffer as a result. Hubs are generally not used in modern network setups, since switches are more efficient and provide better performance.
What is a "Collision Domain"?
A collision domain is a network segment in which devices can potentially interfere with one another by attempting to transmit data at the same time. When two devices transmit data at the same time, a collision can occur, resulting in lost or corrupted data. In a collision domain, all devices share the same bandwidth, and any device can potentially interfere with the data transmission of other devices.
What is a "Broadcast Domain"?
A broadcast domain is a network segment in which all devices can communicate with each other by sending broadcast messages. A broadcast message is a message sent to all devices in the network rather than to a specific device. In a broadcast domain, all devices can receive and process broadcast messages, whether or not the message was intended for them.
Three computers are connected to a switch. How many collision domains are there? How many broadcast domains?
Three collision domains and one broadcast domain
How does a router work?
A router is a physical or virtual appliance that passes information between two or more packet-switched computer networks. A router inspects a given data packet's destination Internet Protocol address (IP address), calculates the best way for it to reach its destination, and then forwards it accordingly.
What is NAT?
Network Address Translation (NAT) is a process in which one or more local IP addresses are translated into one or more global IP addresses, and vice versa, in order to provide internet access to the local hosts.
What is a proxy? How does it work? What do we need it for?
A proxy server acts as a gateway between you and the internet. It's an intermediary server separating end users from the websites they browse. If you're using a proxy server, internet traffic flows through the proxy server on its way to the address you requested. The request then comes back through that same proxy server (there are exceptions to this rule), and the proxy server then forwards the data received from the website to you. Proxy servers provide varying levels of functionality, security, and privacy depending on your use case, needs, or company policy.
What is TCP? How does it work? What is the 3-way handshake?
The TCP 3-way handshake, or three-way handshake, is a process used in a TCP/IP network to make a connection between a server and a client. A three-way handshake is primarily used to create a TCP socket connection. It works like this:
- A client node sends a SYN data packet over an IP network to a server on the same or an external network. The objective of this packet is to ask/infer whether the server is open for new connections.
- The target server must have open ports that can accept and initiate new connections. When the server receives the SYN packet from the client node, it responds and returns a confirmation receipt: the ACK packet, or SYN/ACK packet.
- The client node receives the SYN/ACK from the server and responds with an ACK packet.
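A minimal sketch of the handshake in practice (Python standard library; purely illustrative): the SYN, SYN/ACK, ACK exchange happens inside the OS when the client calls `connect()` and the server's `accept()` completes, so application code never sees the individual packets.

```python
import socket
import threading

# Server: listen on an ephemeral localhost port
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0 = let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

def serve():
    conn, _ = server.accept()   # handshake completes server-side here
    conn.sendall(b"hello")
    conn.close()

t = threading.Thread(target=serve)
t.start()

# Client: connect() triggers SYN -> SYN/ACK -> ACK under the hood
client = socket.create_connection(("127.0.0.1", port))
data = client.recv(5)
print(data)                     # b'hello'
t.join()
client.close()
server.close()
```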
What is round-trip delay or round-trip time?
From [wikipedia](https://en.wikipedia.org/wiki/Round-trip_delay): "the length of time it takes for a signal to be sent plus the length of time it takes for an acknowledgement of that signal to be received"

Bonus question: what is the RTT of a LAN?
How does an SSL handshake work?
The SSL handshake is a process that establishes a secure connection between a client and a server.
1. The client sends a Client Hello message to the server, which includes the client's version of the SSL/TLS protocol, a list of the cryptographic algorithms supported by the client, and a random value.
2. The server responds with a Server Hello message, which includes the server's version of the SSL/TLS protocol, a random value, and a session ID.
3. The server sends a Certificate message, which contains the server's certificate.
4. The server sends a Server Hello Done message, which indicates that the server is done sending messages for the Server Hello phase.
5. The client sends a Client Key Exchange message, which contains the pre-master secret encrypted with the server's public key.
6. The client sends a Change Cipher Spec message, which notifies the server that the client is about to start sending messages encrypted with the newly negotiated keys.
7. The client sends a Finished message, an encrypted hash of the whole handshake, which lets the server verify that the handshake was not tampered with.
8. The server sends its own Change Cipher Spec message, which notifies the client that the server is about to start sending messages encrypted with the new keys.
9. The server sends its own Finished message, an encrypted hash of the handshake, which lets the client verify it as well.
10. The client and server can now exchange application data.
What is the difference between TCP and UDP?
TCP establishes a connection between the client and the server to guarantee the order of the packets. UDP, on the other hand, does not establish a connection between client and server and doesn't handle packet ordering. This makes UDP more lightweight than TCP and a perfect candidate for services like streaming. [Penguintutor.com](http://www.penguintutor.com/linux/basic-network-reference) provides a good explanation.
Which TCP/IP protocols are you familiar with?
Explain the "default gateway"
A default gateway serves as an access point or IP router that a networked computer uses to send information to a computer in another network or on the internet.
What is ARP? How does it work?
ARP stands for Address Resolution Protocol. When you try to ping an IP address on your local network, say 192.168.1.1, your system has to turn the IP address 192.168.1.1 into a MAC address. This involves using ARP to resolve the address, hence its name. Systems keep an ARP look-up table where they store information about which IP addresses are associated with which MAC addresses. When trying to send a packet to an IP address, the system will first consult this table to see if it already knows the MAC address. If there is a cached value, ARP is not used.
What is TTL? What does it help to prevent?
- TTL (Time to Live) is a value in an IP (Internet Protocol) packet that determines how many hops or routers a packet can traverse before being discarded. Each time a packet is forwarded by a router, the TTL value is decreased by one. When the TTL value reaches zero, the packet is dropped, and an ICMP (Internet Control Message Protocol) message is sent back to the sender indicating that the packet has expired.
- TTL is used to prevent packets from circulating the network indefinitely, which can cause congestion and degrade network performance.
- It also helps prevent packets from getting stuck in routing loops, where packets continuously travel between the same set of routers without ever reaching their destination.
- Additionally, TTL can be used to help detect and prevent IP spoofing attacks, where an attacker attempts to impersonate another device on the network by using a false IP address. By limiting the number of hops a packet can traverse, TTL can help prevent packets from being routed to illegitimate destinations.
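As an illustrative sketch (Python standard library), the TTL of outgoing packets can be read and changed per socket; this is exactly the knob that `traceroute` turns up one hop at a time:

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Read the OS default TTL for this socket, then restrict it to 1 hop:
# the first router on the path would drop such a packet and send back
# an ICMP "Time Exceeded" message (the traceroute trick).
default_ttl = s.getsockopt(socket.IPPROTO_IP, socket.IP_TTL)
s.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, 1)
limited_ttl = s.getsockopt(socket.IPPROTO_IP, socket.IP_TTL)

print(default_ttl, limited_ttl)  # e.g. 64 1 on a typical Linux host
s.close()
```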
What is DHCP? How does it work?
It stands for Dynamic Host Configuration Protocol, and it allocates IP addresses, subnet masks and gateways to hosts. This is how it works:
* A host, upon entering a network, broadcasts a message in search of a DHCP server (DHCP DISCOVER)
* An offer message is sent back by the DHCP server as a packet containing lease time, subnet mask, IP addresses, etc. (DHCP OFFER)
* Depending on which offer is accepted, the client sends back a broadcast reply informing all DHCP servers (DHCP REQUEST)
* The server sends an acknowledgment (DHCP ACK)

Read more [here](https://linuxjourney.com/lesson/dhcp-overview)
Can you have two DHCP servers on the same network? How does it work?
It is possible to have two DHCP servers on the same network, but it is not recommended, and it is important to configure them carefully to prevent conflicts and configuration problems.
- When two DHCP servers are configured on the same network, there is a risk that both servers will assign IP addresses and other network configurations to the same device, which can cause conflicts and connectivity problems. Additionally, if the DHCP servers are configured with different network settings or options, devices on the network may receive conflicting or inconsistent configurations.
- However, in some cases it may be necessary to have two DHCP servers on the same network, such as in large networks where one DHCP server may not be able to handle all the requests. In such cases, the DHCP servers can be configured to serve different IP address ranges or different subnets, so they do not interfere with each other.
What is SSL tunneling? How does it work?
- SSL (Secure Sockets Layer) tunneling is a technique used to establish a secure, encrypted connection between two endpoints over an insecure network, such as the internet. The SSL tunnel is created by encapsulating the traffic within an SSL connection, which provides confidentiality, integrity, and authentication.

Here's how SSL tunneling works:
1. A client initiates an SSL connection with a server, which involves a handshake process to establish the SSL session.
2. Once the SSL session is established, the client and server negotiate encryption parameters, such as the encryption algorithm and key length, and then exchange digital certificates to authenticate each other.
3. The client then sends traffic through the SSL tunnel to the server, which decrypts the traffic and forwards it to its destination.
4. The server sends traffic back through the SSL tunnel to the client, which decrypts the traffic and forwards it to the application.
What is a socket? Where can you see the list of sockets on your system?
- A socket is a software endpoint that enables two-way communication between processes over a network. Sockets provide a standardized interface for network communication, allowing applications to send and receive data across a network. To view the list of open sockets on a Linux system: ***netstat -an*** - this command displays a list of all open sockets, along with their protocol, local address, foreign address, and state.
O que é IPv6? Por que devemos considerar usá-lo se temos IPv4?
- IPv6 (Internet Protocol version 6) é a versão mais recente do Protocolo de Internet (IP), que é usado para identificar e se comunicar com dispositivos em uma rede. Os endereços IPv6 são endereços de 128 bits e são expressos em notação hexadecimal, como 2001:0db8:85a3:0000:0000:8a2e:0370:7334. Existem várias razões pelas quais devemos considerar o uso de IPv6 em vez de IPv4: 1. Espaço de endereço: O IPv4 tem um espaço de endereço limitado, que foi esgotado em muitas partes do mundo. O IPv6 fornece um espaço de endereço muito maior, permitindo trilhões de endereços IP únicos. 2. Segurança: O IPv6 inclui suporte integrado para IPsec, que fornece criptografia e autenticação de ponta a ponta para o tráfego de rede. 3. Desempenho: O IPv6 inclui recursos que podem ajudar a melhorar o desempenho da rede, como o roteamento multicast, que permite que um único pacote seja enviado para vários destinos simultaneamente. 4. Configuração de rede simplificada: O IPv6 inclui recursos que podem simplificar a configuração da rede, como a autoconfiguração sem estado, que permite que os dispositivos configurem automaticamente seus próprios endereços IPv6 sem a necessidade de um servidor DHCP. 5. Melhor suporte à mobilidade: O IPv6 inclui recursos que podem melhorar o suporte à mobilidade, como o Mobile IPv6, que permite que os dispositivos mantenham seus endereços IPv6 à medida que se movem entre diferentes redes.
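A small illustrative sketch with Python's standard `ipaddress` module, showing the 128-bit size and the canonical compressed notation of the address quoted above:

```python
import ipaddress

addr = ipaddress.ip_address("2001:0db8:85a3:0000:0000:8a2e:0370:7334")

print(addr.version)        # 6
print(addr.max_prefixlen)  # 128 bits (vs 32 for IPv4)
print(addr.compressed)     # 2001:db8:85a3::8a2e:370:7334
```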
What is VLAN?
- A VLAN (Virtual Local Area Network) is a logical network that groups a set of devices on a physical network, regardless of their physical location. VLANs are created by configuring network switches to assign a specific VLAN ID to frames sent by devices connected to a specific port or group of ports on the switch.
What is MTU?
MTU stands for Maximum Transmission Unit. It's the size of the largest PDU (Protocol Data Unit) that can be sent in a single transaction.
What happens if you send a packet that is bigger than the MTU?
With the IPv4 protocol, the router can fragment the PDU and then send all the fragmented PDUs through the transaction. With the IPv6 protocol, routers do not fragment: the packet is dropped and an ICMPv6 "Packet Too Big" message is sent back to the source host, which is expected to reduce its packet size (Path MTU Discovery).
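As a back-of-the-envelope sketch (illustrative Python, assuming a plain 20-byte IPv4 header with no options), the number of fragments for a given payload can be estimated like this:

```python
import math

def ipv4_fragments(payload_bytes: int, mtu: int = 1500, header: int = 20) -> int:
    """Estimate IPv4 fragment count: each fragment carries at most
    (mtu - header) payload bytes, rounded down to a multiple of 8,
    because fragment offsets are expressed in 8-byte units."""
    per_fragment = (mtu - header) // 8 * 8
    return math.ceil(payload_bytes / per_fragment)

# A 4000-byte payload over a standard 1500-byte Ethernet MTU
print(ipv4_fragments(4000))  # 3 fragments (1480 + 1480 + 1040 bytes)
```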
True or False? Ping uses UDP because it doesn't care about a reliable connection
False. Ping actually uses ICMP (Internet Control Message Protocol), a network protocol used to send diagnostic messages and control messages related to network communication.
What is SDN?
- SDN stands for Software-Defined Networking. It is an approach to network management that emphasizes centralizing network control, enabling administrators to manage network behavior through a software abstraction.
- In a traditional network, network devices such as routers, switches, and firewalls are configured and managed individually, using specialized software or command-line interfaces. In contrast, SDN separates the network's control plane from the data plane, allowing administrators to manage network behavior through a centralized software controller.
What is ICMP? What is it used for?
- ICMP stands for Internet Control Message Protocol. It is a protocol used for diagnostic and control purposes in IP networks. It is part of the Internet Protocol suite, operating at the network layer.

ICMP messages are used for a variety of purposes, including:
1. Error reporting: ICMP messages are used to report errors that occur in the network, such as a packet that could not be delivered to its destination.
2. Ping: ICMP is used to send ping messages, which test whether a host or network is reachable and measure the round-trip time of packets.
3. Path MTU discovery: ICMP is used to discover the Maximum Transmission Unit (MTU) of a path, which is the largest packet size that can be transmitted without fragmentation.
4. Traceroute: ICMP is used by the traceroute utility to trace the path that packets take through the network.
5. Router discovery: ICMP is used to discover the routers in a network.
What is NAT? How does it work?
NAT stands for Network Address Translation. It's a way to map multiple local private addresses to a public one before transferring the information. Organizations that want multiple devices to share a single IP address use NAT, as do most home routers. For example, your computer's private IP could be 192.168.1.100, but your router maps the traffic to its public IP (e.g. 1.1.1.1). Any device on the internet would see the traffic coming from your public IP (1.1.1.1) instead of your private IP (192.168.1.100).
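A toy sketch of the idea (illustrative Python only, not a real NAT implementation): the router keeps a translation table keyed by an allocated public port, so replies arriving at the public IP can be mapped back to the right private host. The names and the 40000 starting port are assumptions for illustration.

```python
PUBLIC_IP = "1.1.1.1"

nat_table: dict[int, tuple[str, int]] = {}  # public port -> (private ip, port)
next_port = 40000

def outbound(private_ip: str, private_port: int) -> tuple[str, int]:
    """Rewrite an outgoing packet's source to the public IP plus a fresh port."""
    global next_port
    nat_table[next_port] = (private_ip, private_port)
    mapped = (PUBLIC_IP, next_port)
    next_port += 1
    return mapped

def inbound(public_port: int) -> tuple[str, int]:
    """Map a reply arriving on the public port back to the private host."""
    return nat_table[public_port]

src = outbound("192.168.1.100", 51515)
print(src)              # ('1.1.1.1', 40000) - what the internet sees
print(inbound(src[1]))  # ('192.168.1.100', 51515) - the real origin
```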
Which port number is used in each of the following protocols?: * SSH * SMTP * HTTP * DNS * HTTPS * FTP * SFTP
* SSH - 22
* SMTP - 25
* HTTP - 80
* DNS - 53
* HTTPS - 443
* FTP - 21
* SFTP - 22
Which factors affect network performance?
Several factors can affect network performance, including:
1. Bandwidth: the available bandwidth of a network connection can significantly impact its performance. Networks with limited bandwidth can experience slow data transfer rates, high latency, and poor responsiveness.
2. Latency: latency refers to the delay that occurs when data is transmitted from one point in a network to another. High latency can result in slow network performance, especially for real-time applications like video conferencing and online gaming.
3. Network congestion: when too many devices are using a network at the same time, network congestion can occur, leading to slow data transfer rates and poor network performance.
4. Packet loss: packet loss occurs when data packets are dropped during transmission. This can result in slower network speeds and lower overall network performance.
5. Network topology: the physical layout of a network, including the placement of switches, routers, and other network devices, can impact network performance.
6. Network protocol: different network protocols have different performance characteristics, which can impact network performance. For example, TCP is a reliable protocol that can guarantee data delivery, but it can also result in slower performance due to the overhead required for error checking and retransmission.
7. Network security: security measures such as firewalls and encryption can impact network performance, especially if they require significant processing power or introduce additional latency.
8. Distance: the physical distance between devices on a network can impact network performance, especially for wireless networks where signal strength and interference can affect connectivity and data transfer rates.
What is APIPA?
APIPA is a set of IP addresses that devices are allocated when the main DHCP server is not reachable
What IP range does APIPA use?
APIPA uses the IP range 169.254.0.1 - 169.254.255.254.
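An illustrative check with Python's standard `ipaddress` module: APIPA addresses fall in the IPv4 link-local block 169.254.0.0/16, which the module flags directly:

```python
import ipaddress

apipa = ipaddress.ip_address("169.254.10.5")
normal = ipaddress.ip_address("192.168.1.10")

print(apipa.is_link_local)   # True  -> likely no DHCP lease was obtained
print(normal.is_link_local)  # False
print(apipa in ipaddress.ip_network("169.254.0.0/16"))  # True
```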
#### Control Plane and Data Plane
What does "control plane" refer to?
The control plane is the part of the network that decides how to route and forward packets to a different location.
What does "data plane" refer to?
The data plane is the part of the network that actually forwards the data/packets.
What does "management plane" refer to?
It refers to monitoring and management functions.
To which plane (data, control, ...) does creating routing tables belong?
Control plane.
Explain the Spanning Tree Protocol (STP).
What is link aggregation? Why is it used?
What is Asymmetric Routing? How do you deal with it?
Which overlay (tunnel) protocols are you familiar with?
What is GRE? How does it work?
What is VXLAN? How does it work?
What is SNAT?
Explain OSPF.
OSPF (Open Shortest Path First) is a routing protocol that can be implemented on various types of routers. In general, OSPF is supported on most modern routers, including those from vendors such as Cisco, Juniper, and Huawei. The protocol is designed to work with IP-based networks, including both IPv4 and IPv6. It also uses a hierarchical network design, where routers are grouped into areas, with each area having its own topology map and routing table. This design helps reduce the amount of routing information that needs to be exchanged between routers and improves network scalability.

The 4 types of OSPF routers are:
* Internal Router
* Area Border Routers
* Autonomous Systems Boundary Routers
* Backbone Routers

Learn more about the OSPF router types: https://www.educba.com/ospf-router-types/
What is latency?
Latency is the time taken for information to reach its destination from the source.
What is bandwidth?
Bandwidth is the capacity of a communication channel, measuring how much data the channel can handle over a specific time period. More bandwidth would imply handling more traffic and therefore transferring more data.
What is throughput?
Throughput refers to the measurement of the actual amount of data transferred over a certain period of time across any transmission channel.
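A quick illustrative calculation in Python, distinguishing the nominal bandwidth of a link from the throughput a transfer actually achieved:

```python
def throughput_mbps(bytes_transferred: int, seconds: float) -> float:
    """Actual throughput in megabits per second."""
    return bytes_transferred * 8 / seconds / 1_000_000

# A transfer of 300 MB in 60 seconds over a 100 Mbps link:
actual = throughput_mbps(300_000_000, 60)
print(actual)  # 40.0 -> the link's 100 Mbps bandwidth was not saturated
```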
When performing a search query, which matters more, latency or throughput? And how do you assure it when managing global infrastructure?
Latency. To have good latency, a search query should be forwarded to the closest datacenter.
When uploading a video, which matters more, latency or throughput? And how do you assure it?
Throughput. To have good throughput, the upload stream should be routed to an underutilized link.
What other considerations (besides latency and throughput) are there when forwarding requests?
* Keeping caches updated (which means the request might not be forwarded to the closest datacenter)
Explain Spine & Leaf
"Spine & Leaf" is a network topology commonly used in data center environments to connect multiple switches and manage network traffic efficiently. It is also known as the "spine-leaf" architecture or "leaf-spine" topology. This design provides high bandwidth, low latency, and scalability, making it ideal for modern data centers handling large volumes of data and traffic.

Within a Spine & Leaf network there are two main types of switches:
* Spine switches: high-performance switches arranged in a spine (backbone) layer. These switches act as the core of the network and are typically interconnected with every leaf switch. Each spine switch is connected to all the leaf switches in the data center.
* Leaf switches: connected to end devices such as servers, storage arrays, and other networking equipment. Each leaf switch is connected to every spine switch in the data center. This creates full-mesh, non-blocking connectivity between leaf and spine switches, ensuring that any leaf switch can communicate with any other leaf switch at maximum throughput.

The Spine & Leaf architecture has become increasingly popular in data centers due to its ability to handle the demands of modern cloud computing, virtualization, and big data applications, providing a scalable, high-performance, and reliable network infrastructure.
What is Network Congestion? What can cause it?
Network congestion occurs when there is too much data to transmit on a network and the network doesn't have enough capacity to handle the demand.
This can lead to increased latency and packet loss. The causes can be multiple: high network usage, large file transfers, malware, hardware issues, or network design problems.
To prevent network congestion, it's important to monitor your network usage and implement strategies to limit or manage the demand.
What can you tell me about the UDP packet format? What about the TCP packet format? How are they different?
What is the exponential backoff algorithm? Where is it used?
Using Hamming code, what would be the codeword for the following data word 100111010001101?
00110011110100011101
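A sketch of the encoding in Python (illustrative; it assumes the common convention of even parity with check bits at the power-of-two positions 1, 2, 4, 8, 16), which reproduces the codeword above:

```python
def hamming_encode(data: str) -> str:
    """Encode a bit string with even-parity Hamming check bits placed
    at the power-of-two positions (1-indexed: 1, 2, 4, 8, ...)."""
    n, r = len(data), 0
    while 2 ** r < n + r + 1:  # number of parity bits needed
        r += 1
    total = n + r
    code = [0] * (total + 1)   # index 0 unused (positions are 1-indexed)
    it = iter(data)
    for pos in range(1, total + 1):
        if pos & (pos - 1):    # not a power of two -> data bit goes here
            code[pos] = int(next(it))
    # Even parity: parity bit p covers every position whose binary
    # representation has bit p set.
    for p in (2 ** i for i in range(r)):
        code[p] = sum(code[pos] for pos in range(1, total + 1) if pos & p) % 2
    return "".join(map(str, code[1:]))

print(hamming_encode("100111010001101"))  # 00110011110100011101
```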
Dê exemplos de protocolos encontrados na camada de aplicação
* Protocolo de Transferência de Hipertexto (HTTP) - usado para as páginas da web na internet * Protocolo de Transferência de Correio Simples (SMTP) - transmissão de e-mail * Rede de Telecomunicações - (TELNET) - emulação de terminal para permitir que um cliente acesse um servidor telnet * Protocolo de Transferência de Arquivos (FTP) - facilita a transferência de arquivos entre quaisquer duas máquinas * Sistema de Nomes de Domínio (DNS) - tradução de nomes de domínio * Protocolo de Configuração Dinâmica de Host (DHCP) - aloca endereços IP, máscaras de sub-rede e gateways para hosts * Protocolo Simples de Gerenciamento de Rede (SNMP) - coleta dados sobre dispositivos na rede
Give examples of protocols found in the network layer

* Internet Protocol (IP) - assists in routing packets from one machine to another
* Internet Control Message Protocol (ICMP) - reports what is happening, such as error messages and debugging information
What is HSTS?
HTTP Strict Transport Security is a web server directive, delivered through a response header, that tells user agents and web browsers to handle all connections to the site over HTTPS encryption: the browser upgrades the connection and disregards any script call that would load a resource on that domain over plain HTTP. Read more [here](https://www.globalsign.com/en/blog/what-is-hsts-and-how-do-i-use-it#:~:text=HTTP%20Strict%20Transport%20Security%20(HSTS,and%20back%20to%20the%20browser.)
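In practice, HSTS is just a response header. For example, the following (illustrative) header tells the browser to use HTTPS only, for one year (`max-age` is in seconds), including all subdomains:

```
Strict-Transport-Security: max-age=31536000; includeSubDomains
```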
#### Network - Misc
What is the Internet? Is it the same as the World Wide Web?
The internet refers to a network of networks, transferring huge amounts of data around the globe.
The World Wide Web is an application running on millions of servers, on top of the internet, accessed through what is known as a web browser
What is an ISP?
An ISP (Internet Service Provider) is the local company that provides your internet connectivity.
## Operating System

### Operating System Exercises

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
|Fork 101|Fork|[Link](topics/os/fork_101.md)|[Link](topics/os/solutions/fork_101_solution.md)| |
|Fork 102|Fork|[Link](topics/os/fork_102.md)|[Link](topics/os/solutions/fork_102_solution.md)| |

### Operating System Self Assessment
What is an operating system?
From the book "Operating Systems: Three Easy Pieces": "responsible for making it easy to run programs (even allowing you to seemingly run many at the same time), allowing programs to share memory, enabling programs to interact with devices, and other fun stuff like that".
#### Operating System - Process
Can you explain what a process is?
A process is a running program. A program is one or more instructions, and the program (or process) is executed by the operating system.
If you had to design an API for processes in an operating system, what would it look like?
It would support the following:

* Create - allow creating new processes
* Delete - allow removing/destroying processes
* State - allow checking the state of a process: running, stopped, waiting, etc.
* Stop - allow stopping a running process
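As a user-space illustration of such an API, Python's subprocess module wraps the underlying OS primitives (fork/exec, wait, signals) in roughly the Create/State/Stop operations described above:

```python
import subprocess

# Create: spawn a new process (here a long-running `sleep`)
proc = subprocess.Popen(["sleep", "30"])

# State: poll() returns None while the process is still running
assert proc.poll() is None

# Stop: ask the process to terminate (sends SIGTERM on POSIX)
proc.terminate()
proc.wait()

# State again: the process has now finished
assert proc.poll() is not None
print("exit status:", proc.returncode)
```

The "Delete" operation corresponds to the OS reaping the process once it has exited and been waited on.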
How is a process created?

* The OS reads the program's code and any additional relevant data
* The program's code is loaded into memory, or more specifically, into the process's address space.
* Memory is allocated for the program's stack (also known as the run-time stack). The OS also initializes the stack with data such as argv, argc, and the parameters for main()
* Memory is allocated for the program's heap, which is required for dynamically allocated data such as linked lists and hash tables
* I/O initialization tasks are performed; on Unix/Linux-based systems, for example, each process has 3 file descriptors (input, output, and error)
* The OS runs the program, starting from main()
True or False? Loading the program into memory is done eagerly (all at once)
False. It was true in the past, but today's operating systems perform lazy loading, meaning only the relevant parts required for the process to run are loaded first.
What are the different states of a process?

* Running - it is executing instructions
* Ready - it is ready to run, but for different reasons it is on hold
* Blocked - it is waiting for some operation to complete, for example an I/O disk request
What are some reasons for a process to become blocked?

- I/O operations (e.g. reading from a disk)
- Waiting for a packet from the network
What is inter-process communication (IPC)?
Inter-process communication (IPC) refers to the mechanisms provided by an operating system that allow processes to manage shared data.
What is "time sharing"?
Even when using a system with a single physical CPU, it is possible to allow multiple users to work on it and run programs. This is possible with time sharing, where computing resources are shared in a way that seems to the user as if the system has multiple CPUs, when in fact it is simply one CPU shared by applying multiprogramming and multitasking.
What is "space sharing"?
In a way, the opposite of time sharing. While in time sharing a resource is used for a while by one entity and then the same resource can be used by another, in space sharing the space is shared by multiple entities, but in a way that it is not transferred between them.
It is used by one entity until that entity decides to get rid of it. Take storage as an example: in storage, a file is yours until you decide to delete it.
Which component determines which process runs at a given moment?
The CPU scheduler
#### Operating System - Memory
What is "virtual memory" and what purpose does it serve?
Virtual memory combines your computer's RAM with temporary space on your hard disk. When RAM runs low, virtual memory helps move data from RAM to a space called a paging file. Moving data to the paging file can free up RAM so your computer can complete its work. In general, the more RAM your computer has, the faster programs run. https://www.minitool.com/lib/virtual-memory.html
What is demand paging?
Demand paging is a memory management technique where pages are loaded into physical memory only when accessed by a process. It optimizes memory usage by loading pages on demand, reducing startup latency and space overhead. However, it introduces some latency when pages are accessed for the first time. Overall, it is a cost-effective approach for managing memory resources in operating systems.
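Memory-mapped files are a convenient way to observe demand paging from user space: mapping a file only reserves address space, and physical pages are read in on first access via page faults. A small Python sketch:

```python
import mmap, os, tempfile

# Create a 4 MiB file to map.
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * (4 * 1024 * 1024))
os.close(fd)

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # The mapping now exists, but no data pages are resident yet.
    # Touching bytes triggers page faults that load just those pages.
    assert mm[0] == ord("x")
    assert mm[mm.size() - 1] == ord("x")
    mm.close()
os.remove(path)
print("mapped and touched pages on demand")
```

On Linux, tools like `ps` (RSS column) or `/proc/<pid>/smaps` would show that only the touched pages became resident.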
What is copy-on-write?
Copy-on-write (COW) is a resource management concept whose goal is to reduce unnecessary copying of information. It is implemented, for example, in the POSIX fork syscall, which creates a duplicate process of the calling process. The idea:

1. If resources are shared between 2 or more entities (for example, memory segments shared between 2 processes), the resources don't need to be copied for each entity; instead, each entity gets READ permission on the shared resource (the shared segments are marked read-only). Think of each entity holding a pointer to the location of the shared resource, which can be dereferenced to read its value.
2. If one entity were to perform a WRITE operation on a shared resource, a problem would arise: the resource would also be permanently changed for ALL the other entities sharing it. Think of a process modifying some variables on the stack or allocating some data dynamically on the heap: those changes to the shared resource would also apply to ALL the other processes, which is definitely undesirable behavior.
3. As a solution, only when a WRITE operation is about to be performed on a shared resource is that resource COPIED first, and then the changes are applied.
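A quick way to see copy-on-write semantics from user space is fork(): after the fork, a write in the child triggers a private copy of the page being written, and the parent's data stays untouched. A minimal Python sketch (POSIX-only):

```python
import os

value = [1, 2, 3]

pid = os.fork()
if pid == 0:
    # Child: this write forces a private copy of the affected page;
    # the parent never sees the change.
    value.append(4)
    os._exit(0 if value == [1, 2, 3, 4] else 1)
else:
    # Parent: wait for the child, then check our copy is untouched.
    _, status = os.waitpid(pid, 0)
    assert value == [1, 2, 3]
    assert os.WEXITSTATUS(status) == 0
    print("parent still sees", value)
```

From Python's point of view the two processes simply have separate address spaces; copy-on-write is what makes fork cheap, since the kernel copies pages only when one side writes to them.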
What is a kernel, and what does it do?
The kernel is part of the operating system and is responsible for tasks like:

* Allocating memory
* Scheduling processes
* Controlling the CPU
True or False? Some parts of the code in the kernel are loaded into protected areas of memory so that applications can't overwrite them
True
What is POSIX?
POSIX (Portable Operating System Interface) is a set of standards that define the interface between a Unix-like operating system and application programs.
Explain what a semaphore is and what its role is in operating systems.
A semaphore is a synchronization primitive used in operating systems and concurrent programming to control access to shared resources. It is a variable or abstract data type that acts as a counter or a signaling mechanism to manage access to resources by multiple processes or threads.
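A counting semaphore with N permits lets at most N threads into a critical section at once. A minimal sketch using Python's threading.Semaphore (worker count and timings are illustrative):

```python
import threading, time

sem = threading.Semaphore(2)   # at most 2 workers inside at once
active = 0
peak = 0
lock = threading.Lock()        # protects the two counters above

def worker():
    global active, peak
    with sem:                  # acquire a permit (blocks if none left)
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)       # simulate work in the critical section
        with lock:
            active -= 1
                               # permit released when `with sem` exits

threads = [threading.Thread(target=worker) for _ in range(6)]
for t in threads: t.start()
for t in threads: t.join()

assert peak <= 2               # the semaphore capped concurrency at 2
print("peak concurrent workers:", peak)
```

A semaphore initialized with 1 permit behaves like a mutex; larger counts model pools of identical resources (connections, buffers, device slots).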
What is cache? What is a buffer?

* Cache: a cache is usually used when processes are reading from and writing to the disk, to make the process faster by keeping similar data used by different programs easily accessible.
* Buffer: an area reserved in RAM that is used to hold data for temporary purposes.
## Virtualization
What is virtualization?
Virtualization uses software to create an abstraction layer over the computer's hardware, which allows the hardware elements of a single computer - processors, memory, storage, and more - to be divided into multiple virtual computers, commonly called virtual machines (VMs).
O que é um hipervisor?
Red Hat: "Um hipervisor é um software que cria e executa máquinas virtuais (VMs). Um hipervisor, às vezes chamado de monitor de máquina virtual (VMM), isola o sistema operacional e os recursos do hipervisor das máquinas virtuais e permite a criação e o gerenciamento dessas VMs." Leia mais [aqui](https://www.redhat.com/en/topics/virtualization/what-is-a-hypervisor)
What types of hypervisors are there?
Hosted hypervisors and bare-metal hypervisors.
What are the advantages and disadvantages of a bare-metal hypervisor over a hosted hypervisor?
Because it has its own drivers and direct access to the hardware components, a bare-metal hypervisor will usually deliver better performance, along with stability and scalability. On the other hand, it will likely be limited in which drivers it can load, so a hosted hypervisor will usually benefit from better hardware compatibility.
What types of virtualization are there?

* Operating system virtualization
* Network functions virtualization
* Desktop virtualization
Is containerization a type of virtualization?
Yes, it is operating-system-level virtualization, where the kernel is shared and allows for multiple isolated user-space instances.
How did the introduction of virtual machines change the industry and the way applications were deployed?
The introduction of virtual machines allowed companies to deploy multiple business applications on the same hardware, with each application securely separated from the others, each running on its own separate operating system.
#### Virtual Machines
Do we need virtual machines in the age of containers? Are they still relevant?
Yes, virtual machines are still relevant even in the age of containers. While containers provide a lightweight and portable alternative to virtual machines, they have certain limitations. Virtual machines still matter because they offer stronger isolation and security, can run different operating systems, and are a good fit for legacy applications. Containers' limitations include, for example, sharing the host's kernel.
## Prometheus
What is Prometheus? What are some of Prometheus' main features?
Prometheus is a popular open-source systems monitoring and alerting toolkit, originally developed at SoundCloud. It is designed to collect and store time series data, and to allow querying and analyzing that data using a powerful query language called PromQL. Prometheus is frequently used to monitor cloud-native applications, microservices, and other modern infrastructure. Some of Prometheus' main features include:

1. Data model: Prometheus uses a flexible data model that allows users to organize and label their time series data in a way that makes sense for their particular use case. Labels identify different dimensions of the data, such as the source of the data or the environment in which it was collected.
2. Pull-based architecture: Prometheus uses a pull-based model for collecting data from targets, meaning the Prometheus server actively queries its targets for metrics data at regular intervals. This architecture is more scalable and reliable than a push-based model, which would require each target to send data to the server.
3. Time series database: Prometheus stores all of its data in a time series database, which allows users to run queries over time ranges and to aggregate and analyze their data in various ways. The database is optimized for write-heavy workloads and can handle a high volume of data with low latency.
4. Alerting: Prometheus includes a powerful alerting system that lets users define rules based on their metrics data and send alerts when certain conditions are met. Alerts can be sent by email, chat, or other channels, and can be customized to include specific details about the problem.
5. Visualization: Prometheus ships with a built-in expression browser for graphing ad-hoc queries, and is commonly paired with Grafana for building custom dashboards (its earlier built-in dashboard tool, PromDash, has long been deprecated).

Overall, Prometheus is a powerful and flexible tool for monitoring and analyzing systems and applications, and it is widely used in the industry for cloud-native monitoring and observability.
In which scenarios might it be better NOT to use Prometheus?
From the Prometheus documentation: "if you need 100% accuracy, such as for per-request billing".
Describe Prometheus' architecture and components
Prometheus' architecture consists of four major components:

1. Prometheus server: the Prometheus server is responsible for collecting and storing metrics data. It has a simple, built-in storage layer that stores time series data in a time-ordered database.
2. Client libraries: Prometheus provides a variety of client libraries that allow applications to expose their metrics data in a format the Prometheus server can ingest. These libraries are available for a variety of programming languages, including Java, Python, and Go.
3. Exporters: exporters are software components that expose existing metrics from third-party systems and make them available for the Prometheus server to scrape. Prometheus provides exporters for a variety of popular technologies, including MySQL, PostgreSQL, and Apache.
4. Alertmanager: the Alertmanager component is responsible for processing alerts generated by the Prometheus server. It can handle alerts from multiple sources and provides a range of features for deduplicating, grouping, and routing alerts to the appropriate channels.

Overall, Prometheus' architecture is designed to be highly scalable and resilient. The server and client libraries can be deployed in a distributed fashion to support monitoring in large-scale, highly dynamic environments
Can you compare Prometheus to other solutions like InfluxDB for example?
Compared to other monitoring solutions, such as InfluxDB, Prometheus is known for its high performance and scalability. It can handle large volumes of data and integrates easily with other tools in the monitoring ecosystem. InfluxDB, on the other hand, is known for its ease of use and simplicity. It has a user-friendly interface and provides easy-to-use APIs for collecting and querying data. Another popular solution, Nagios, is a more traditional monitoring system built around host and service checks executed by plugins. Nagios has been around for a long time and is known for its stability and reliability. However, compared to Prometheus, Nagios lacks some of the more advanced capabilities, such as a multi-dimensional data model and a powerful query language. Overall, the choice of a monitoring solution depends on the organization's specific needs and requirements. While Prometheus is a great choice for large-scale monitoring and alerting, InfluxDB may be a better option for smaller environments that require ease of use and simplicity. Nagios remains a solid choice for organizations that prioritize stability and reliability over advanced features.
What is an alert?
In Prometheus, an alert is a notification triggered when a specific condition or threshold is met. Alerts can be configured to fire when certain metrics cross a given threshold or when specific events occur. Once an alert fires, it can be routed to various channels, such as email, pager, or chat, to notify the relevant teams or individuals to take appropriate action. Alerts are a critical component of any monitoring system, as they allow teams to proactively detect and respond to issues before they impact users or cause system downtime.
What is an instance? What is a job?
In Prometheus, an instance refers to a single target being monitored - for example, a single server or service. A job is a set of instances performing the same function, such as a set of web servers serving the same application. Jobs allow you to define and manage a group of targets together. In essence, an instance is an individual target from which Prometheus collects metrics, while a job is a collection of similar instances that can be managed as a group.
What core metric types does Prometheus support?
Prometheus supports several metric types, including:

1. Counter: a monotonically increasing value used to track counts of events or samples. Examples include the number of requests processed or the total number of errors encountered.
2. Gauge: a value that can go up or down, such as CPU usage or memory usage. Unlike counters, gauge values can be arbitrary, meaning they can rise and fall based on changes in the system being monitored.
3. Histogram: a set of observations or events that are divided into buckets based on their value. Histograms help analyze the distribution of a metric, such as request latencies or response sizes.
4. Summary: a summary is similar to a histogram, but instead of buckets it provides a set of quantiles over the observed values. Summaries are useful for monitoring the distribution of request latencies or response sizes over time.

Prometheus also supports various functions and operators for aggregating and manipulating metrics, such as sum, max, min, and rate. These capabilities make it a powerful tool for monitoring and alerting on system metrics.
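These metric types are what scrape targets expose in the Prometheus text exposition format; a small illustrative sample (the metric names here are made up):

```
# HELP http_requests_total Total HTTP requests served.
# TYPE http_requests_total counter
http_requests_total{method="get"} 1027
# TYPE memory_usage_bytes gauge
memory_usage_bytes 5.6e+08
# TYPE request_duration_seconds histogram
request_duration_seconds_bucket{le="0.1"} 240
request_duration_seconds_bucket{le="+Inf"} 255
request_duration_seconds_sum 21.4
request_duration_seconds_count 255
```

Note how a histogram is exported as a set of cumulative `_bucket` series plus `_sum` and `_count`.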
What is an exporter? What is it used for?
An exporter serves as a bridge between a third-party system or application and Prometheus, making it possible for Prometheus to monitor and collect data from that system or application. The exporter acts as a server, listening on a specific network port for scrape requests from Prometheus. It collects metrics from the third-party system or application and transforms them into a format that Prometheus can understand. The exporter then exposes these metrics to Prometheus via an HTTP endpoint, making them available for collection and analysis. Exporters are commonly used to monitor various kinds of infrastructure components, such as databases, web servers, and storage systems. For example, there are exporters available for monitoring popular databases like MySQL and PostgreSQL, as well as web servers like Apache and Nginx. Overall, exporters are a critical component of the Prometheus ecosystem, enabling the monitoring of a wide range of systems and applications and providing a high degree of flexibility and extensibility for the platform.
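A toy exporter can be sketched in a few lines: it listens on a port and answers scrape requests on /metrics with metrics in the text exposition format. The metric name and value below are made up; real exporters would use the official Prometheus client libraries:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading, urllib.request

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            # One hard-coded counter in the text exposition format.
            body = b'demo_requests_total{path="/"} 42\n'
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0 = auto-pick
threading.Thread(target=server.serve_forever, daemon=True).start()

# Simulate a Prometheus scrape of the /metrics endpoint.
port = server.server_address[1]
with urllib.request.urlopen(f"http://127.0.0.1:{port}/metrics") as resp:
    text = resp.read().decode()
print(text.strip())
server.shutdown()
```

In a real deployment, Prometheus would be configured with this host and port as a scrape target and would poll the endpoint on its scrape interval.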
What are some Prometheus best practices?
Here are three of them:

1. Label carefully: careful, consistent labeling of metrics is crucial for effective queries and alerts. Labels should be clear, concise, and include all relevant information about the metric.
2. Keep metrics simple: metrics exposed by exporters should be simple and focus on a single aspect of the system being monitored. This helps avoid confusion and ensures the metrics are easily understandable by all team members.
3. Use alerts sparingly: while alerting is a powerful Prometheus feature, it should be used sparingly and only for the most critical issues. Configuring too many alerts can lead to alert fatigue and important alerts being ignored. It is recommended to configure only the most important alerts and to tune the thresholds over time based on actual alert frequency.
How do you get the total number of requests in a given period of time?
To get the total number of requests over a period of time with Prometheus, use the *sum* function together with the *increase* function. Here is an example query that gives you the total number of requests in the last hour:

```
sum(increase(http_requests_total[1h]))
```

In this query, *http_requests_total* is the metric that tracks the total number of HTTP requests, and *increase* computes how much each counter grew over the last hour. *sum* then adds up all the series to give you the total number of requests in the last hour. (Note that *rate* alone would give a per-second rate, not a total.) You can adjust the time range by changing the duration; for example, to get the total number of requests in the last day, you could use *increase(http_requests_total[1d])*.
What does HA mean in Prometheus?
HA stands for High Availability. It means the system is designed to be highly reliable and always available, even in the face of failures or other issues. In practice, this usually involves setting up multiple Prometheus instances and ensuring they are all synchronized and able to work together seamlessly. This can be achieved through a variety of techniques, such as load balancing, replication, and failover mechanisms. By implementing HA in Prometheus, users can ensure their monitoring data is always available and up to date, even in the face of hardware or software failures, network issues, or other problems that could cause downtime or data loss.
How do you join two metrics?
In PromQL there is no dedicated *join()* function; two metrics are "joined" with binary operators and vector matching. The *on()* (or *ignoring()*) modifier specifies which labels must match on both sides, and *group_left*/*group_right* handle many-to-one matches. Here is an example that joins two metrics to compute an error ratio per service and instance:

```
sum by (service, instance) (rate(error_count_total[5m]))
/ on(service, instance)
sum by (service, instance) (rate(request_count_total[5m]))
```

In this example, the division operator pairs up the *error_count_total* and *request_count_total* series whose *service* and *instance* label values are equal, producing one resulting series per matched pair.
How do you write a query that returns the value of a label?
In Grafana's Prometheus data source you can use the *label_values* templating function, which takes a metric name and a label name. For example, if you have a metric called *http_requests_total* with a label called *method*, and you want all the values of the *method* label, you can use:

```
label_values(http_requests_total, method)
```

This returns a list of all values of the *method* label on the *http_requests_total* metric, which you can then use in dashboard variables or to filter your data. In pure PromQL, you can enumerate the values of a label with an aggregation such as *count by (method) (http_requests_total)*, or query the HTTP API endpoint */api/v1/label/method/values*.
How do you convert cpu_user_seconds to CPU usage in percent?
To convert *process_cpu_user_seconds_total* (a counter of cumulative CPU seconds) into CPU usage in percent, take its per-second rate - which yields the fraction of one core in use - divide by the number of CPU cores, and multiply by 100. The general form is:

```
100 * sum(rate(process_cpu_user_seconds_total{job="<job-name>"}[<interval>])) by (instance) / <number-of-cores>
```

Here, *<job-name>* is the name of the job you want to query, *<interval>* is the time range you want to query (e.g. *5m*, *1h*), and *<number-of-cores>* is the number of CPU cores on the machine you are querying. For example, to get the CPU usage in percent over the last 5 minutes for a job called *my-job* running on a machine with 4 CPU cores, you can use:

```
100 * sum(rate(process_cpu_user_seconds_total{job="my-job"}[5m])) by (instance) / 4
```
## Go
What are some characteristics of the Go programming language?

* Strong, static typing - the type of a variable can't change over time, and it must be defined at compile time
* Simplicity
* Fast compile times
* Built-in concurrency
* Garbage collection
* Platform independent
* Compiles to a standalone binary - everything you need to run your application is compiled into a single binary. Very useful for version management at runtime.

Go also has a good community.
What is the difference between var x int = 2 and x := 2?
The result is the same: a variable with the value 2. With var x int = 2 we are setting the variable's type to integer, while with x := 2 we are letting Go figure out the type by itself.
True or False? In Go we can redeclare variables, and once declared we must use them.
False. We can't redeclare variables, but yes, we must use declared variables.
Which Go libraries have you used?
This should be answered based on your own usage, but some examples are:

* fmt - formatted I/O
What is the problem with the following block of code? How can it be fixed?

```
func main() {
    var x float32 = 13.5
    var y int
    y = x
}
```

Go does not implicitly convert between numeric types, so assigning a float32 to an int variable is a compile-time error (`cannot use x (type float32) as type int in assignment`). Fix it with an explicit conversion: `y = int(x)` (note that the fractional part is truncated, so y becomes 13).
The following block of code tries to convert the integer 101 to a string, but instead we get "e". Why is that? How can it be fixed?

```go
package main

import "fmt"

func main() {
    var x int = 101
    var y string
    y = string(x)
    fmt.Println(y)
}
```

string(x) interprets 101 as a Unicode code point, which is the character 'e', and uses it to build the string. If you want to get "101", use the "strconv" package and replace y = string(x) with y = strconv.Itoa(x)
What is wrong with the following code?:

```
package main

func main() {
    var x = 2
    var y = 3
    const someConst = x + y
}
```

Constants in Go can only be declared using constant expressions, but `x`, `y`, and their sum are variables, so the compiler reports:

`const initializer x + y is not a constant`
What will be the output of the following block of code?:

```go
package main

import "fmt"

const (
    x = iota
    y = iota
)
const z = iota

func main() {
    fmt.Printf("%v\n", x)
    fmt.Printf("%v\n", y)
    fmt.Printf("%v\n", z)
}
```

Go's iota identifier is used in const declarations to simplify definitions of incrementing numbers. Because it can be used in expressions, it provides generality beyond that of simple enumerations.
`x` and `y` are in the first iota group, `z` is in the second, so the output is 0, 1, 0.

Iota page in the Go Wiki
What is _ used for in Go?
It avoids having to declare all the variables for the return values. It is called the blank identifier.

Answer on Stack Overflow
What will be the output of the following block of code?:

```go
package main

import "fmt"

const (
    _ = iota + 3
    x
)

func main() {
    fmt.Printf("%v\n", x)
}
```

Since the first iota is declared with the value `3` (`+ 3`), the next one has the value `4`
What will be the output of the following block of code?:

```go
package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    var wg sync.WaitGroup

    wg.Add(1)
    go func() {
        time.Sleep(time.Second * 2)
        fmt.Println("1")
        wg.Done()
    }()

    go func() {
        fmt.Println("2")
    }()

    wg.Wait()
    fmt.Println("3")
}
```

Output: 2 1 3

Article about sync/waitgroup
Golang sync package
What will be the output of the following block of code?:

```go
package main

import (
    "fmt"
)

func mod1(a []int) {
    for i := range a {
        a[i] = 5
    }
    fmt.Println("1:", a)
}

func mod2(a []int) {
    a = append(a, 125) // !
    for i := range a {
        a[i] = 5
    }
    fmt.Println("2:", a)
}

func main() {
    s1 := []int{1, 2, 3, 4}
    mod1(s1)
    fmt.Println("1:", s1)

    s2 := []int{1, 2, 3, 4}
    mod2(s2)
    fmt.Println("2:", s2)
}
```

Output:

1: [5 5 5 5]
1: [5 5 5 5]
2: [5 5 5 5 5]
2: [1 2 3 4]

In `mod1`, `a` shares the backing array of `s1`, so assigning to `a[i]` changes the value of `s1` as well. But in `mod2`, `append` allocates a new backing array (the capacity is exceeded), so we are only changing the value of `a`, not `s2`.

Article about arrays, blog post about `append`
What will be the output of the following block of code?:

```go
package main

import (
    "container/heap"
    "fmt"
)

// An IntHeap is a min-heap of ints.
type IntHeap []int

func (h IntHeap) Len() int           { return len(h) }
func (h IntHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h IntHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }

func (h *IntHeap) Push(x interface{}) {
    // Push and Pop use pointer receivers because they modify the slice's length,
    // not just its contents.
    *h = append(*h, x.(int))
}

func (h *IntHeap) Pop() interface{} {
    old := *h
    n := len(old)
    x := old[n-1]
    *h = old[0 : n-1]
    return x
}

func main() {
    h := &IntHeap{4, 8, 3, 6}
    heap.Init(h)
    heap.Push(h, 7)

    fmt.Println((*h)[0])
}
```

Output: 3

Golang container/heap package
## Mongo
What are the advantages of MongoDB? Or in other words, why choose MongoDB and not another NoSQL implementation?
MongoDB's advantages are as follows:

- Schemaless
- Easy to scale out
- No complex joins
- The structure of a single object is clear
What is the difference between SQL and NoSQL?
The main difference is that SQL databases are structured (data is stored in the form of tables with rows and columns - like an Excel spreadsheet table) while NoSQL is unstructured, and the data storage can vary depending on how the NoSQL database is set up, such as key-value pairs, document-oriented, etc.
In which scenarios would you prefer to use NoSQL/Mongo over SQL?

* Heterogeneous data that changes often
* Data consistency and integrity are not a top priority
* Better when the database needs to scale quickly
What is a document? What is a collection?

* A document is a record in MongoDB, stored in BSON (Binary JSON) format, and is the basic unit of data in MongoDB.
* A collection is a group of related documents stored in a single database in MongoDB.
What is an aggregator?

* An aggregator is a framework in MongoDB that performs operations on a set of data to return a single computed result.
What is better? Embedded documents or referenced documents?

* There is no definitive answer to which is better; it depends on the specific use case and requirements. Some explanations: embedded documents provide atomic updates, while referenced documents allow for better normalization.
Você já realizou otimizações de recuperação de dados no Mongo? Se não, você pode pensar em maneiras de otimizar uma recuperação de dados lenta?
* Algumas maneiras de otimizar a recuperação de dados no MongoDB são: indexação, design de esquema adequado, otimização de consulta e balanceamento de carga do banco de dados.
##### Consultas
Explique esta consulta: db.books.find({"name": /abc/})
Explique esta consulta: db.books.find().sort({x:1})
Qual é a diferença entre find() e find_one()?
* `find()` retorna todos os documentos que correspondem às condições da consulta.
* `find_one()` retorna apenas um documento que corresponde às condições da consulta (ou nulo se nenhuma correspondência for encontrada).
Como você pode exportar dados do Mongo DB?
* mongoexport * linguagens de programação
## SQL

### Exercícios de SQL

|Nome|Tópico|Objetivo & Instruções|Solução|Comentários|
|--------|--------|------|----|----|
| Funções vs. Comparações | Melhorias de Consulta | Exercício | Solução | |

### Autoavaliação de SQL
O que é SQL?
SQL (Structured Query Language) é uma linguagem padrão para bancos de dados relacionais (como MySQL, MariaDB, ...).
É usada para ler, atualizar, remover e criar dados em um banco de dados relacional.
Como o SQL é diferente do NoSQL
A principal diferença é que os bancos de dados SQL são estruturados (os dados são armazenados na forma de tabelas com linhas e colunas - como uma tabela de planilha do Excel) enquanto o NoSQL é não estruturado, e o armazenamento de dados pode variar dependendo de como o banco de dados NoSQL é configurado, como par chave-valor, orientado a documentos, etc.
Quando é melhor usar SQL? NoSQL?
SQL - Melhor usado quando a integridade dos dados é crucial. O SQL é normalmente implementado em muitas empresas e em áreas do setor financeiro devido à sua conformidade com o ACID.

NoSQL - Ótimo se você precisar escalar rapidamente. O NoSQL foi projetado com aplicações web em mente, então funciona muito bem se você precisar espalhar rapidamente a mesma informação para vários servidores. Além disso, como o NoSQL não adere à estrutura rígida de tabelas com colunas e linhas que os bancos de dados relacionais exigem, você pode armazenar diferentes tipos de dados juntos.
##### SQL Prático - Básico

Para estas perguntas, usaremos as tabelas Clientes e Pedidos mostradas abaixo:

**Clientes**

ID_Cliente | Nome_Cliente | Itens_no_carrinho | Dinheiro_gasto_ate_Data
------------ | ------------- | ------------- | -------------
100204 | John Smith | 0 | 20.00
100205 | Jane Smith | 3 | 40.00
100206 | Bobby Frank | 1 | 100.20

**PEDIDOS**

ID_Cliente | ID_Pedido | Item | Preço | Data_vendido
------------ | ------------- | ------------- | ------------- | -------------
100206 | A123 | Pato de Borracha | 2.20 | 2019-09-18
100206 | A123 | Banho de Espuma | 8.00 | 2019-09-18
100206 | Q987 | Pacote com 80 Papéis Higiênicos | 90.00 | 2019-09-20
100205 | Z001 | Ração de Gato - Atum | 10.00 | 2019-08-05
100205 | Z001 | Ração de Gato - Frango | 10.00 | 2019-08-05
100205 | Z001 | Ração de Gato - Carne | 10.00 | 2019-08-05
100205 | Z001 | Ração de Gato - quesadilla de gatinho | 10.00 | 2019-08-05
100204 | X202 | Café | 20.00 | 2019-04-29
Como eu selecionaria todos os campos desta tabela?
```
Select *
From Clientes;
```
Quantos itens estão no carrinho de John?
```
Select Itens_no_carrinho
From Clientes
Where Nome_Cliente = 'John Smith';
```
Qual é a soma de todo o dinheiro gasto por todos os clientes?
```
Select SUM(Dinheiro_gasto_ate_Data) as SOMA_DINHEIRO
From Clientes;
```
Quantas pessoas têm itens em seus carrinhos?
```
Select count(1) as Numero_de_Pessoas_c_itens
From Clientes
where Itens_no_carrinho > 0;
```
Como você juntaria a tabela de clientes à tabela de pedidos?
Você os juntaria pela chave única. Neste caso, a chave única é ID_Cliente em ambas as tabelas Clientes e Pedidos
Como você mostraria qual cliente pediu quais itens?
```
Select c.Nome_Cliente, o.Item
From Clientes c
Left Join Pedidos o
On c.ID_Cliente = o.ID_Cliente;
```
Usando uma instrução with, como você mostraria quem pediu ração de gato e o valor total gasto?
```
with racao_gato as (
  Select ID_Cliente, SUM(Preco) as PRECO_TOTAL
  From Pedidos
  Where Item like '%Ração de Gato%'
  Group by ID_Cliente
)
Select Nome_Cliente, PRECO_TOTAL
From Clientes c
Inner JOIN racao_gato f
ON c.ID_Cliente = f.ID_Cliente
where c.ID_Cliente in (Select ID_Cliente from racao_gato);
```

Embora esta tenha sido uma instrução simples, a cláusula "with" realmente brilha quando uma consulta complexa precisa ser executada em uma tabela antes de juntá-la a outra. Instruções "with" são boas porque criam uma pseudotabela temporária durante a execução da consulta, em vez de criar uma tabela totalmente nova. A soma de todas as compras de ração de gato não estava prontamente disponível, então usamos uma instrução "with" para criar a pseudotabela, recuperar a soma dos preços gastos por cada cliente e então juntar as tabelas normalmente.
Qual das seguintes consultas você usaria?

```
SELECT count(*)
FROM compras_shawarma
WHERE YEAR(comprado_em) = '2017'
```

vs.

```
SELECT count(*)
FROM compras_shawarma
WHERE comprado_em >= '2017-01-01'
  AND comprado_em <= '2017-12-31'
```
```
SELECT count(*)
FROM compras_shawarma
WHERE comprado_em >= '2017-01-01'
  AND comprado_em <= '2017-12-31'
```

Quando você usa uma função (`YEAR(comprado_em)`), o banco precisa avaliá-la para cada linha, varrendo a tabela inteira, em vez de usar índices sobre a coluna como ela é, em seu estado natural.
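Para ilustrar por que a forma com intervalo é mais barata, um esboço em Python: com os valores ordenados (como em um índice), um predicado de intervalo vira duas buscas binárias nas bordas, enquanto aplicar uma função à coluna obriga a avaliá-la linha a linha. A lista `datas` e o recorte por prefixo do ano são hipotéticos:

```python
from bisect import bisect_left, bisect_right

# "Coluna indexada": valores ordenados, como em um índice B-tree
datas = sorted(["2016-12-30", "2017-01-01", "2017-06-15",
                "2017-12-31", "2018-01-02"])

# Caminho do índice: duas buscas binárias delimitam o intervalo
lo = bisect_left(datas, "2017-01-01")
hi = bisect_right(datas, "2017-12-31")
via_indice = datas[lo:hi]

# Caminho da função: avalia YEAR() (aqui, o prefixo) em cada linha
via_funcao = [d for d in datas if d[:4] == "2017"]

assert via_indice == via_funcao  # mesmo resultado, custo diferente
print(len(via_indice))  # 3
```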
## OpenStack
Quais componentes/projetos do OpenStack você conhece?
Estou mais familiarizado com vários componentes principais do OpenStack:
- Nova para provisionamento de recursos de computação, incluindo gerenciamento do ciclo de vida de VMs.
- Neutron para redes, com foco na criação e gerenciamento de redes, sub-redes e roteadores.
- Cinder para armazenamento em bloco, usado para anexar e gerenciar volumes de armazenamento.
- Keystone para serviços de identidade, lidando com autenticação e autorização.

Já implementei esses componentes em projetos anteriores, configurando-os para escalabilidade e segurança a fim de suportar ambientes multi-tenant.
Você pode me dizer pelo que cada um dos seguintes serviços/projetos é responsável?
- Nova
- Neutron
- Cinder
- Glance
- Keystone

* Nova - Gerenciar instâncias virtuais
* Neutron - Gerenciar redes, fornecendo Rede como Serviço (NaaS)
* Cinder - Armazenamento em Bloco
* Glance - Gerenciar imagens para máquinas virtuais e contêineres (pesquisar, obter e registrar)
* Keystone - Serviço de autenticação em toda a nuvem
Identifique o serviço/projeto usado para cada um dos seguintes:
* Copiar ou fazer snapshot de instâncias
* GUI para visualizar e modificar recursos
* Armazenamento em Bloco
* Gerenciar instâncias virtuais

* Glance - Serviço de Imagens. Também usado para copiar ou fazer snapshot de instâncias
* Horizon - GUI para visualizar e modificar recursos
* Cinder - Armazenamento em Bloco
* Nova - Gerenciar instâncias virtuais
O que é um tenant/projeto?
Determine verdadeiro ou falso:
* OpenStack é gratuito para usar
* O serviço responsável pela rede é o Glance
* O propósito do tenant/projeto é compartilhar recursos entre diferentes projetos e usuários do OpenStack
Descreva em detalhes como você inicia uma instância com um IP flutuante
Você recebe uma ligação de um cliente dizendo: "Eu consigo pingar minha instância, mas não consigo conectar (ssh) nela". Qual pode ser o problema?
Que tipos de redes o OpenStack suporta?
Como você depura problemas de armazenamento do OpenStack? (ferramentas, logs, ...)
Como você depura problemas de computação do OpenStack? (ferramentas, logs, ...)
#### Implantação do OpenStack & TripleO
Você já implantou o OpenStack no passado? Se sim, pode descrever como fez isso?
Você está familiarizado com o TripleO? Como ele é diferente do Devstack ou Packstack?
Você pode ler sobre o TripleO aqui
#### Computação OpenStack
Você pode descrever o Nova em detalhes?
* Usado para provisionar e gerenciar instâncias virtuais
* Suporta Multi-Tenancy em diferentes níveis - logging, controle do usuário final, auditoria, etc.
* Altamente escalável
* A autenticação pode ser feita usando sistema interno ou LDAP
* Suporta múltiplos tipos de armazenamento em bloco
* Tenta ser agnóstico em relação a hardware e hipervisor
O que você sabe sobre a arquitetura e os componentes do Nova?
* nova-api - o servidor que serve metadados e APIs de computação
* os diferentes componentes do Nova se comunicam usando uma fila (geralmente RabbitMQ) e um banco de dados
* uma solicitação para criar uma instância é inspecionada pelo nova-scheduler, que determina onde a instância será criada e executada
* nova-compute é o componente responsável por se comunicar com o hipervisor para criar a instância e gerenciar seu ciclo de vida
#### Rede OpenStack (Neutron)
Explique o Neutron em detalhes
* Um dos componentes principais do OpenStack e um projeto autônomo
* O Neutron foca em entregar rede como serviço
* Com o Neutron, os usuários podem configurar redes na nuvem e configurar e gerenciar uma variedade de serviços de rede
* O Neutron interage com:
  * Keystone - autorizar chamadas de API
  * Nova - o nova se comunica com o neutron para conectar NICs a uma rede
  * Horizon - suporta entidades de rede no painel e também fornece uma visão de topologia que inclui detalhes de rede
Explique cada um dos seguintes componentes:
- neutron-dhcp-agent
- neutron-l3-agent
- neutron-metering-agent
- neutron-*-agent
- neutron-server

* neutron-l3-agent - Encaminhamento L3/NAT (fornece acesso à rede externa para as VMs, por exemplo)
* neutron-dhcp-agent - Serviços DHCP
* neutron-metering-agent - Medição de tráfego L3
* neutron-*-agent - gerencia a configuração local do vSwitch em cada nó de computação (com base no plugin escolhido)
* neutron-server - expõe a API de rede e passa as solicitações para outros plugins, se necessário
Explique estes tipos de rede:
- Rede de Gerenciamento
- Rede de Convidado
- Rede de API
- Rede Externa

* Rede de Gerenciamento - usada para comunicação interna entre os componentes do OpenStack. Qualquer endereço IP nesta rede é acessível apenas dentro do datacenter
* Rede de Convidado - usada para comunicação entre instâncias/VMs
* Rede de API - usada para comunicação das APIs de serviços. Qualquer endereço IP nesta rede é acessível publicamente
* Rede Externa - usada para comunicação pública. Qualquer endereço IP nesta rede é acessível por qualquer pessoa na internet
Em que ordem você deve remover as seguintes entidades:
* Rede
* Porta
* Roteador
* Sub-rede

- Porta
- Sub-rede
- Roteador
- Rede

Existem muitas razões para isso. Uma, por exemplo: você não pode remover o roteador se houver portas ativas atribuídas a ele.
O que é uma rede provedora?
Quais componentes e serviços existem para L2 e L3?
O que é o plug-in ML2? Explique sua arquitetura
O que é o agente L2? Como ele funciona e pelo que é responsável?
O que é o agente L3? Como ele funciona e pelo que é responsável?
Explique pelo que o agente de Metadados é responsável
Quais entidades de rede o Neutron suporta?
Como você depura problemas de rede do OpenStack? (ferramentas, logs, ...)
#### OpenStack - Glance
Explique o Glance em detalhes
* Glance é o serviço de imagem do OpenStack * Ele lida com solicitações relacionadas a discos e imagens de instâncias * O Glance também é usado para criar snapshots para backups rápidos de instâncias * Os usuários podem usar o Glance para criar novas imagens ou fazer upload de existentes
Descreva a arquitetura do Glance
* glance-api - responsável por lidar com chamadas de API de imagem, como recuperação e armazenamento. Consiste em duas APIs:
  1. registry-api - responsável por solicitações internas
  2. API do usuário - pode ser acessada publicamente
* glance-registry - responsável por lidar com solicitações de metadados de imagem (por exemplo, tamanho, tipo, etc.). Este componente é privado, o que significa que não está disponível publicamente
* serviço de definição de metadados - API para metadados personalizados
* banco de dados - para armazenar metadados de imagens
* repositório de imagens - para armazenar imagens. Pode ser um sistema de arquivos, armazenamento de objetos Swift, HTTP, etc.
#### OpenStack - Swift
Explique o Swift em detalhes
* Swift é o serviço de Object Store: um armazenamento altamente disponível, distribuído e consistente, projetado para armazenar muitos dados
* O Swift distribui dados por vários servidores enquanto os grava em vários discos
* Pode-se optar por adicionar servidores adicionais para escalar o cluster, tudo isso enquanto o Swift mantém a integridade das informações e as replicações de dados
Os usuários podem armazenar por padrão um objeto de 100GB de tamanho?
Não por padrão. A API de Armazenamento de Objetos limita o máximo a 5GB por objeto, mas pode ser ajustado.
Explique o seguinte em relação ao Swift:
* Contêiner
* Conta
* Objeto

- Contêiner - Define um namespace para objetos
- Conta - Define um namespace para contêineres
- Objeto - Conteúdo de dados (por exemplo, imagem, documento, ...)
Verdadeiro ou Falso? pode haver dois objetos com o mesmo nome no mesmo contêiner, mas não em dois contêineres diferentes
Falso. Dois objetos podem ter o mesmo nome se estiverem em contêineres diferentes.
#### OpenStack - Cinder
Explique o Cinder em detalhes
* Cinder é o serviço de Armazenamento em Bloco do OpenStack
* Basicamente, ele fornece aos usuários recursos de armazenamento que eles podem consumir com outros serviços, como o Nova
* Uma das implementações de armazenamento mais usadas e suportadas pelo Cinder é o LVM
* Do ponto de vista do usuário, isso é transparente, o que significa que o usuário não sabe onde, nos bastidores, o armazenamento está localizado ou que tipo de armazenamento é usado
Descreva os componentes do Cinder
* cinder-api - recebe solicitações de API
* cinder-volume - gerencia dispositivos de bloco anexados
* cinder-scheduler - responsável por agendar as solicitações de volume, escolhendo o backend/nó de armazenamento adequado para criá-los
#### OpenStack - Keystone
Você pode descrever os seguintes conceitos em relação ao Keystone?
- Papel (Role)
- Inquilino/Projeto (Tenant/Project)
- Serviço (Service)
- Ponto de extremidade (Endpoint)
- Token

- Papel - Uma lista de direitos e privilégios que determinam o que um usuário ou um projeto pode realizar
- Inquilino/Projeto - Representação lógica de um grupo de recursos, isolado de outros grupos de recursos. Pode ser uma conta, organização, ...
- Serviço - Um ponto de extremidade que o usuário pode usar para acessar diferentes recursos
- Ponto de extremidade - Um endereço de rede que pode ser usado para acessar um determinado serviço OpenStack
- Token - Usado para acessar recursos enquanto descreve, por meio de um escopo, quais recursos podem ser acessados
Quais são as propriedades de um serviço? Em outras palavras, como um serviço é identificado?
Usando:
- Nome
- Número de ID
- Tipo
- Descrição
Explique o seguinte:
- PublicURL
- InternalURL
- AdminURL

- PublicURL - Acessível publicamente através da internet pública
- InternalURL - Usado para comunicação entre serviços
- AdminURL - Usado para gerenciamento administrativo
O que é um catálogo de serviços?
Uma lista de serviços e seus pontos de extremidade
#### OpenStack Avançado - Serviços
Descreva cada um dos seguintes serviços:
* Swift
* Sahara
* Ironic
* Trove
* Aodh
* Ceilometer

* Swift - armazenamento de objeto/blob altamente disponível, distribuído e eventualmente consistente
* Sahara - Gerenciar Clusters Hadoop
* Ironic - Provisionamento Bare Metal
* Trove - Banco de dados como serviço que roda no OpenStack
* Aodh - Serviço de Alarmes
* Ceilometer - Rastrear e monitorar o uso
Identifique o serviço/projeto usado para cada um dos seguintes:
* Banco de dados como serviço que roda no OpenStack
* Provisionamento Bare Metal
* Rastrear e monitorar o uso
* Serviço de Alarmes
* Gerenciar Clusters Hadoop
* Armazenamento de objeto/blob altamente disponível, distribuído e eventualmente consistente

* Banco de dados como serviço que roda no OpenStack - Trove
* Provisionamento Bare Metal - Ironic
* Rastrear e monitorar o uso - Ceilometer
* Serviço de Alarmes - Aodh
* Gerenciar Clusters Hadoop - Sahara
* Armazenamento de objeto/blob altamente disponível, distribuído e eventualmente consistente - Swift
#### OpenStack Avançado - Keystone
Você pode descrever o serviço Keystone em detalhes?
* Você não pode ter o OpenStack implantado sem o Keystone
* Ele fornece serviços de identidade, política e token
* A autenticação fornecida é para usuários e serviços
* A autorização suportada é baseada em token e baseada em usuário
* Existe uma política definida com base em RBAC, armazenada em um arquivo JSON, e cada linha nesse arquivo define o nível de acesso a ser aplicado
Descreva a arquitetura do Keystone
* Existe uma API de serviço e uma API de administração através das quais o Keystone recebe solicitações
* O Keystone tem quatro backends:
  * Backend de Token - Tokens temporários para usuários e serviços
  * Backend de Política - Gerenciamento de regras e autorização
  * Backend de Identidade - usuários e grupos (banco de dados autônomo, LDAP, ...)
  * Backend de Catálogo - Pontos de extremidade
* Possui um ambiente conectável onde você pode integrar com:
  * LDAP
  * KVS (Key Value Store)
  * SQL
  * PAM
  * Memcached
Descreva o processo de autenticação do Keystone
* O Keystone recebe uma chamada/solicitação e verifica se ela vem de um usuário autorizado, usando nome de usuário, senha e authURL
* Uma vez confirmado, o Keystone fornece um token
* Um token contém a lista de projetos do usuário, então não há necessidade de autenticar toda vez; o token pode ser enviado em vez disso
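O fluxo descrito acima pode ser esboçado em Python. É uma simulação em memória, puramente ilustrativa, que não usa a API real do Keystone; usuários, senhas e nomes de funções abaixo são hipotéticos:

```python
import secrets

# "Backend de identidade" hipotético: usuários, senhas e seus projetos
USUARIOS = {"alice": {"senha": "s3creta", "projetos": ["demo", "prod"]}}

# "Backend de token": tokens emitidos e seus escopos
tokens = {}

def autenticar(usuario, senha):
    """Valida as credenciais e emite um token com o escopo do usuário."""
    conta = USUARIOS.get(usuario)
    if conta is None or conta["senha"] != senha:
        raise PermissionError("credenciais inválidas")
    token = secrets.token_hex(8)
    tokens[token] = {"usuario": usuario, "projetos": conta["projetos"]}
    return token

def validar(token):
    """Chamadas subsequentes enviam apenas o token, sem reautenticar."""
    return tokens[token]

t = autenticar("alice", "s3creta")
print(validar(t)["projetos"])  # ['demo', 'prod']
```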
#### OpenStack Avançado - Computação (Nova)
O que cada um dos seguintes faz?
* nova-api
* nova-compute
* nova-conductor
* nova-cert
* nova-consoleauth
* nova-scheduler

* nova-api - responsável por gerenciar solicitações/chamadas
* nova-compute - responsável por gerenciar o ciclo de vida da instância
* nova-conductor - media entre o nova-compute e o banco de dados, para que o nova-compute não o acesse diretamente
Que tipos de proxies Nova você conhece?
* Nova-novncproxy - Acesso através de conexões VNC
* Nova-spicehtml5proxy - Acesso através de SPICE
* Nova-xvpvncproxy - Acesso através de uma conexão VNC
#### OpenStack Avançado - Rede (Neutron)
Explique o roteamento dinâmico BGP
Qual é o papel dos namespaces de rede no OpenStack?
#### OpenStack Avançado - Horizon
Você pode descrever o Horizon em detalhes?
* Projeto baseado em Django com foco em fornecer um painel do OpenStack e a capacidade de criar painéis personalizados adicionais
* Você pode usá-lo para acessar os diferentes recursos dos serviços do OpenStack - instâncias, imagens, redes, ...
* Ao acessar o painel, os usuários podem listar, criar, remover e modificar os diferentes recursos
* Também é altamente personalizável e você pode modificá-lo ou estendê-lo com base em suas necessidades
O que você pode dizer sobre a arquitetura do Horizon?
* A API é compatível com versões anteriores
* Existem três tipos de painéis: usuário, sistema e configurações
* Ele fornece suporte principal para todos os projetos principais do OpenStack, como Neutron, Nova, etc. (pronto para uso, sem necessidade de instalar pacotes ou plugins extras)
* Qualquer pessoa pode estender os painéis e adicionar novos componentes
* O Horizon fornece modelos e classes principais a partir dos quais se pode construir seu próprio painel
## Puppet
O que é Puppet? Como funciona?
* Puppet é uma ferramenta de gerenciamento de configuração que garante que todos os sistemas sejam configurados para um estado desejado e previsível.
Explique a arquitetura do Puppet
* O Puppet tem uma arquitetura de nó primário-secundário. Os clientes são distribuídos pela rede e se comunicam com o ambiente primário-secundário onde os módulos do Puppet estão presentes. O agente cliente envia um certificado com seu ID para o servidor; o servidor então assina esse certificado e o envia de volta para o cliente. Essa autenticação permite uma comunicação segura e verificável entre o cliente e o mestre.
Você pode comparar o Puppet com outras ferramentas de gerenciamento de configuração? Por que você escolheu usar o Puppet?
* O Puppet é frequentemente comparado a outras ferramentas de gerenciamento de configuração como Chef, Ansible, SaltStack e cfengine. A escolha de usar o Puppet geralmente depende das necessidades de uma organização, como facilidade de uso, escalabilidade e suporte da comunidade.
Explique o seguinte:
* Módulo
* Manifesto
* Nó

* Módulos - são uma coleção de manifestos, modelos e arquivos
* Manifestos - são o código propriamente dito para configurar os clientes
* Nó - permite atribuir configurações específicas a nós específicos
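Para ilustrar os conceitos, um exemplo mínimo e hipotético de manifesto (um `site.pp`) com um bloco `node`; o nome do nó e os recursos escolhidos são apenas ilustrativos:

```puppet
# Manifesto de exemplo: esta configuração se aplica apenas ao nó 'web01.example.com'
node 'web01.example.com' {
  # garante que o pacote nginx esteja instalado
  package { 'nginx':
    ensure => installed,
  }
  # garante que o serviço esteja rodando e habilitado no boot,
  # e que só seja gerenciado depois que o pacote existir
  service { 'nginx':
    ensure  => running,
    enable  => true,
    require => Package['nginx'],
  }
}
```

Em um módulo real, esses recursos ficariam em manifestos dentro da estrutura do módulo (por exemplo, `manifests/init.pp`), e o bloco `node` apenas incluiria a classe correspondente.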
Explique o Facter
* Facter é uma ferramenta autônoma no Puppet que coleta informações sobre um sistema e sua configuração, como o sistema operacional, endereços IP, memória e interfaces de rede. Essas informações podem ser usadas em manifestos do Puppet para tomar decisões sobre como os recursos devem ser gerenciados e para personalizar o comportamento do Puppet com base nas características do sistema. O Facter está integrado ao Puppet, e seus fatos podem ser usados dentro dos manifestos do Puppet para tomar decisões sobre o gerenciamento de recursos.
O que é MCollective?
* MCollective é um sistema de middleware que se integra ao Puppet para fornecer recursos de orquestração, execução remota e execução de trabalhos paralelos.
Você tem experiência em escrever módulos? Qual módulo você criou e para quê?
Explique o que é Hiera
* Hiera é um armazenamento de dados hierárquico no Puppet que é usado para separar dados do código, permitindo que os dados sejam mais facilmente separados, gerenciados e reutilizados.
## Elastic
O que é a Stack Elástica?
A Stack Elástica consiste em:
* Elasticsearch
* Kibana
* Logstash
* Beats
* Elastic Hadoop
* Servidor APM

Elasticsearch, Logstash e Kibana também são conhecidos como a stack ELK.
Explique o que é Elasticsearch
Da documentação oficial: "Elasticsearch é um armazenamento de documentos distribuído. Em vez de armazenar informações como linhas de dados colunares, o Elasticsearch armazena estruturas de dados complexas que foram serializadas como documentos JSON"
O que é Logstash?
Do blog: "Logstash é um pipeline poderoso e flexível que coleta, enriquece e transporta dados. Ele funciona como uma ferramenta de extração, transformação e carga (ETL) para coletar mensagens de log."
Explique o que são beats
Beats são remetentes de dados leves. Esses remetentes de dados são instalados no cliente onde os dados residem. Exemplos de beats: Filebeat, Metricbeat, Auditbeat. Existem muitos mais.
O que é Kibana?
Da documentação oficial: "Kibana é uma plataforma de análise e visualização de código aberto projetada para funcionar com o Elasticsearch. Você usa o Kibana para pesquisar, visualizar e interagir com dados armazenados nos índices do Elasticsearch. Você pode facilmente realizar análises de dados avançadas e visualizar seus dados em uma variedade de gráficos, tabelas e mapas."
Descreva o que acontece desde o momento em que um aplicativo registra algumas informações até que elas sejam exibidas para o usuário em um painel quando a stack Elastic é usada
O processo pode variar com base na arquitetura escolhida e no processamento que você pode querer aplicar aos logs. Um fluxo de trabalho possível é:

1. Os dados registrados pelo aplicativo são coletados pelo Filebeat e enviados para o Logstash
2. O Logstash processa o log com base nos filtros definidos. Uma vez feito isso, a saída é enviada para o Elasticsearch
3. O Elasticsearch armazena o documento recebido e o indexa para acesso rápido no futuro
4. O usuário cria visualizações no Kibana com base nos dados indexados
5. O usuário cria um painel composto pelas visualizações criadas na etapa anterior
##### Elasticsearch
O que é um nó de dados?
É aqui que os dados são armazenados e também onde ocorrem diferentes processamentos (por exemplo, quando você pesquisa por um dado).
O que é um nó mestre?
Parte das responsabilidades de um nó mestre:
* Rastrear o status de todos os nós no cluster
* Verificar se as réplicas estão funcionando e se os dados estão disponíveis em todos os nós de dados
* Sem nós quentes (nenhum nó de dados que trabalhe muito mais do que os outros)

Embora possa haver vários nós mestres, na prática apenas um deles é o nó mestre eleito.
O que é um nó de ingestão?
Um nó responsável por processar os dados de acordo com o pipeline de ingestão. Caso você não precise usar logstash, então este nó pode receber dados de beats e processá-los, de forma semelhante a como pode ser processado no Logstash.
O que é um nó Apenas de Coordenação?
Da documentação oficial: Nós apenas de coordenação podem beneficiar grandes clusters, descarregando a função de nó de coordenação dos nós de dados e elegíveis a mestre. Eles se juntam ao cluster e recebem o estado completo do cluster, como qualquer outro nó, e usam o estado do cluster para rotear solicitações diretamente para o(s) local(is) apropriado(s).
Como os dados são armazenados no Elasticsearch?
* Os dados são armazenados em um índice * O índice é distribuído pelo cluster usando shards
O que é um Índice?
Índice no Elasticsearch é na maioria dos casos comparado a um banco de dados inteiro do mundo SQL/NoSQL.
Você pode optar por ter um índice para conter todos os dados do seu aplicativo ou ter vários índices onde cada índice contém um tipo diferente do seu aplicativo (por exemplo, índice para cada serviço que seu aplicativo está executando). A documentação oficial também oferece uma ótima explicação (em geral, é uma documentação muito boa, como todo projeto deveria ter): "Um índice pode ser pensado como uma coleção otimizada de documentos e cada documento é uma coleção de campos, que são os pares chave-valor que contêm seus dados"
Explique os Shards
Um índice é dividido em shards e os documentos são hash para um shard específico. Cada shard pode estar em um nó diferente em um cluster e cada um dos shards é um índice autocontido.
Isso permite que o Elasticsearch escale para um cluster inteiro de servidores.
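O roteamento de documentos para shards pode ser esboçado assim em Python; o Elasticsearch real usa murmur3 sobre o valor de roteamento (por padrão o `_id`), aqui usamos md5 apenas para ilustrar o determinismo:

```python
import hashlib

NUM_SHARDS = 3  # número de shards primários do índice (hipotético)

def shard_para(doc_id):
    """Mapeia um _id deterministicamente para um shard: hash(_id) % num_shards."""
    h = int(hashlib.md5(doc_id.encode()).hexdigest(), 16)
    return h % NUM_SHARDS

docs = ["doc-1", "doc-2", "doc-3", "doc-4"]
roteamento = {d: shard_para(d) for d in docs}

# O mesmo _id sempre cai no mesmo shard - é assim que leituras e
# escritas do documento chegam ao nó certo
assert shard_para("doc-1") == roteamento["doc-1"]
print(roteamento)
```

É também por isso que o número de shards primários de um índice não pode ser alterado depois da criação sem reindexar: a fórmula de roteamento depende dele.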
O que é um Índice Invertido?
Da documentação oficial: "Um índice invertido lista cada palavra única que aparece em qualquer documento e identifica todos os documentos em que cada palavra ocorre."
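A definição acima pode ser demonstrada com um esboço em Python que constrói um índice invertido mínimo; os documentos e textos são hipotéticos:

```python
# Documentos de exemplo: id -> texto
docs = {
    1: "o rapido gato azul",
    2: "o gato dorme",
}

# Índice invertido: palavra -> conjunto de ids de documentos em que aparece
indice = {}
for doc_id, texto in docs.items():
    for palavra in set(texto.split()):
        indice.setdefault(palavra, set()).add(doc_id)

print(sorted(indice["gato"]))   # [1, 2]
print(sorted(indice["dorme"]))  # [2]
```

Buscar uma palavra vira, então, uma consulta direta ao índice, em vez de varrer todos os documentos.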
O que é um Documento?
Continuando com a comparação com SQL/NoSQL, um Documento no Elasticsearch é uma linha em uma tabela no caso de SQL ou um documento em uma coleção no caso de NoSQL. Como no NoSQL, um documento é um objeto JSON que contém dados sobre uma unidade em seu aplicativo. O que é essa unidade depende do seu aplicativo. Se o seu aplicativo está relacionado a livros, cada documento descreve um livro. Se o seu aplicativo é sobre camisas, cada documento é uma camisa.
Você verifica a saúde do seu cluster elasticsearch e está vermelho. O que isso significa? O que pode fazer com que o status seja amarelo em vez de verde?
Vermelho significa que alguns dados não estão disponíveis em seu cluster. Alguns shards de seus índices não estão atribuídos. Existem alguns outros estados para o cluster. Amarelo significa que você tem shards não atribuídos no cluster. Você pode estar neste estado se tiver um único nó e seus índices tiverem réplicas. Verde significa que todos os shards no cluster estão atribuídos a nós e seu cluster está saudável.
Verdadeiro ou Falso? O Elasticsearch indexa todos os dados em todos os campos, e cada campo indexado tem a mesma estrutura de dados para capacidade de consulta unificada e rápida
Falso. Da documentação oficial: "Cada campo indexado tem uma estrutura de dados dedicada e otimizada. Por exemplo, campos de texto são armazenados em índices invertidos, e campos numéricos e geográficos são armazenados em árvores BKD."
Quais campos reservados um documento possui?
* _index
* _id
* _type
Explique o Mapeamento
Quais são as vantagens de definir seu próprio mapeamento? (ou: quando você usaria seu próprio mapeamento?)
* Você pode otimizar campos para correspondência parcial
* Você pode definir formatos personalizados de campos conhecidos (por exemplo, data)
* Você pode realizar análises específicas do idioma
Explique as Réplicas
Em um ambiente de rede/nuvem onde falhas podem ser esperadas a qualquer momento, é muito útil e altamente recomendado ter um mecanismo de failover caso um shard/nó de alguma forma fique offline ou desapareça por qualquer motivo. Para este fim, o Elasticsearch permite que você faça uma ou mais cópias dos shards do seu índice no que são chamados de shards de réplica, ou réplicas para abreviar.
Você pode explicar a Frequência de Termo & Frequência de Documento?
Frequência de Termo é a frequência com que um termo aparece em um determinado documento e Frequência de Documento é a frequência com que um termo aparece em todos os documentos. Ambos são usados para determinar a relevância de um termo, calculando Frequência de Termo / Frequência de Documento.
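Um esboço numérico em Python da ideia acima (a fórmula real de relevância do Elasticsearch, como o BM25, é mais elaborada; isto ilustra apenas a intuição de TF/DF com dados hipotéticos):

```python
# Corpus de exemplo: 3 documentos
docs = [
    "gato gato cachorro",
    "cachorro peixe",
    "gato peixe peixe",
]

def tf(termo, doc):
    """Frequência do termo dentro de um documento (proporção de ocorrências)."""
    palavras = doc.split()
    return palavras.count(termo) / len(palavras)

def df(termo):
    """Em quantos documentos do corpus o termo aparece."""
    return sum(1 for d in docs if termo in d.split())

# "gato" aparece 2 vezes em 3 palavras no doc 0 (TF = 2/3)
# e está presente em 2 dos 3 documentos (DF = 2)
pontuacao = tf("gato", docs[0]) / df("gato")
print(round(pontuacao, 3))  # 0.333
```

Termos raros no corpus (DF baixo) recebem peso maior do que termos comuns, que aparecem em quase todos os documentos.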
Você verifica a "Fase Atual" em "Gerenciamento do ciclo de vida do índice" e vê que está definido como "quente". O que isso significa?
"O índice está sendo ativamente escrito". Mais sobre as fases aqui
O que este comando faz?

```
curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d'{ "name": "John Doe" }'
```
Ele cria o índice `customer`, caso ainda não exista, e indexa um documento com o campo `name` definido como "John Doe". O documento recebe o ID 1, conforme especificado na URL da solicitação.
O que acontecerá se você executar o comando anterior duas vezes? E se executá-lo 100 vezes?
1. Se o valor de `name` fosse diferente, ele atualizaria o campo para o novo valor
2. Em qualquer caso, ele incrementa o campo de versão em um
O que é a API em Massa? Para que você a usaria?
A API em massa (Bulk API) é usada quando você precisa indexar vários documentos. Para um grande número de documentos, usá-la é significativamente mais rápido do que enviar solicitações individuais, pois há menos idas e vindas na rede.
##### Query DSL
Explique a sintaxe de consulta do Elasticsearch (Booleanos, Campos, Intervalos)
Explique o que é Pontuação de Relevância
Explique o Contexto de Consulta e o Contexto de Filtro
Da documentação oficial: "No contexto de consulta, uma cláusula de consulta responde à pergunta “Quão bem este documento corresponde a esta cláusula de consulta?” Além de decidir se o documento corresponde ou não, a cláusula de consulta também calcula uma pontuação de relevância no meta-campo _score." "Em um contexto de filtro, uma cláusula de consulta responde à pergunta “Este documento corresponde a esta cláusula de consulta?” A resposta é um simples Sim ou Não — nenhuma pontuação é calculada. O contexto de filtro é usado principalmente para filtrar dados estruturados"
Descreva como uma arquitetura de ambiente de produção com grandes quantidades de dados seria diferente de um ambiente de pequena escala
Existem várias respostas possíveis para esta pergunta. Uma delas é a seguinte:

Uma arquitetura de pequena escala da stack Elastic consistirá na stack como ela é: Beats, Logstash, Elasticsearch e Kibana.

Um ambiente de produção com grandes quantidades de dados pode incluir algum tipo de componente de buffer (por exemplo, Redis ou RabbitMQ) e também um componente de segurança, como o Nginx.
##### Logstash
O que são plugins do Logstash? Que tipos de plugins existem?
* Plugins de Entrada - como coletar dados de diferentes fontes
* Plugins de Filtro - processamento de dados
* Plugins de Saída - enviar dados para diferentes saídas/serviços/plataformas
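Um exemplo hipotético de pipeline do Logstash com os três tipos de plugin; a porta, o host e o padrão grok escolhidos são apenas ilustrativos:

```
input {
  # entrada: recebe eventos enviados por beats (ex.: Filebeat) na porta 5044
  beats { port => 5044 }
}
filter {
  # filtro: extrai campos estruturados de linhas de log no formato Apache
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
output {
  # saída: envia os eventos processados ao Elasticsearch
  elasticsearch { hosts => ["localhost:9200"] }
}
```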
O que é grok?
Um plugin de filtro do Logstash que analisa dados não estruturados (por exemplo, linhas de log) e, combinando padrões de texto, os transforma em dados estruturados.
Como o grok funciona?
Quais padrões grok você conhece?
O que é `_grokparsefailure?`
Como você testa ou depura padrões grok?
O que são Codecs do Logstash? Que codecs existem?
##### Kibana
What can you find under "Discover" in Kibana?
The raw data as it is stored in the index. You can search and filter it.
In Kibana, after clicking on Discover, you see "561 hits". What does it mean?
The total number of documents matching the search results. If no query was used, it's simply the total number of documents.
What can you find under "Visualize"?
"Visualize" is where you can create visual representations of your data (pie charts, graphs, ...)
What visualization types are supported/included in Kibana?
What type of visualization would you use for statistical outliers?
Describe in detail how you create a dashboard in Kibana
#### Filebeat
What is Filebeat?
Filebeat is used to monitor the logging directories inside VMs, or mounted as a sidecar if exporting logs from containers, and then forward these logs onward for further processing, usually to Logstash.
If one is using ELK, is it mandatory to also use Filebeat? In what scenarios is it useful to use Filebeat?
Filebeat is a typical component of the ELK stack, since it was developed by Elastic to work with the other products (Logstash and Kibana). It is possible to send logs directly to Logstash, though this often requires coding changes in the application. Particularly for legacy applications with little test coverage, it might be a better option to use Filebeat, since you don't need to make any changes to the application code.
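A minimal `filebeat.yml` sketch for the common case described above: tailing application log files and shipping them to Logstash. The paths and host are illustrative assumptions, not values from this document:
```yaml
filebeat.inputs:
  - type: filestream
    paths:
      - /var/log/myapp/*.log   # directories/files to harvest

output.logstash:
  hosts: ["logstash:5044"]     # forward events to Logstash for filtering
```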
What is a harvester?
Read here
True or False? A single harvester harvests multiple files, according to the limits set in filebeat.yml
False. One harvester harvests one file.
What are Filebeat modules?
These are pre-configured modules for specific types of logging locations (e.g., Traefik, Fargate, HAProxy) to make it easy to configure forwarding logs using Filebeat. They have different configurations based on where you are collecting logs from.
#### Elastic Stack
How do you secure an Elastic Stack?
You can generate certificates with the provided Elastic utilities and change the configuration to enable security using the certificates model.
## Distributed
Explain Distributed Computing (or a Distributed System)
According to Martin Kleppmann: "Many processes running on many machines... only message-passing via an unreliable network with variable delays, and the system may suffer from partial failures, unreliable clocks, and process pauses." Another definition: "Systems that are physically separated, but logically connected"
What can cause a system to fail?
* Network
* CPU
* Memory
* Disk
Do you know what the "CAP theorem" is? (also known as Brewer's theorem)
According to the CAP theorem, it's not possible for a distributed data store to provide more than two of the following guarantees at the same time:
* Availability: Every request receives a response (it doesn't have to be the most recent data)
* Consistency: Every request receives a response with the most recent/up-to-date data
* Partition tolerance: The system continues to operate even when messages between its nodes are dropped or delayed by the network (a partition)
What are the problems with the following design? How to improve it?

1. The transition can take time. In other words, noticeable downtime.
2. The standby server is a waste of resources - if the first application server is up and running, the standby does nothing
What are the problems with the following design? How to improve it?

Problems: If the load balancer dies, we lose the ability to communicate with the application. Ways to improve:
* Add another load balancer
* Use DNS A records for both load balancers
* Use a message queue
What is "Shared-Nothing" architecture?
It's an architecture in which data is stored in and retrieved from a single non-shared source, usually exclusively connected to one node, as opposed to architectures where the request can get to one of many nodes and the data will be retrieved from one shared location (storage, memory, ...).
Explain the Sidecar Pattern (or sidecar proxy)
## Misc

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Highly Available "Hello World" | Exercise | Solution |
What happens when you type a URL in the address bar of a browser?
1. The browser looks up the IP address record for the domain name in DNS, in the following order:
   * Browser cache
   * Operating system cache
   * The DNS server configured in the user's system (can be the ISP's DNS, a public DNS, ...)
2. If it couldn't find a DNS record locally, a full DNS resolution is started.
3. It connects to the server using the TCP protocol
4. The browser sends an HTTP request to the server
5. The server sends an HTTP response back to the browser
6. The browser renders the response (e.g. HTML)
7. The browser then sends subsequent requests as needed to the server to get the embedded links, javascript, images in the HTML, and then steps 3 to 5 are repeated.
TODO: add more details!
#### API
Explain what is an API
I like this definition from blog.christianposta.com: "An explicitly and purposefully defined interface designed to be invoked over a network that enables software developers to get programmatic access to data and functionality within an organization in a controlled and comfortable way."
What is an API specification?
From swagger.io: "An API specification provides a broad understanding of how an API behaves and how the API links with other APIs. It explains how the API functions and the results to expect when using the API"
True or False? API definition is the same as API specification
False. From swagger.io: "An API definition is similar to an API specification in that it provides an understanding of how an API is organized and how the API functions. But the API definition is aimed at machine consumption instead of human consumption of APIs."
What is an API gateway?
An API gateway is like a gatekeeper that controls how different parts talk to each other and how information is exchanged between them. The API gateway provides a single entry point for all clients, and it can perform several tasks, including routing requests to the appropriate backend service, load balancing, security and authentication, rate limiting, caching, and monitoring. By using an API gateway, organizations can simplify the management of their APIs, ensure consistent security and governance, and improve the performance and scalability of their backend services. API gateways are also commonly used in microservices architectures, where there are many small, independent services that need to be accessed by different clients.
What are the advantages of using/implementing an API gateway?
Advantages:
- Simplifies API management: Provides a single entry point for all requests, which simplifies the management and monitoring of multiple APIs.
- Improves security: Can implement security features like authentication, authorization, and encryption to protect the backend services from unauthorized access.
- Enhances scalability: Can handle traffic spikes and distribute requests to backend services in a way that maximizes resource utilization and improves overall system performance.
- Enables service composition: Can combine different backend services into a single API, providing more granular control over the services clients can access.
- Facilitates integration with external systems: Can be used to expose internal services to external partners or customers, making it easier to integrate with external systems and enabling new business models.
What is a Payload in API?
What is Automation? How is it related to or different from Orchestration?
Automation is the act of automating tasks to reduce human intervention or interaction with IT technology and systems.
While automation focuses on the task level, orchestration is the process of automating processes and/or workflows which consist of multiple tasks that usually span multiple systems.
Tell me about interesting bugs you've found and also fixed
What is a Debugger and how does it work?
What services might an application have?
* Authorization
* Logging
* Authentication
* Ordering
* Front-end
* Back-end
...
What is Metadata?
Data about data. Basically, it describes the type of information the underlying data will hold.
You can use one of the following formats: JSON, YAML, XML. Which one would you use? Why?
I can't answer this for you :)
What is KPI?
What is OKR?
What is DSL (Domain Specific Language)?
Domain Specific Languages (DSLs) are used to create a customized language that represents the domain in such a way that domain experts can easily interpret it.
What is the difference between KPI and OKR?
#### YAML
What is YAML?
A data serialization language used by many technologies today, such as Kubernetes, Ansible, etc.
True or False? Any valid JSON file is also a valid YAML file
True. Because YAML is a superset of JSON.
What is the format of the following data?
```
{
  "applications": [
    {
      "name": "my_app",
      "language": "python",
      "version": 20.17
    }
  ]
}
```
JSON
What is the format of the following data?
```
applications:
  - app: "my_app"
    language: "python"
    version: 20.17
```
YAML
How to write a multi-line string with YAML? What use cases is it good for?
```
someMultiLineString: |
  look mama
  I can write a multi-line string
  I love YAML
```
It's good for use cases like writing a shell script, where every line of the script is a different command.
What is the difference between `someMultiLineString: |` and `someMultiLineString: >`?
Using `>` will fold the multi-line string into a single line
```
someMultiLineString: >
  This is actually
  a single line
  don't let appearances fool you
```
What are placeholders in YAML?
They allow you to reference values instead of writing them directly, and they are used like this:
```
username: {{ my.user_name }}
```
How can you define multiple YAML components in one file?
Using this: `---`
For example:
```
document_number: 1
---
document_number: 2
```
#### Firmware
Explain what firmware is
Wikipedia: "In computing, firmware is a specific class of computer software that provides the low-level control for a device's specific hardware. Firmware, such as the BIOS of a personal computer, may contain basic functions of a device, and may provide hardware abstraction services to higher-level software such as operating systems."
## Cassandra
When running a Cassandra cluster, how often do you need to run nodetool repair in order to keep the cluster consistent?
* Within the columnFamily GC-grace, once a week
* Less than the compacted partition minimum bytes
* Depends on the compaction strategy
## HTTP
What is HTTP?
Avinetworks: HTTP stands for Hypertext Transfer Protocol. HTTP uses TCP port 80 to enable internet communication. It is part of the Application Layer (L7) of the OSI Model.
Describe the HTTP request lifecycle
* Resolve the host by a request to the DNS resolver
* Client SYN
* Server SYN+ACK
* Client ACK
* HTTP request
* HTTP response
True or False? HTTP is stateful
False. It doesn't maintain state between incoming requests.
What does an HTTP request look like?
It consists of:
* Request line - the request type
* Headers - content info like length, encoding, etc.
* Body (not always included)
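The three parts can be seen in a raw HTTP/1.1 request built by hand (host and path are made up for illustration):
```python
# Request line, headers, then a blank line; a GET usually carries no body.
request = (
    "GET /index.html HTTP/1.1\r\n"   # request line: method, path, version
    "Host: example.com\r\n"          # headers start here
    "Accept: text/html\r\n"
    "\r\n"                           # blank line terminates the headers
)
print(request.split("\r\n")[0])  # GET /index.html HTTP/1.1
```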
What HTTP method types are there?
* GET
* POST
* HEAD
* PUT
* DELETE
* CONNECT
* OPTIONS
* TRACE
What HTTP response codes are there?
* 1xx - informational
* 2xx - Success
* 3xx - Redirection
* 4xx - Error, client fault
* 5xx - Error, server fault
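These status-code classes can be explored with Python's standard library, which ships the full registry in `http.HTTPStatus`:
```python
from http import HTTPStatus

# Numeric value and reason phrase for a couple of well-known codes:
print(HTTPStatus.NOT_FOUND.value, HTTPStatus.NOT_FOUND.phrase)              # 404 Not Found
print(HTTPStatus.GATEWAY_TIMEOUT.value, HTTPStatus.GATEWAY_TIMEOUT.phrase)  # 504 Gateway Timeout

# The first digit gives the class: 2xx success, 4xx client error, etc.
print(HTTPStatus.OK.value // 100)  # 2
```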
What is HTTPS?
HTTPS is a secure version of the HTTP protocol, used to transfer data between a web browser and a web server. It encrypts the communication using SSL/TLS encryption to ensure that the data is private and secure. Learn more: https://www.cloudflare.com/learning/ssl/why-is-http-not-secure/
Explain HTTP Cookies
HTTP is stateless. To share state, we can use Cookies. TODO: explain what a Cookie actually is
What is HTTP Pipelining?
You get the error "504 Gateway Timeout" from an HTTP server. What does it mean?
The server didn't receive a timely response from another server it communicates with.
What is a proxy?
A proxy is a server that acts as an intermediary between a client device and a destination server. It can help improve privacy, security, and performance by hiding the client's IP address, filtering content, and caching frequently accessed data.
- Proxies can be used for load balancing, distributing traffic across multiple servers to help prevent server overload and improve website or application performance. They can also be used for data analysis, since they can log requests and traffic, providing useful insights into user behavior and preferences.
What is a reverse proxy?
A reverse proxy is a type of proxy server that sits between a client and a server, but is used to manage traffic going in the opposite direction of a traditional forward proxy. In a forward proxy, the client sends requests to the proxy server, which forwards them on to the destination server. In a reverse proxy, however, the client sends requests to the destination server, but those requests are intercepted by the reverse proxy before they reach the server.
- Reverse proxies are commonly used to improve web server performance, provide high availability and fault tolerance, and enhance security by preventing direct access to the back-end server. They are often used in large-scale web applications and high-traffic websites to manage and distribute requests to multiple servers, resulting in improved scalability and reliability.
When you publish a project, you usually publish it with a license. What types of licenses are you familiar with and which one do you prefer to use?
Explain what "X-Forwarded-For" is
Wikipedia: "The X-Forwarded-For (XFF) HTTP header field is a common method for identifying the originating IP address of a client connecting to a web server through an HTTP proxy or load balancer."
#### Load Balancers
What is a load balancer?
A load balancer accepts (or denies) incoming network traffic from a client and, based on some criteria (application-related, network-related, etc.), distributes those communications out to servers (at least one).
Why use a load balancer?
* Scalability - using a load balancer, you can possibly add more servers in the backend to handle more requests/traffic from the clients, as opposed to using one server.
* Redundancy - if one server in the backend dies, the load balancer will keep forwarding the traffic/requests to the second server, so users won't even notice one of the servers in the backend is down.
What load balancing techniques/algorithms are you familiar with?
* Round Robin
* Weighted Round Robin
* Least Connection
* Weighted Least Connection
* Resource Based
* Fixed Weighting
* Weighted Response Time
* Source IP Hash
* URL Hash
What are the drawbacks of the round robin algorithm in load balancing?
* A simple round robin algorithm knows nothing about the load and the spec of each server it forwards the requests to. It is possible that multiple heavy-workload requests will get to the same server while other servers receive only lightweight requests, which will result in one server doing most of the work, maybe even crashing at some point because it's unable to handle all the heavy-workload requests by itself.
* Each client request creates a completely new session. This might be a problem for certain scenarios where you would like to perform multiple operations and the server has to know about the result of an earlier operation, essentially being somewhat aware of the history it has with the client. In round robin, the first request might hit server X, while the second request might hit server Y and ask to continue processing the data that was already processed on server X.
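A minimal round-robin dispatcher sketch: requests are handed to servers in a fixed rotation, with no awareness of each server's current load, which is exactly the first drawback described above. Server names are made up:
```python
from itertools import cycle

# Fixed rotation over the backend pool, regardless of how busy each server is.
servers = ["server-x", "server-y", "server-z"]
rotation = cycle(servers)

# Five incoming requests get assigned purely by turn order:
assignments = [next(rotation) for _ in range(5)]
print(assignments)  # ['server-x', 'server-y', 'server-z', 'server-x', 'server-y']
```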
What is an Application Load Balancer?
In which scenarios would you use an ALB?
At which layers can a load balancer operate?
L4 and L7
Can you perform load balancing without using a dedicated load balancer instance?
Yes, you can use DNS to perform load balancing.
What is DNS load balancing? What are its advantages? When would you use it?
#### Load Balancers - Sticky Sessions
What are sticky sessions? What are their pros and cons?
Recommended read:
* Red Hat article

Cons:
* Can create uneven load on an instance (since requests are routed to the same instances)
Pros:
* Ensures in-session data isn't lost when a new request is created
Name one use case for using sticky sessions
You would like to make sure the user doesn't lose the data of the current session.
What do sticky sessions use to enable the "stickiness"?
Cookies. There are application-based cookies and duration-based cookies.
Explain application-based cookies
* Generated by the application and/or the load balancer
* Usually allows including custom data
Explain duration-based cookies
* Generated by the load balancer
* The session is no longer sticky once the duration elapses
#### Load Balancers - Load Balancing Algorithms
Explain each of the following load balancing techniques
* Round Robin
* Weighted Round Robin
* Least Connection
* Weighted Least Connection
* Resource Based
* Fixed Weighting
* Weighted Response Time
* Source IP Hash
* URL Hash
Explain the use case for connection draining
To ensure that a Classic Load Balancer stops sending requests to instances that are de-registering or unhealthy, while keeping the existing connections open, use connection draining. This enables the load balancer to complete in-flight requests made to instances that are de-registering or unhealthy. The maximum timeout value can be set between 1 and 3,600 seconds on both GCP and AWS.
#### Licenses
Are you familiar with "Creative Commons"? What do you know about it?
The Creative Commons license is a set of copyright licenses that allow creators to share their work with the public while retaining some control over how it can be used. The license was developed as a response to the restrictive defaults of traditional copyright laws, which limited access to creative works. Creative Commons licenses let creators choose the terms under which their works can be shared, distributed, and used by others. There are six main types of Creative Commons licenses, each with different levels of restrictions and permissions. The six licenses are:
* Attribution (CC BY): Allows others to distribute, remix, and build upon the work, even commercially, as long as they credit the original creator.
* Attribution-ShareAlike (CC BY-SA): Allows others to remix and build upon the work, even commercially, as long as they credit the original creator and release any new creations under the same license.
* Attribution-NoDerivs (CC BY-ND): Allows others to distribute the work, even commercially, but they cannot remix or change it in any way, and they must credit the original creator.
* Attribution-NonCommercial (CC BY-NC): Allows others to remix and build upon the work, but they cannot use it commercially and must credit the original creator.
* Attribution-NonCommercial-ShareAlike (CC BY-NC-SA): Allows others to remix and build upon the work, but they cannot use it commercially, must credit the original creator, and must release any new creations under the same license.
* Attribution-NonCommercial-NoDerivs (CC BY-NC-ND): Allows others to download and share the work, but they cannot use it commercially, remix it, or change it in any way, and they must credit the original creator.
Simply put, Creative Commons licenses are a way for creators to share their work with the public while retaining some control over how it can be used.
The licenses promote creativity, innovation, and collaboration, while respecting the rights of creators and encouraging the responsible use of creative works. More information: https://creativecommons.org/licenses/
Explain the differences between copyleft and permissive licenses
With copyleft, any derivative work must use the same licensing, while with permissive licensing there is no such condition. GPL-3 is an example of a copyleft license, while BSD is an example of a permissive license.
#### Random
How does a search engine work?
How does autocomplete work?
What is faster than RAM?
CPU cache. Source
What is a memory leak?
A memory leak is a programming error that occurs when a program fails to release memory that is no longer needed, causing the program to consume increasing amounts of memory over time. Leaks can lead to a variety of problems, including system crashes, performance degradation, and instability. They generally occur after unsuccessful maintenance on older systems and due to compatibility with new components over time.
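A toy illustration of leak-like growth: a module-level cache that is written on every call but never evicted, so memory use grows with the number of requests handled. A sketch for illustration, not a real leak reproducer:
```python
# Entries accumulate forever: nothing ever removes them.
cache = {}

def handle_request(request_id: int) -> str:
    result = f"response-{request_id}"
    cache[request_id] = result   # stored "just in case", never released
    return result

for i in range(1000):
    handle_request(i)

print(len(cache))  # 1000 entries retained even though none are needed anymore
```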
What is your favorite protocol?
* SSH
* HTTP
* DHCP
* DNS
...
What is the Cache API?
What is the C10K problem? Is it relevant today?
https://idiallo.com/blog/c10k-2016
## Storage
What types of storage are there?
* File
* Block
* Object
Explain Object Storage
- Data is divided into self-contained objects
- Objects can contain metadata
What are the pros and cons of object storage?
Pros:
- Usually with object storage you pay for what you use, as opposed to other storage types where you pay for the allocated storage space
- Scalable storage: object storage is mostly based on a model where what you use is what you get, and you can add storage as needed
Cons:
- Usually performs slower than other types of storage
- No granular modification: to change an object, you have to recreate it
What are some use cases for object storage?
Explain File Storage
- File Storage is used for storing data in files, in a hierarchical structure
- Some of the devices for file storage: hard drive, flash drive, cloud-based file storage
- Files are usually organized in directories
What are the pros and cons of File Storage?
Pros:
- Users have full control of their own files and can run a variety of operations on the files: delete, read, write, and move.
- The security mechanism allows users to have better control over things like file locking
What are some examples of file storage?
Local filesystem
Dropbox
Google Drive
What types of storage devices are there?
Explain IOPS
Explain storage throughput
What is a filesystem?
A filesystem is the way computers and other electronic devices organize and store data files. It provides a structure that helps organize data into files and directories, making it easier to find and manage information. A filesystem is crucial for providing a way to store and manage data in an organized manner. Commonly used filesystems:
Windows:
* NTFS
* exFAT
Mac OS:
* HFS+
* APFS
Explain Dark Data
Explain MBR
## Questions you CAN ask
A list of questions that you, as a candidate, can ask the interviewer during or after the interview. These are only suggestions; use them carefully. Not every interviewer will be able to answer them (or will be happy to do so), which perhaps should be a red flag for you in regards to working in such a place, but that is really up to you.
What do you like about working here?
How does the company promote personal growth?
What is the current level of technical debt you are dealing with?
Be careful when asking this question - every company, regardless of size, has some level of technical debt. Phrase the question in light of the fact that all companies have to deal with it, but you want to see the current pain points they are dealing with.
This is a great way to figure out how managers deal with unplanned work and how good they are at setting expectations with projects.
Why should I NOT join you? (or "what do you not like about working here?")
What was your favorite project you've worked on?
This can give you insights into some of the cool projects a company is working on, and whether you would enjoy working on projects like these. It's also a good way to see if the managers are allowing employees to learn and grow with projects outside of the normal work you'd do.
If you could change one thing about your day-to-day, what would it be?
Similar to the tech debt question, this helps you identify any pain points with the company. Additionally, it can be a great way to show how you'd be an asset to the team.
For example, if they mention they have problem X, and you've solved that in the past, you can show how you'd be able to mitigate that problem.
Let's say we agree and you hire me for this position. After X months, what do you expect me to have achieved?
This will not only tell you what is expected from you, but will also give you a big hint about the type of work you are going to do in the first months of your job.
## Testing
Explain white-box testing
Explain black-box testing
What are unit tests?
Unit tests are a software testing technique that involves systematically breaking down a system and testing each individual part of the assembly. These tests are automated and can be executed repeatedly, allowing developers to catch edge cases or bugs quickly while developing. The main goal of unit tests is to verify that each function produces the proper outputs given a set of inputs.
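A minimal, self-contained unit-test sketch using Python's stdlib `unittest` module: one function under test and assertions on its outputs for a normal input and an edge case (the function and test names are illustrative):
```python
import unittest

def add(a: int, b: int) -> int:
    return a + b

class TestAdd(unittest.TestCase):
    def test_positive_numbers(self):
        self.assertEqual(add(2, 3), 5)

    def test_edge_case_zero(self):
        self.assertEqual(add(0, 0), 0)

# Run the test case programmatically and report the outcome:
runner = unittest.TextTestRunner(verbosity=0)
result = runner.run(unittest.defaultTestLoader.loadTestsFromTestCase(TestAdd))
print(result.wasSuccessful())  # True
```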
What types of tests would you run to test a web application?
Explain test harness
What is A/B testing?
What is network simulation and how do you perform it?
What types of performance tests are you familiar with?
Explain the following types of tests:
* Load Testing
* Stress Testing
* Capacity Testing
* Volume Testing
* Endurance Testing
## Regex
Given a text file, perform the following exercises
#### Extract
Extract all the numbers
- "\d+"
Extract the first word of each line
- "^\w+"
Bonus: extract the last word of each line
- "\w+(?=\W*$)" (in most cases; it depends on the line formatting)
Extract all the IP addresses
- "\b(?:\d{1,3}\.){3}\d{1,3}\b" IPv4: (this format looks for a sequence of 1 to 3 digits, 3 times)
Extract dates in the format yyyy-mm-dd or yyyy-dd-mm
Extract email addresses
- "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
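A quick way to sanity-check the extraction patterns above is with Python's `re` module (the sample text is made up for illustration):
```python
import re

text = "alice@example.com logged in from 192.168.0.12 at attempt 3"

numbers    = re.findall(r"\d+", text)
ips        = re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", text)
emails     = re.findall(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b", text)
first_word = re.findall(r"^\w+", text, flags=re.MULTILINE)  # per-line first word

print(ips)         # ['192.168.0.12']
print(emails)      # ['alice@example.com']
print(first_word)  # ['alice']
```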
#### Replace
Replace tabs with four spaces
Replace 'red' with 'green'
## System Design
Explain what a "single point of failure" is.
A "single point of failure" is a component in a system or organization which, if it failed, would cause the entire system to fail or significantly disrupt its operation. In other words, it's a vulnerability where there is no backup to compensate for the failure.
What is a CDN?
A CDN (Content Delivery Network) is responsible for distributing content geographically. Part of it is what is known as edge locations, also referred to as cache proxies, which allow users to get their content quickly thanks to caching and geographical distribution.
Explain Multi-CDN
With a single CDN, all the content originates from one content delivery network.
With multi-CDN, the content is distributed across multiple different CDNs, each possibly on a completely different provider/cloud.
What are the benefits of Multi-CDN over a single CDN?
* Resiliency: Relying on one CDN means no redundancy. With multiple CDNs, you don't need to worry about your CDN going down.
* Cost flexibility: Using one CDN locks you into that CDN's specific rates. With multiple CDNs, you can take into account using less expensive CDNs to deliver the content.
* Performance: With Multi-CDN, there is bigger potential for choosing better locations which are closer to the client requesting the content.
* Scale: With multiple CDNs, you can scale services to support more extreme conditions.
Explain "3-Tier Architecture" (including pros and cons)
A "3-Tier Architecture" is a pattern used in software development for designing and structuring applications. It divides the application into three interconnected layers: Presentation, Business Logic, and Data Storage.
PROS:
* Scalability
* Security
* Reusability
CONS:
* Complexity
* Performance overhead
* Cost and development time
Explain Mono-repo vs. Multi-repo. What are the pros and cons of each approach?
In a Mono-repo, all of an organization's code is stored in a single, centralized repository.
PROS (Mono-repo):
* Unified tooling
* Code sharing
CONS (Mono-repo):
* Increased complexity
* Slower cloning
In a Multi-repo setup, each component is stored in its own separate repository. Each repository has its own version control history.
PROS (Multi-repo):
* Simpler to manage
* Different teams and developers can work on different parts of the project independently, making parallel development easier.
CONS (Multi-repo):
* Code duplication
* Integration challenges
What are the drawbacks of monolithic architecture?
* Not suitable for frequent code changes and the ability to deploy new features
* Not designed for today's infrastructure (like public clouds)
* Scaling a team to work on a monolithic architecture is more challenging
* If a single component in this architecture fails, the entire application fails.
What are the advantages of microservices architecture over monolithic architecture?
* Each of the services fails individually without escalating into an application-wide outage.
* Each service can be developed and maintained by a separate team, and that team can choose its own tools and coding language
What is a service mesh?
It's a layer that facilitates the management and control of communication between microservices in a containerized application. It handles tasks such as load balancing, encryption, and monitoring.
Explain "Loose Coupling"
With "Loose Coupling", the components of a system communicate with little knowledge of each other's inner workings. This improves scalability and ease of modification in complex systems.
What is a message queue? When is it used?
It's a communication mechanism used in distributed systems to enable asynchronous communication between different components. It is usually used when systems take a microservices approach.
#### Scalability
Explain Scalability
The ability to easily grow in size and capacity based on demand and usage.
Explain Elasticity
The ability to grow, but also to shrink, based on what is required
Explain Disaster Recovery
Disaster recovery is the process of restoring business-critical systems and data after a disruptive event. The goal is to minimize the impact and resume normal business activities quickly. It involves creating a plan, testing it, backing up critical data and storing the backups in safe locations. When disaster strikes, the plan is executed, backups are restored and systems are hopefully brought back online. The recovery process can take hours or days, depending on the damage to the infrastructure. This makes business planning important, since a well-designed and tested disaster recovery plan can minimize the impact of a disaster and keep operations going.
Explain Fault Tolerance and High Availability
Fault Tolerance - The ability to self-heal and return to normal capacity. Also the ability to withstand a failure and remain functional.
High Availability - Being able to access a resource (in some use cases, using different platforms)
What is the difference between High Availability and Disaster Recovery?
wintellect.com: "High availability, simply put, is eliminating single points of failure, and disaster recovery is the process of getting a system back to an operational state when it is rendered inoperative. In essence, disaster recovery picks up when high availability fails, so HA first."
Explain Vertical Scaling
Vertical scaling is the process of adding resources to increase the power of existing servers. For example, adding more CPUs, more RAM, etc.
What are the disadvantages of Vertical Scaling?
With vertical scaling alone, the component still remains a single point of failure. In addition, it has a hardware limit: if you run out of resources, you may not be able to scale up any further.
Which types of cloud services usually support vertical scaling?
Databases and caches. It is common mostly for non-distributed systems.
Explain Horizontal Scaling
Horizontal scaling is the process of adding more resources that will be able to handle requests as a single unit
What is the disadvantage of Horizontal Scaling? What is often required in order to perform Horizontal Scaling?
A load balancer. You can add more resources, but if you want them to take part in processing, you have to route requests/responses to them. In addition, data inconsistency is a concern with horizontal scaling.
Explain in which use cases you would use vertical scaling and in which use cases you would use horizontal scaling
Explain Resiliency and what ways exist to make a system more resilient
Explain "Consistent Hashing"
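A minimal sketch of the idea: keys and nodes are placed on a hash ring, and each key maps to the first node clockwise from it, so adding or removing a node only remaps the keys between it and its predecessor. The node names (`cache-a`, etc.) and replica count are made up for illustration:

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    # Stable hash so placement survives process restarts
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, replicas=100):
        # Each physical node gets many virtual points to smooth the load
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )

    def get_node(self, key: str) -> str:
        h = _hash(key)
        hashes = [point for point, _ in self._ring]
        idx = bisect.bisect(hashes, h) % len(self._ring)  # wrap around the ring
        return self._ring[idx][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
print(ring.get_node("user:42"))  # deterministically one of the three nodes
```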
How would you update each of the services in the following drawing without causing downtime for the application (foo.com)?

What is the problem with the following architecture and how would you fix it?

The load on the producers or consumers may be high, which can cause them to hang or crash.
Instead of working in "push mode", consumers can pull tasks only when they are ready to handle them. This can be fixed by using a streaming platform such as Kafka, Kinesis, etc. Such a platform will take care of handling the high load/traffic and will pass tasks/messages to consumers only when they are ready to receive them.
Users report a huge spike in processing time when just a bit more data is added as input. What might be the problem?

How would you scale the architecture from the previous question to hundreds of users?
#### Cache
What is "caching"? In which cases would you use it?
What is a "distributed cache"?
What is a "cache replacement policy"?
Take a look here
Which cache replacement policies are you familiar with?
You can find a list here
Explain the following cache policies:
* FIFO
* LIFO
* LRU
Read about them here
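As an illustration of one of these policies, here is a minimal LRU (Least Recently Used) cache sketch, not a production cache: on overflow, it evicts the entry that was accessed least recently.

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU replacement policy demo built on an ordered dict."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" becomes most recently used
cache.put("c", 3)      # capacity exceeded -> evicts "b"
print(cache.get("b"))  # None
```

A FIFO policy would instead evict in insertion order (drop the `move_to_end` calls); LIFO would evict the most recently inserted entry.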
Why not write everything to a cache instead of to a database/datastore?
Caches and databases serve different purposes and are optimized for different use cases. A cache is used to speed up read operations by storing frequently accessed data in memory or on a fast storage medium. By keeping data close to the application, caching reduces the latency and overhead of accessing data from a slower, more distant storage system such as a database or disk. Databases, on the other hand, are optimized for storing and managing persistent data. Databases are designed to handle concurrent read and write operations, enforce consistency and integrity constraints, and provide features such as indexing and querying.
#### Migrations
How do you prepare for a migration? (or plan a migration)
You can mention:
* roll-back & roll-forward
* cut over
* dress rehearsals
* DNS redirection
Explain the "Branch by Abstraction" technique
#### Design a system
Can you design a video streaming website?
Can you design a photo upload website?
How would you build a URL shortener?
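One possible sketch of the core idea, assuming an in-memory store and base62-encoded sequential ids; a real design would add persistence, a distributed id generator, collision handling and caching:

```python
import string

# 0-9, a-z, A-Z: 62 characters, so short slugs cover a huge id space
ALPHABET = string.digits + string.ascii_letters

def encode(n: int) -> str:
    """Encode a non-negative integer id as a base62 slug."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

class Shortener:
    def __init__(self):
        self._urls = {}      # slug -> original URL
        self._next_id = 1    # auto-incrementing id guarantees uniqueness

    def shorten(self, url: str) -> str:
        slug = encode(self._next_id)
        self._next_id += 1
        self._urls[slug] = url
        return slug

    def resolve(self, slug: str) -> str:
        # In a real service this is the HTTP redirect lookup
        return self._urls[slug]

s = Shortener()
slug = s.shorten("https://example.com/some/long/path")
print(slug, "->", s.resolve(slug))
```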
#### More System Design Questions
Additional exercises can be found in the system-design-notebook repository.

## Hardware
What is a CPU?
A central processing unit (CPU) performs basic arithmetic, logic, controlling, and input/output (I/O) operations specified by the instructions in the program. This contrasts with external components, such as main memory and I/O circuitry, and specialized processors such as graphics processing units (GPUs).
What is RAM?
RAM (Random Access Memory) is the hardware in a computing device where the operating system (OS), application programs and data in current use are kept so they can be quickly reached by the device's processor. RAM is the main memory in a computer. It is much faster to read from and write to than other kinds of storage, such as a hard disk drive (HDD), solid-state drive (SSD) or optical drive.
What is a GPU?
A GPU, or Graphics Processing Unit, is a specialized electronic circuit designed to accelerate the processing of images and videos for display on a computer screen.
What is an embedded system?
An embedded system is a computer system - a combination of a computer processor, computer memory, and input/output peripheral devices - that has a dedicated function within a larger mechanical or electronic system. It is embedded as part of a complete device, often including electrical or electronic hardware and mechanical parts.
Can you give an example of an embedded system?
A common example of an embedded system is the digital control panel of a microwave oven, which is managed by a microcontroller. When committed to a specific purpose, a Raspberry Pi can also serve as an embedded system.
What types of storage are there?
There are several types of storage, including hard disk drives (HDDs), solid-state drives (SSDs) and optical drives (CD/DVD/Blu-ray). Other types of storage include USB flash drives, memory cards and network-attached storage (NAS).
What are some considerations DevOps teams should keep in mind when selecting hardware for their work?
Choosing the right DevOps hardware is essential to ensure streamlined CI/CD pipelines, timely feedback loops and consistent service availability. Here is a distilled guide to what DevOps teams should consider:
1. **Understanding the workloads**:
   - **CPU**: Consider the need for multi-core or high-frequency CPUs based on your tasks.
   - **RAM**: Sufficient memory is vital for activities like large-scale builds or intensive automation.
   - **Storage**: Evaluate storage speed and capacity. SSDs may be preferable for fast operations.
2. **Expandability**:
   - **Horizontal growth**: Check whether you can increase capacity by adding more devices.
   - **Vertical growth**: Determine whether upgrades (such as RAM, CPU) of individual machines are feasible.
3. **Connectivity considerations**:
   - **Data transfer**: Ensure high-speed network connections for activities like code retrieval and data transfers.
   - **Speed**: Aim for low-latency networks, particularly important for distributed tasks.
   - **Backup routes**: Think about having backup network routes to avoid downtime.
4. **Consistent uptime**:
   - Plan hardware backups such as RAID configurations, backup power supplies or alternative network connections to ensure continuous service.
5. **System compatibility**:
   - Make sure your hardware aligns with your intended software, operating system and platforms.
6. **Energy efficiency**:
   - Hardware that uses power efficiently can reduce long-term costs, especially in large setups.
7. **Security measures**:
   - Explore hardware-level security features, such as TPM, to enhance protection.
8. **Oversight & control**:
   - Tools like ILOM can be beneficial for remote handling.
   - Make sure the hardware can be monitored seamlessly for health and performance.
9. **Budget**:
   - Consider both upfront expenses and long-term costs when budgeting.
10. **Support & community**:
    - Choose hardware from reputable vendors known for reliable support.
    - Check for available drivers, updates and community discussions about the hardware.
11. **Future planning**:
    - Opt for hardware that can meet both present and future requirements.
12. **Operating environment**:
    - **Temperature control**: Ensure cooling systems to manage the heat of high-performance units.
    - **Space management**: Evaluate the size of the hardware against the available rack space.
    - **Reliable power**: Account for consistent and backup power sources.
13. **Cloud coordination**:
    - If you are leaning towards a hybrid cloud setup, focus on how on-premises hardware will integrate with cloud resources.
14. **Hardware lifespan**:
    - Be aware of the hardware's expected lifetime and when you might need replacements or upgrades.
15. **Optimized for virtualization**:
    - If using virtual machines or containers, make sure the hardware is compatible with and optimized for such workloads.
16. **Adaptability**:
    - Modular hardware allows individual component replacements, offering more flexibility.
17. **Avoiding vendor lock-in**:
    - Try to avoid dependence on a single vendor unless there are clear advantages.
18. **Eco-friendly choices**:
    - Prioritize hardware that is sustainably produced, energy-efficient and environmentally responsible.
In essence, DevOps teams should choose hardware that fits their tasks, is versatile, performs well and stays within budget. In addition, long-term considerations such as maintenance, potential upgrades and compatibility with upcoming technology changes should be prioritized.
What is the role of hardware in disaster recovery planning and implementation?
Hardware is critical in disaster recovery (DR) solutions. While the broader scope of DR includes things like standard procedures, policies and human roles, it is the hardware that keeps business processes running smoothly. Here is an outline of how hardware fits into DR:
1. **Storing data and ensuring its duplication**:
   - **Backup equipment**: Devices such as tape storage, backup servers and external HDDs keep essential data safely stored at a different location.
   - **Disk arrays**: Systems such as RAID offer a safety net. If one disk fails, the others compensate.
2. **Alternative systems for recovery**:
   - **Backup servers**: These take over when the main servers fail, keeping services flowing.
   - **Traffic distributors**: Devices such as load balancers share traffic among servers. If one server fails, they redirect users to the operational ones.
3. **Alternative operation hubs**:
   - **Hot sites**: Locations equipped and ready to take over immediately when the main site fails.
   - **Cold sites**: Locations with the necessary equipment but without recent data, taking longer to activate.
   - **Warm sites**: Partially prepared locations with selected systems and data, taking a moderate time to activate.
4. **Power backup mechanisms**:
   - **Instant power backup**: Devices such as uninterruptible power supplies (UPS) provide power during brief outages, preventing abrupt shutdowns.
   - **Long-term power solutions**: Generators keep vital systems operational during prolonged power losses.
5. **Network equipment**:
   - **Backup internet connections**: Having alternatives ensures connectivity even if one provider has issues.
   - **Secure connection tools**: Devices that provide secure remote access, especially crucial during DR situations.
6. **Physical on-site setup**:
   - **Organized housing**: Structures such as racks to store and manage hardware in an organized way.
   - **Emergency temperature control**: Backup cooling mechanisms to counter server overheating in case of HVAC malfunction.
7. **Alternative communication channels**:
   - **Satellite phones**: Useful when regular communication methods fail.
   - **Direct communication devices**: Devices such as radios, useful when primary systems are down.
8. **Protection mechanisms**:
   - **Electronic barriers & alert systems**: Devices such as firewalls and intrusion detection keep the DR systems protected.
   - **Physical entry control**: Systems that control and monitor entry, ensuring only authorized personnel have access.
9. **Uniformity and compatibility in hardware**:
   - It is simpler to manage and replace equipment in emergencies if hardware configurations are consistent and compatible.
10. **Equipment for testing and maintenance**:
    - DR drills can use dedicated equipment so the primary systems remain untouched. This verifies readiness and the equipment's ability to handle real crises.
In summary, while software and human interventions matter in disaster recovery operations, it is the hardware that provides the underlying support. Keeping this hardware resilient, duplicated and routinely evaluated is fundamental to efficient disaster recovery plans.
What is RAID?
RAID is an acronym that stands for "Redundant Array of Independent Disks". It is a technique that combines multiple hard drives into a single device, known as an array, to improve performance, expand storage capacity and/or provide redundancy to avoid data loss. RAID levels (e.g. RAID 0, RAID 1 and RAID 5) offer varying benefits in terms of performance, redundancy and storage efficiency.
What is a microcontroller?
A microcontroller is a small integrated circuit that controls specific tasks in an embedded system. It typically includes a CPU, memory and input/output peripherals.
What is a Network Interface Controller, or NIC?
A Network Interface Controller (NIC) is a piece of hardware that connects a computer to a network and enables it to communicate with other devices.
What is DMA?
Direct memory access (DMA) is a feature of computer systems that allows certain hardware subsystems to access main system memory independently of the central processing unit (CPU). DMA allows devices to share and receive data from the main memory in a computer, while still allowing the CPU to perform other tasks.
What are Real-Time Operating Systems?
A real-time operating system (RTOS) is an operating system (OS) for real-time computing applications that processes data and events with critically defined time constraints. An RTOS is distinct from a time-sharing operating system, such as Unix, which manages the sharing of system resources with a scheduler, data buffers, or fixed task prioritization in a multitasking or multiprogramming environment. Processing time requirements need to be fully understood and bounded, rather than just kept to a minimum. All processing must occur within the defined constraints. Real-time operating systems are event-driven and preemptive, meaning the OS can monitor the relevant priority of competing tasks and change task priorities. Event-driven systems switch between tasks based on their priorities, while time-sharing systems switch tasks based on clock interrupts.
List the types of interrupts
There are six classes of possible interrupts:
* External
* Machine check
* I/O
* Program
* Restart
* Supervisor call (SVC)
## Big Data
Explain what exactly Big Data is
As defined by Doug Laney:
* Volume: Extremely large volumes of data
* Velocity: Real time, batch, streams of data
* Variety: Various forms of data: structured, semi-structured and unstructured
* Veracity or Variability: Inconsistent, sometimes inaccurate, variable data
What is DataOps? How is it related to DevOps?
DataOps seeks to reduce the end-to-end cycle time of data analytics, from the origin of ideas to the literal creation of charts, graphs and models that create value. DataOps combines Agile development, DevOps and statistical process controls and applies them to data analytics.
What is Data Architecture?
An answer from talend.com: "Data architecture is the process of standardizing how organizations collect, store, transform, distribute, and use data. The goal is to deliver relevant data to people who need it, when they need it, and help them make sense of it."
Explain the different data formats
* Structured - data that has a defined format and length (e.g. numbers, words)
* Semi-structured - doesn't conform to a specific format, but is self-describing (e.g. XML, SWIFT)
* Unstructured - doesn't follow a specific format (e.g. images, text messages)
What is a Data Warehouse?
Wikipedia's explanation of a Data Warehouse
Amazon's explanation of a Data Warehouse
What is a Data Lake?
Data Lake - Wikipedia
Can you explain the difference between a data lake and a data warehouse?
What is "Data Versioning"? Which "Data Versioning" models exist?
What is ETL?
#### Apache Hadoop
Explain what Hadoop is
Apache Hadoop - Wikipedia
Explain Hadoop YARN
Responsible for managing the compute resources in clusters and scheduling users' applications
Explain Hadoop MapReduce
A programming model for large-scale data processing
Explain the Hadoop Distributed File System (HDFS)
* A distributed file system providing high aggregate bandwidth across the cluster.
* To a user it looks like a regular file system structure, but behind the scenes it is distributed across multiple machines in a cluster
* Typical file sizes are in the TB range, and it can scale to and support millions of files
* It is fault-tolerant, meaning it provides automatic recovery from failures
* It is better suited for running long batch operations than for live analytics
What do you know about the HDFS architecture?
HDFS architecture:
* Master-slave architecture
* Namenode - master, Datanodes - slaves
* Files are split into blocks
* Blocks are stored on the datanodes
* The namenode keeps all the metadata
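A toy illustration (not real HDFS code) of the master/slave idea above: files are split into fixed-size blocks stored on datanodes, while the namenode only tracks metadata. The datanode names, the tiny block size and the round-robin placement are made-up simplifications (real HDFS defaults to 128 MB blocks and rack-aware placement):

```python
BLOCK_SIZE = 4           # bytes here for demo; real HDFS defaults to 128 MB
REPLICATION = 2          # each block is stored on REPLICATION datanodes
DATANODES = ["dn1", "dn2", "dn3"]

def split_into_blocks(data: bytes):
    """A file's content is cut into fixed-size blocks."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

def place_replicas(num_blocks: int):
    """Namenode-style metadata: block index -> list of datanodes."""
    return {
        b: [DATANODES[(b + r) % len(DATANODES)] for r in range(REPLICATION)]
        for b in range(num_blocks)
    }

blocks = split_into_blocks(b"hello hdfs!")
metadata = place_replicas(len(blocks))   # the only thing a namenode stores
print(len(blocks), metadata[0])
```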
## Ceph
Explain what Ceph is
Ceph is an open-source distributed storage system designed to provide excellent performance, reliability and scalability. It is often used in cloud computing environments and data centers.
True or False? Ceph favors consistency and correctness over performance
True
Which services or types of storage does Ceph support?
* Object (RGW)
* Block (RBD)
* File (CephFS)
What is RADOS?
* Reliable Autonomic Distributed Object Store
* Provides the low-level data object storage service
* Strong consistency
* Simplifies the design and implementation of the higher layers (block, file, object)
Describe the software components of RADOS
* Monitor
  * Central authority for authentication, data placement and policy
  * Coordination point for all other cluster components
  * Protects critical cluster state with Paxos
* Manager
  * Aggregates real-time metrics (throughput, disk usage, etc.)
  * Host for pluggable management functions
  * 1 active, 1+ standby per cluster
* OSD (Object Storage Daemon)
  * Stores data on an HDD or SSD
  * Services client I/O requests
What is the workflow for retrieving data from Ceph?
The workflow is as follows:
1. The client sends a request to the Ceph cluster to retrieve data:
> **The client can be any of the following**
>> * Ceph Block Device
>> * Ceph Object Gateway
>> * Any third-party Ceph client
2. The client retrieves the latest cluster map from the Ceph Monitor
3. The client uses the CRUSH algorithm to map the object to a placement group. The placement group is then assigned to an OSD.
4. Once the placement group and OSD daemon are determined, the client can retrieve the data from the appropriate OSD
What is the workflow for writing data to Ceph?
The workflow is as follows:
1. The client sends a request to the Ceph cluster to write data
2. The client retrieves the latest cluster map from the Ceph Monitor
3. The client uses the CRUSH algorithm to map the object to a placement group. The placement group is then assigned to a Ceph OSD daemon dynamically.
4. The client sends the data to the primary OSD of the determined placement group. If the data is stored in an erasure-coded pool, the primary OSD is responsible for encoding the object into data chunks and coding chunks, and distributing them to the other OSDs.
What are "Placement Groups"?
Describe in detail the following: Objects -> Pool -> Placement Groups -> OSDs
What is OMAP?
What is a metadata server? How does it work?
## Packer
What is Packer? What is it used for?
In general, Packer automates the creation of machine images. It allows you to focus on pre-deployment configuration while it builds the images. This lets you launch instances much faster in most cases.
Does Packer follow a "configuration->deployment" or a "deployment->configuration" model?
configuration->deployment, which has some advantages such as:
1. Deployment speed - you configure once, prior to deployment, instead of configuring every time you deploy. This allows you to launch instances/services much more quickly.
2. More immutable infrastructure - with configuration->deployment you are unlikely to have wildly different deployments, since most of the configuration is done before deployment. Issues like dependency errors are handled/discovered before deployment in this model.
## Release
Explain Semantic Versioning
This page explains it perfectly:
```
Given a version number MAJOR.MINOR.PATCH, increment the:
MAJOR version when you make incompatible API changes
MINOR version when you add functionality in a backwards compatible manner
PATCH version when you make backwards compatible bug fixes
Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.
```
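A minimal sketch of parsing and comparing the core MAJOR.MINOR.PATCH triple described above; it deliberately ignores the pre-release and build-metadata precedence rules of the full spec:

```python
import re

# Only matches the bare MAJOR.MINOR.PATCH core of a SemVer string
SEMVER_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")

def parse(version: str):
    """Return (major, minor, patch) as integers, or raise ValueError."""
    m = SEMVER_RE.match(version)
    if not m:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version}")
    return tuple(int(part) for part in m.groups())

# Python tuples compare element-wise, which matches SemVer precedence
# for the core triple:
assert parse("2.0.0") > parse("1.9.9")
assert parse("1.10.0") > parse("1.2.3")   # numeric, not lexicographic
print(parse("1.4.2"))  # (1, 4, 2)
```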
## Certificates
If you are looking for a way to prepare for a certain exam, this is the section for you. Here you will find a list of certificates, each referencing a separate file with focused questions that will help you prepare for the exam. Good luck :)
#### AWS
* Cloud Practitioner (Latest update: 2020)
* Solutions Architect Associate (Latest update: 2021)
* Cloud SysOps Administration Associate (Latest update: Oct 2022)
#### Azure
* AZ-900 (Latest update: 2021)
#### Kubernetes
* Certified Kubernetes Administrator (CKA) (Latest update: 2022)
## Additional DevOps and SRE Projects

## Credits
Thanks to all of our awesome contributors who make it easy for everyone to learn new things :)
Credits for the logos can be found here
## License
[!License: CC BY-NC-ND 3.0](https://creativecommons.org/licenses/by-nc-nd/3.0/) ================================================ FILE: README-zh_CN.md ================================================

:information_source:  This repository contains questions and exercises on various technical topics, sometimes related to DevOps and SRE :bar_chart:  There are currently **2624** questions :warning:  You can use these for preparing for an interview, but most of the questions and exercises don't represent an actual interview. Please read the [FAQ](faq.md) for more details :page_facing_up:  Different interviewers focus on different things. Some will focus on your resume, while others may focus on scenario questions or specific technical questions. In this repository I try to cover various types of DevOps questions so you can practice and test your knowledge :pencil:  You can add more exercises by submitting a pull request :) Read the contribution guidelines [here](CONTRIBUTING.md) ****
Topics covered: DevOps, Git, Network, Hardware, Kubernetes, Software Development, Python, Go, Perl, Regex, Cloud, AWS, Azure, Google Cloud Platform, OpenStack, Operating System, Linux, Virtualization, DNS, Shell Scripting, Databases, SQL, Mongo, Testing, Big Data, CI/CD, Certificates, Containers, OpenShift, Storage, Terraform, Puppet, Distributed, Questions you can ask, Ansible, Observability, Prometheus, Circle CI, DataDog, Grafana, Argo, Soft Skills, Security, System Design, Chaos Engineering, Misc, Elastic, Kafka
## Network
In general, what do you need in order to communicate?
- A common language (for both ends to understand)
- A way to address the party you want to communicate with
- A connection (so the content of the communication can reach the recipient)
What is TCP/IP?
A set of protocols that define how two or more devices can communicate with each other. To learn more about TCP/IP, read [here](http://www.penguintutor.com/linux/basic-network-reference)
What is Ethernet?
Ethernet simply refers to the most common type of Local Area Network (LAN) used today. As opposed to a WAN (Wide Area Network), which spans a larger geographical area, a LAN is a network of computers connected within a small area, such as your office, a university campus or a home.
What is a MAC address? What is it used for?
A MAC address is a unique identification number or code used to identify individual devices on a network. Packets sent over Ethernet always come from one MAC address and are sent to another MAC address. When a network adapter receives a packet, it compares the packet's destination MAC address to the adapter's own MAC address.
When is this MAC address used?: ff:ff:ff:ff:ff:ff
When a device sends a packet to the broadcast MAC address (FF:FF:FF:FF:FF:FF), it is delivered to all stations on the local network. Ethernet broadcast is used at the data link layer to resolve IP addresses to MAC addresses via ARP.
What is an IP address?
An Internet Protocol address (IP address) is a numerical label assigned to each device connected to a computer network that uses the Internet Protocol for communication. An IP address serves two main functions: host or network interface identification, and location addressing.
Explain the subnet mask and give an example
A subnet mask is a 32-bit number used to mask an IP address and divide it into a network address and a host address. The subnet mask is formed by setting the network bits to all "1"s and the host bits to all "0"s. Within a given network, two of the total available host addresses are always reserved for specific purposes and cannot be assigned to any host: the first address, reserved as the network address (also known as the network ID), and the last address, used for network broadcast. [Example](https://github.com/philemonnwanne/projects/tree/main/exercises/exe-09)
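The network/host split described above can be demonstrated with Python's standard `ipaddress` module (the 192.168.1.0/24 network is just an example):

```python
import ipaddress

# A /24 mask keeps 24 network bits and leaves 8 host bits
net = ipaddress.ip_network("192.168.1.0/24")
print(net.netmask)            # 255.255.255.0
print(net.network_address)    # 192.168.1.0   (reserved: network ID)
print(net.broadcast_address)  # 192.168.1.255 (reserved: broadcast)
print(net.num_addresses - 2)  # 254 assignable host addresses
```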
What is a private IP address? In which scenarios/system designs should it be used?
Private IP addresses are assigned to hosts on the same network so they can communicate with each other. As the name "private" suggests, devices with private IP addresses cannot be reached by devices on any external network. For example, if I live in a dorm and want my roommates to join a game server I am hosting, I will ask them to join via my server's private IP address, since the network is local.
What is a public IP address? In which scenarios/system designs should it be used?
A public IP address is a public-facing IP address. If you are hosting a game server that you want your friends to join, you give them your public IP address so their computers can identify and locate your network and server in order to connect. You don't need a public-facing IP address when playing with friends on the same network as you; in that case you would use a private IP address. For someone to connect to a server hosted internally, you would need to set up port forwarding to tell your router to allow traffic from the public domain into your network.
Explain the OSI model. Which layers are there? What is each layer responsible for?
- Application: the user's end (HTTP is here).
- Presentation: establishes context between application-layer entities (encryption is here).
- Session: establishes, manages and terminates connections.
- Transport: transfers variable-length data sequences from a source host to a destination host (TCP and UDP are here).
- Network: transfers datagrams from one network to another (IP is here).
- Data link: provides a link between two directly connected nodes (MAC is here).
- Physical: the electrical and physical specifications of the data connection (bits are here).
You can read more about the OSI model at [penguintutor.com](http://www.penguintutor.com/linux/basic-network-reference).
For each of the following, determine which OSI layer it belongs to:
* Error correction
* Packet routing
* Cables and electrical signals
* MAC address
* IP address
* Terminating connections
* 3-way handshake
* Error correction - Data link
* Packet routing - Network
* Cables and electrical signals - Physical
* MAC address - Data link
* IP address - Network
* Terminating connections - Session
* 3-way handshake - Transport
Which delivery schemes are you familiar with?
Unicast: one-to-one communication, with one sender and one receiver.
Broadcast: sending a message to everyone on the network. The address ff:ff:ff:ff:ff:ff is used for broadcast. Two common protocols that use broadcast are ARP and DHCP.
Multicast: sending a message to a group of subscribers. It can be one-to-many or many-to-many.
What is CSMA/CD? Is it used in modern Ethernet networks?
CSMA/CD stands for Carrier Sense Multiple Access / Collision Detection. Its main goal is to manage access to a shared medium/bus, so that only one host transmits at a time.
The CSMA/CD algorithm:
1. Before sending a frame, a host checks whether another host is already transmitting a frame.
2. If no one is transmitting, it starts transmitting the frame.
3. If two hosts transmit at the same time, a collision occurs.
4. Both hosts stop sending the frame and send a "jam signal" to everyone, notifying them that a collision occurred.
5. Each host waits a random amount of time before sending again.
6. Once each host has waited its random time, they try to send the frame again, and the cycle starts over.
In modern switched Ethernet, links are full duplex and each switch port is its own collision domain, so collisions cannot occur and CSMA/CD is no longer needed.
Describe the following network devices and the differences between them:
* Router
* Switch
* Hub
Routers, switches and hubs are all network devices used to connect devices in a local area network (LAN). However, each device operates differently and has its own specific use cases. Here is a brief description of each device and the differences between them:
1. Router: a network device that connects multiple network segments. It operates at the network layer (Layer 3) of the OSI model and uses routing protocols to direct data transfer between networks. Routers use IP addresses to identify devices and direct packets to the correct destination.
2. Switch: a network device that connects multiple devices on a LAN. It works at the data link layer (Layer 2) of the OSI model and uses MAC addresses to identify devices and direct packets to the correct destination. Switches let devices on the same network communicate with each other more efficiently and can prevent the data collisions that may occur when multiple devices send data at the same time.
3. Hub: a network device that connects multiple devices through a single cable and is used to connect multiple devices without segmenting the network. Unlike a switch, it operates at the physical layer (Layer 1) of the OSI model and simply broadcasts packets to all devices connected to it, whether or not a device is the intended recipient. This means data collisions can occur, and the network's efficiency can suffer as a result. Modern network setups generally don't use hubs, since switches are more efficient and provide better network performance.
What is a "collision domain"?
A collision domain is a network segment in which devices can interfere with each other by trying to transmit data at the same time. When two devices transmit simultaneously, a collision can occur, resulting in lost or corrupted data. In a collision domain, all devices share the same bandwidth, and any device can interfere with the data transmissions of the other devices.
What is a "broadcast domain"?
A broadcast domain is a network segment in which all devices can communicate with each other by sending broadcast messages. A broadcast message is a message sent to all devices on the network rather than to a specific device. In a broadcast domain, all devices can receive and process broadcast messages, whether or not the message was intended for them.
Three computers are connected to a switch. How many collision domains are there? How many broadcast domains?
Three collision domains and one broadcast domain
How does a router work?
A router is a physical or virtual device that passes information between two or more packet-switched computer networks. A router inspects a given packet's destination Internet Protocol (IP) address, computes the best path for it to reach its destination, and then forwards it accordingly.
What is NAT?
Network Address Translation (NAT) is a process in which one or more local IP addresses are translated into one or more global IP addresses, and vice versa, in order to provide internet access to local hosts.
What is a proxy? How does it work? Why do we need it?
A proxy server acts as a gateway between you and the internet. It is an intermediary server that separates end users from the websites they browse.
If you use a proxy server, internet traffic flows through the proxy server on its way to the address you requested. The request then comes back through the same proxy server (there are exceptions to this), and the proxy server forwards the data received from the website to you.
Proxy servers provide different levels of functionality, security and privacy, depending on your use case, needs or company policy.
What is TCP? How does it work? What is the three-way handshake?
The TCP three-way handshake is the process used in a TCP/IP network to establish a connection between a server and a client. The three-way handshake is primarily used to create a TCP socket connection. It works as follows:
- A client node sends a SYN packet over an IP network to a server on the same or an external network. The goal of this packet is to ask/infer whether the server is open for new connections.
- The target server must have open ports that can accept and initiate new connections. When the server receives the SYN packet from the client node, it responds and returns a confirmation receipt: an ACK packet, or SYN/ACK packet.
- The client node receives the SYN/ACK from the server and responds with an ACK packet.
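The handshake itself is performed by the kernel; in application code you only see it as `connect()` and `accept()` completing. A minimal loopback sketch with Python's standard `socket` module:

```python
import socket
import threading

# SYN -> SYN/ACK -> ACK happens inside connect()/accept(); this demo
# just triggers it over the loopback interface.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))     # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

def accept_one():
    conn, _ = server.accept()     # completes the server side of the handshake
    conn.sendall(b"hello")
    conn.close()

t = threading.Thread(target=accept_one)
t.start()

# connect() returns only after the three-way handshake has completed
client = socket.create_connection(("127.0.0.1", port))
data = b""
while len(data) < 5:              # TCP is a byte stream: read until complete
    chunk = client.recv(5 - len(data))
    if not chunk:
        break
    data += chunk
client.close()
t.join()
server.close()
print(data)  # b'hello'
```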
What is round-trip delay or round-trip time?
From [Wikipedia](https://en.wikipedia.org/wiki/Round-trip_delay): "the length of time it takes for a signal to be sent plus the length of time it takes for an acknowledgement of that signal to be received". Bonus question: what is the RTT of a LAN?
How does the SSL handshake work?
The SSL handshake is the process of establishing a secure connection between a client and a server.
1. The client sends a Client Hello message to the server, which includes the client's SSL/TLS protocol version, the list of cipher suites the client supports and a random value.
2. The server responds with a Server Hello message, which includes the server's SSL/TLS protocol version, a random value and a session ID.
3. The server sends a Certificate message, which contains the server's certificate.
4. The server sends a Server Hello Done message, indicating it has finished sending messages for the Server Hello phase.
5. The client sends a Client Key Exchange message containing the pre-master secret, encrypted with the server's public key.
6. The client sends a Change Cipher Spec message, notifying the server that it is about to send messages encrypted with the newly negotiated cipher spec.
7. The client sends an encrypted Finished message, allowing the server to verify that the handshake succeeded.
8. The server sends a Change Cipher Spec message, notifying the client that it is about to send messages encrypted with the new cipher spec.
9. The server sends its own encrypted Finished message.
10. The client and server can now exchange application data.
What is the difference between TCP and UDP?
TCP establishes a connection between the client and the server to guarantee the order of the packets, whereas UDP does not establish a connection between client and server and doesn't handle packet ordering. This makes UDP more lightweight than TCP and a perfect candidate for services like streaming. [Penguintutor.com](http://www.penguintutor.com/linux/basic-network-reference) provides a good explanation.
您熟悉哪些 TCP/IP 协议?
解释“默认网关”
默认网关是一个接入点或 IP 路由器,联网计算机利用它将信息发送到另一个网络或互联网上的计算机。
什么是 ARP?它是如何工作的?
ARP 是地址解析协议(Address Resolution Protocol)的缩写。当您尝试 ping 本地网络上的一个 IP 地址(如 192.168.1.1)时,您的系统必须将 IP 地址 192.168.1.1 转换为 MAC 地址。这就需要使用 ARP 来解析该地址,ARP 也因此而得名。 系统会保存一个 ARP 查找表,其中存储了哪些 IP 地址与哪些 MAC 地址相关联的信息。当试图向某个 IP 地址发送数据包时,系统会首先查询该表,看是否已经知道该 MAC 地址。如果有缓存值,则不使用 ARP。
什么是 TTL?它有助于防止什么?
- TTL(生存时间)是IP(Internet Protocol,互联网协议)数据包中的一个值,它决定了在被丢弃之前数据包可以经过多少跳或路由器。每次通过路由器转发数据包时,TTL值会减少一。当TTL值达到零时,数据包将被丢弃,并向发送方发送ICMP(Internet Control Message Protocol,互联网控制消息协议)消息以指示该数据包已过期。 - TTL 用于防止数据包在网络中无限循环,否则会造成拥塞并降低网络性能。 - 它还有助于防止数据包陷入路由环路,即数据包在同一组路由器之间不断往返而永远无法到达目的地。 - 此外,TTL 还可用于帮助检测和防止 IP 欺骗攻击,在这种攻击中,攻击者试图通过使用虚假或伪造的 IP 地址来冒充网络上的其他设备。通过限制数据包的跳数,TTL 可以帮助防止数据包被路由到不合法的目的地。
什么是 DHCP?它是如何工作的?
它代表动态主机配置协议,为主机分配 IP 地址、子网掩码和网关。它是这样工作的: * 主机在进入网络时广播一条寻找 DHCP 服务器的信息(DHCP DISCOVER)。 * DHCP 服务器会以数据包的形式发回要约信息,其中包含租用时间、子网掩码、IP 地址等信息(DHCP OFFER)。 * 根据接受的提议,客户端会发送回复广播,让所有 DHCP 服务器都知道(DHCP 请求)。 * 服务器发送确认(DHCP ACK) 更多信息 [此处](https://linuxjourney.com/lesson/dhcp-overview)
同一个网络中可以有两个 DHCP 服务器吗?它是如何工作的?
可以在同一网络上安装两个 DHCP 服务器,但不建议这样做,而且必须仔细配置,以防止冲突和配置问题。 - 在同一网络上配置两个 DHCP 服务器时,两个服务器都有可能为同一设备分配 IP 地址和其他网络配置设置,从而导致冲突和连接问题。此外,如果 DHCP 服务器配置了不同的网络设置或选项,网络上的设备可能会收到冲突或不一致的配置设置。 - 不过,在某些情况下,可能有必要在同一网络中设置两个 DHCP 服务器,例如在大型网络中,一个 DHCP 服务器可能无法处理所有请求。在这种情况下,可以将 DHCP 服务器配置为不同的 IP 地址范围或不同的子网,这样它们就不会相互干扰。
什么是 SSL 隧道?它是如何工作的?
- SSL(安全套接字层)隧道是一种技术,用于在互联网等不安全网络上的两个端点之间建立安全的加密连接。SSL 隧道是通过将流量封装在 SSL 连接中创建的,SSL 连接可提供保密性、完整性和身份验证。 下面介绍 SSL 隧道的工作原理: 1. 客户端启动与服务器的 SSL 连接,其中包括建立 SSL 会话的握手过程。 2. SSL 会话建立后,客户端和服务器会协商加密参数,如加密算法和密钥长度,然后交换数字证书,以验证彼此的身份。 3. 客户端随后通过 SSL 隧道将流量发送到服务器,服务器解密流量并将其转发到目标位置。 4. 服务器通过 SSL 隧道将流量发送回客户端,客户端对流量进行解密并将其转发给应用程序。
什么是套接字?在哪里可以看到系统中的套接字列表?
- 套接字是一种软件端点,可使进程之间通过网络进行双向通信。套接字为网络通信提供了一个标准化接口,允许应用程序在网络上发送和接收数据。查看 Linux 系统上打开的套接字列表: ***netstat -an*** - 该命令显示所有打开套接字的列表,以及它们的协议、本地地址、外来地址和状态。
什么是 IPv6?如果我们有 IPv4,为什么还要考虑使用它?
- IPv6(互联网协议版本 6)是互联网协议(IP)的最新版本,用于识别网络上的设备并与之通信。IPv6 地址是 128 位地址,用十六进制表示,如 2001:0db8:85a3:0000:0000:8a2e:0370:7334。 我们应该考虑使用 IPv6 而不是 IPv4 有几个原因: 1. 地址空间:IPv4 的地址空间有限,在世界上许多地方已经耗尽。IPv6 提供了更大的地址空间,可提供数万亿个唯一的 IP 地址。 2. 安全性:IPv6 包含对 IPsec 的内置支持,为网络流量提供端到端加密和身份验证。 3. 性能:IPv6 包括一些有助于提高网络性能的功能,例如组播路由,它允许将一个数据包同时发送到多个目的地。 4. 简化网络配置:IPv6 包含可简化网络配置的功能,例如无状态自动配置,它允许设备自动配置自己的 IPv6 地址,而无需 DHCP 服务器。 5. 更好的移动性支持:IPv6 包含可改进移动性支持的功能,如移动 IPv6,它允许设备在不同网络之间移动时保持其 IPv6 地址。
什么是 VLAN?
- VLAN(虚拟局域网)是一种逻辑网络,它将物理网络上的一组设备组合在一起,而不管它们的物理位置如何。创建 VLAN 的方法是配置网络交换机,为连接到交换机上特定端口或端口组的设备发送的帧分配特定的 VLAN ID。
什么是 MTU?
MTU 是最大传输单元(Maximum Transmission Unit)的缩写。它是指单个事务中可发送的最大 PDU(协议数据单元)的大小。
如果发送的数据包大于 MTU,会发生什么情况?
在 IPv4 协议中,路由器可以把超过 MTU 的数据包分片,分别转发各个分片,由接收方重组。 在 IPv6 协议中,路由器不做分片:超过 MTU 的数据包会被丢弃,并向发送方返回 ICMPv6 "Packet Too Big" 错误消息,发送方据此减小数据包大小(即路径 MTU 发现)。
真还是假?Ping 使用 UDP 是因为它不在乎连接是否可靠
错。Ping 实际上使用的是 ICMP(互联网控制报文协议),这是一种用于发送与网络通信有关的诊断信息和控制信息的网络协议。
什么是 SDN?
- SDN 是软件定义网络(Software-Defined Networking)的缩写。它是一种网络管理方法,强调网络控制的集中化,使管理员能够通过软件抽象来管理网络行为。 - 在传统网络中,路由器、交换机和防火墙等网络设备需要使用专用软件或命令行界面进行单独配置和管理。相比之下,SDN 将网络控制平面与数据平面分开,允许管理员通过集中式软件控制器管理网络行为。
什么是 ICMP?它有什么用途?
- ICMP 是 Internet Control Message Protocol 的缩写。它是 IP 网络中用于诊断和控制的协议。它是互联网协议套件的一部分,在网络层运行。 ICMP 消息被用于各种目的,包括: 1. 错误报告:ICMP 消息用于报告网络中发生的错误,例如无法将数据包传递到其目的地。 2. Ping:ICMP 用于发送 ping 消息,该消息用于测试主机或网络是否可达,并测量数据包的往返时间。 3. 路径 MTU 发现:ICMP 用于发现路径的最大传输单元(MTU),即无需分片即可传输的最大数据包大小。 4. 跟踪路由:traceroute 实用程序使用 ICMP 跟踪数据包通过网络的路径。 5. 路由器发现:ICMP 用于发现网络中的路由器。
什么是 NAT?它是如何工作的?
NAT 是网络地址转换的缩写。它是一种在传输信息前将多个本地专用地址映射到一个公共地址的方法。希望多个设备使用一个 IP 地址的组织和大多数家用路由器一样,都会使用 NAT。 例如,你电脑的私有 IP 可能是 192.168.1.100,但你的路由器会将流量映射到它的公共 IP(如 1.1.1.1)。互联网上的任何设备都会看到来自公共 IP(1.1.1.1)而不是私人 IP(192.168.1.100)的流量。
下列协议中使用的端口号分别是? * SSH * SMTP * HTTP * DNS * HTTPS * FTP * SFTP
* SSH - 22 * SMTP - 25 * HTTP - 80 * DNS - 53 * HTTPS - 443 * FTP - 21 * SFTP - 22
哪些因素会影响网络性能?
有几个因素会影响网络性能,包括: 1. 带宽:网络连接的可用带宽会极大地影响其性能。带宽有限的网络可能会出现数据传输速率慢、延迟高和响应速度差等问题。 2. 延迟:延迟是指数据从网络中的一个点传输到另一个点时发生的延迟。高延迟会导致网络性能缓慢,尤其是视频会议和在线游戏等实时应用。 3. 网络拥塞:当太多设备同时使用网络时,就会出现网络拥塞,导致数据传输速率缓慢和网络性能低下。 4. 数据包丢失:当数据包在传输过程中丢失时,就会出现丢包现象。这会导致网络速度变慢,整体网络性能降低。 5. 网络拓扑:网络的物理布局,包括交换机、路由器和其他网络设备的位置,都会影响网络性能。 6. 网络协议:不同的网络协议具有不同的性能特征,会影响网络性能。例如,TCP 是一种可靠的协议,可以保证数据的传输,但也会因错误检查和重传所需的开销而导致性能降低。 7. 网络安全:防火墙和加密等安全措施会影响网络性能,尤其是在需要大量处理能力或引入额外延迟的情况下。 8. 距离:网络设备之间的物理距离会影响网络性能,尤其是无线网络,信号强度和干扰会影响连接性和数据传输速率。
什么是 APIPA?
APIPA(自动专用 IP 寻址)是指当主 DHCP 服务器无法访问时,操作系统自动分配给设备的一段 IP 地址。
APIPA 使用哪个 IP 范围?
APIPA 使用的 IP 范围是169.254.0.1 - 169.254.255.254.
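可以用 Python 标准库的 ipaddress 模块验证某个地址是否落在 APIPA(链路本地)范围内,作为一个小示例:

```python
import ipaddress

# APIPA(链路本地)地址段为 169.254.0.0/16
apipa = ipaddress.ip_network("169.254.0.0/16")

def is_apipa(ip: str) -> bool:
    """判断给定 IPv4 地址是否属于 APIPA 范围"""
    return ipaddress.ip_address(ip) in apipa

print(is_apipa("169.254.10.20"))  # True
print(is_apipa("192.168.1.1"))    # False
```

ipaddress 的地址对象也自带 `is_link_local` 属性,效果相同。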
#### 控制平面和数据平面
"控制平面"是指什么?
控制平面是网络的一部分,它决定如何将数据包路由和转发到不同的位置。
数据平面 "指的是什么?
数据平面是网络中实际转发数据/数据包的部分。
管理平面 "指的是什么?
它指的是监测和管理功能。
创建路由表属于哪个平面(数据、控制......)?
控制平面。
解释生成树协议(STP)。
什么是链路聚合?为什么要使用?
什么是非对称路由?如何处理?
您熟悉哪些覆盖(隧道)协议?
什么是 GRE?它是如何运作的?
什么是 VXLAN?它是如何工作的?
什么是 SNAT?
解释 OSPF。
OSPF(开放式最短路径优先)是一种路由协议,可在各种类型的路由器上实施。一般来说,大多数现代路由器都支持 OSPF,包括思科、瞻博网络和华为等供应商的路由器。该协议设计用于基于 IP 的网络,包括 IPv4 和 IPv6。此外,它采用分层网络设计,将路由器分组为区域,每个区域都有自己的拓扑图和路由表。这种设计有助于减少路由器之间需要交换的路由信息量,提高网络的可扩展性。 OSPF 4 路由器类型有 * Internal Router * Area Border Routers * Autonomous Systems Boundary Routers * Backbone Routers 了解有关 OSPF 路由器类型的更多信息: https://www.educba.com/ospf-router-types
什么是延迟?
延迟是指信息从信息源到达目的地所需的时间。
什么是带宽?
带宽是通信信道的容量,用于衡量后者在特定时间段内可处理的数据量。带宽越大,意味着处理的流量越多,数据传输量也就越大。
什么是吞吐量?
吞吐量是指在一定时间内通过任何传输通道传输的实际数据量。
在进行搜索查询时,延迟和吞吐量哪个更重要?如何确保我们对全球基础设施进行管理?
延迟。要获得良好的延迟,搜索查询应转发到最近的数据中心。
上传视频时,延迟和吞吐量哪个更重要?如何确保这一点?
吞吐量。为获得良好的吞吐量,上传数据流应被路由到未充分利用的链路。
转发请求时还需要考虑哪些因素(除了延迟和吞吐量)?
* 保持缓存更新(这意味着请求可能不会被转发到最近的数据中心)
解释 Spine & Leaf
什么是网络拥塞?什么原因会导致网络拥塞?
当网络上需要传输的数据过多,而网络容量不足以满足需求时,就会出现网络拥塞。
这会导致延迟和数据包丢失增加。原因可能是多方面的,如网络使用率高、文件传输量大、恶意软件、硬件问题或网络设计问题。
为防止网络拥塞,必须监控网络使用情况,并实施策略来限制或管理需求。
关于 UDP 数据包格式,您能告诉我什么?TCP 数据包格式如何?有何不同?
什么是指数后退算法?在哪里使用?
使用汉明码,以下数据字 100111010001101 的码字是什么?
00110011110100011101
举例说明应用层中的协议
* 超文本传输协议(HTTP)--用于互联网上的网页 * 简单邮件传输协议(SMTP)--用于电子邮件传输 * 电信网络(TELNET)--终端模拟,允许客户端访问 telnet 服务器 * 文件传输协议(FTP)--便于在任何两台机器之间传输文件 * 域名系统 (DNS) - 域名转换 * 动态主机配置协议(DHCP)--为主机分配 IP 地址、子网掩码和网关 * 简单网络管理协议(SNMP)--收集网络设备数据
举例说明网络层中的协议
* 互联网协议 (IP) - 协助将数据包从一台机器路由到另一台机器 * 互联网控制消息协议(ICMP)--让人知道发生了什么,如错误信息和调试信息
什么是 HSTS?
HTTP 严格传输安全(HTTP Strict Transport Security)是一种网络服务器指令,它通过在开始时发送并返回给浏览器的响应标头,告知用户代理和网络浏览器如何处理其连接。这将强制通过 HTTPS 加密连接,忽略任何脚本通过 HTTP 加载该域中任何资源的调用。 阅读更多 [此处](https://www.globalsign.com/en/blog/what-is-hsts-and-how-do-i-use-it#:~:text=HTTP%20Strict%20Transport%20Security%20(HSTS,and%20back%20to%20the%20browser.)
#### 网络 - 其他
什么是互联网?它和万维网一样吗?
互联网是一个由网络组成的网络,在全球范围内传输大量数据。
万维网是一个运行在数百万服务器上的应用程序,它位于互联网之上,可通过所谓的网络浏览器访问
什么是ISP?
ISP(互联网服务提供商)是当地的互联网公司。
## DevOps #### 初级
什么是 DevOps? DevOps 帮助我们完成什么?
DevOps 的反模式是什么?
什么是持续集成?
开发人员经常将代码集成到共享仓库中的一种开发实践。 它的范围可以从每天或每周进行几次更改,到大规模在一个小时内进行几次更改。 验证每段代码(更改/补丁),以使更改可以安全地合并。 如今,使用自动构建来确保代码可以集成的测试更改是一种常见的做法。 它可以是一个运行在不同级别(单元,功能等)的多个测试的构建,也可以是所有或某些必须通过以将更改合并到存储库中的多个单独的构建。
什么是持续部署?
什么是持续交付?
你认为CI / CD的最佳做法是什么?
你将用于以下哪些系统和/或工具?: * CI/CD * 基础架构 * 配置管理 * 监控 & 报警 * 日志 * 代码审查 * 代码覆盖率 * 测试集
* CI/CD - Jenkins, Circle CI, Travis * 基础架构 - Terraform, CloudFormation * 配置管理 - Ansible, Puppet, Chef * 监控 & 报警 - Prometheus, Nagios * 日志 - Logstash, Graylog, Fluentd * 代码审查 - Gerrit, Review Board * 代码覆盖率 - Cobertura, Clover, JaCoCo * 测试集 - Robot, Serenity, Gauge
你在选择工具/技术时是怎么考虑的?
你可以使用以下一项或全部:
* 成熟与尖端
* 社区规模
* 体系结构方面 - 代理与无代理,主控与无主控等
解释可变基础架构与不变基础架构
在可变的基础架构原则中,更改将应用到现有基础架构之上并随着时间的推移而变化 基础架构建立了变化的历史。 Ansible,Puppet和Chef这些工具 遵循可变的基础架构原则。 在不变的基础架构原则中,每项更改实际上都是新的基础架构。 所以改变 到服务器将导致新服务器而不是更新服务器。 Terraform是 遵循不变的基础架构原则的一个例子。
你熟悉什么方式来交付软件?
* 存档 - 将你所有的应用文件收集到一个存档中(例如tar),并将其交付给用户。   * 打包 - 取决于操作系统,你可以使用OS软件包格式(例如,在RHEL / Fefodra中为RPM)来交付软件,并使用标准打包程序命令来安装,卸载和更新它   * 映像 - VM或容器映像,其中包已包含在其中,以便成功运行。
什么是缓存? 缓存是怎么工作的? 为什么缓存很重要?
解释一下无状态和有状态
什么是HTTP及其工作方式?
描述一下设置某些类型的Web服务器的工作流程 (Apache, IIS, Tomact, ...)
解释一下监控. 它是什么? 为什么监控是重要的?
你熟悉那些监控方法?
#### 高级
告诉我你是如何执行CI / CD资源的计划容量 (如服务器, 存储, 等等.)
你将如何为依赖于其他多个应用程序的应用程序构建/实现CD?
你如何衡量CI / CD的质量? 有那些你正在使用的指标吗?
什么是配置漂移? 它引起什么问题?
当配置和软件完全相同的服务器环境中的某个服务器上发生配置漂移 或服务器正在应用其他服务器无法获得的更新或配置,并且随着时间的推移,这些服务器将变为 略有不同。 这种情形可能会导致难以识别和重现的错误。
怎样处理配置漂移?
你是否有跨项目变更测试的经验? (又名交叉依赖)
注意:交叉依赖是指你对单独的项目进行了两个或多个更改,并且你希望在相互构建中对其进行测试,而不是分别测试每个更改。
在哪种情况下,你希望使用SQL?
* 同类数据,预计不会发生变化   * ACID合规性很重要
## Jenkins #### 初级
什么是 Jenkins? 你用它来做什么?
相比其他的竞争者 jenkins 有什么优势? 你能把jenkins 和下面的系统做一个比较吗?: * Travis * Bamboo * Teamcity * CircleCI
解释以下: * Job * Build * Plugin * Slave * Executor
你在 Jenkins 用过什么插件?
解释一下 CI/CD 你在 Jenkins 是怎么实现他们的
有什么类型的工作? 你使用了哪些类型,为什么?
你如何向用户报告构建结果? 你熟悉什么那些方式?
每次有更改提交,你都需要运行单元测试。 详细描述管道的环境以及每个阶段将执行的操作
怎样保护 Jenkins?
你能描述一些 Jenkins 最佳实践吗?
#### 高级
如何为一个特定的构建获取多个从属?
你的组织中有四个团队。 如何优先考虑每个团队的建设? 例如,x团队的工作将始终在y团队之前运行
你有部署 Jenkins 插件的经验吗? 你能描述一下吗?
如果你要管理许多工作,你可能使用Jenkins UI。 你如何每周/每月管理数百个作业的创建和删除?
Jenkins 有那些限制?
* 测试交叉依赖关系(来自多个项目的变更)   * 从任何阶段开始构建(尽管cloudbees实现了称为检查点的东西)
你是如何实施从某个阶段而不是从最开始构建的选项?
你曾经写过 Jenkins 脚本吗? 如果有,有哪些? 分别是怎么样工作的?
## Cloud #### 初级
云计算的优势是什么? 至少列出3个优势
它们分别是哪种类型的云计算?
IAAS PAAS SAAS
解释一下以下云计算部署: * Public * Hybrid * Private
## AWS #### 初级 ##### 全局基础设施
解释以下 * 可用区 * 区域 * 边缘位置
AWS区域是遍布全球不同地理位置的数据中心,每个区域彼此完全独立。 在每个区域内,有多个隔离的位置,称为可用区。 多个可用区可确保其中之一发生故障时具有高可用性。 边缘位置基本上是内容传递网络,它缓存数据并确保较低的延迟和更快地传递给任何位置的用户。 他们位于世界主要城市。
##### S3
解释一下什么是S3,以及它用来干嘛
S3代表3 S(Simple Storage Service)。 S3是一种对象存储服务,它是快速,可伸缩和持久的。 S3使客户能够上传,下载或存储最大5 TB的文件或对象。 同时每个文件的最大大小为5 GB(如果大小超过5 GB,则分段上传)。
什么是存储桶?
S3存储桶是一种资源,类似于文件系统中的文件夹,并且允许存储由数据及其元数据组成的对象。
对还是错? 存储桶必须全局唯一
对
S3 中 包含哪些对象 ? * 另一种问法: 在对象上下文中解释键,值,版本ID和元数据
解释一下数据一致性
你可以在s3上托管动态网站吗? 静态网站呢?
你在S3上下文中采取了哪些安全措施?
##### CloudFront
解释一下什么是CloudFront及其用途
解释以下 * 域 * 边缘位置 * 分布
CDN用户可以使用哪些交付方式?
对还是错? 在TTL的生命周期内缓存对象
##### EC2
你创建了哪种类型的实例?
如何为给定的EC2实例增加RAM?
停止实例,使其实例类型与所需的RAM匹配,然后启动实例。
什么是 AMI?
EC2实例有多少个存储选项?
EC2实例停止或终止时会发生什么?
什么是安全组?
如何将实例迁移到另一个可用性区域?
什么是安全组?
什么是竞价型实例?
## 网络 #### 初级
什么是以太网?
什么是一个 MAC 地址? 它用来干嘛?
什么时候这个 MAC 地址会被用来使用?: ff:ff:ff:ff:ff:ff
什么是一个 IP 地址? 什么是子网?
解释一下 OSI 模型. 有那些层? 每层负责什么?
应用层:用户端(HTTP 在这一层)
表示层:在应用层实体之间建立上下文(加密在这一层)
会话层:建立、管理和终止连接
传输层:将可变长度的数据序列从源主机传输到目标主机(TCP 和 UDP 在这一层)
网络层:将数据报从一个网络传输到另一个网络(IP 在这一层)
数据链路层:提供两个直接连接的节点之间的链路(MAC 在这一层)
物理层:数据连接的电气和物理规格(比特在这一层)
你熟悉哪些传送方案?
单播:一对一通信,其中有一个发送方和一个接收方。
广播:向网络中的所有人发送消息。地址 ff:ff:ff:ff:ff:ff 用于广播。使用广播的两个常见协议是 ARP 和 DHCP。
组播:向一组订阅者发送消息。它可以是一对多或多对多。
什么是 CSMA/CD? 在现代以太网中有使用吗?
CSMA / CD代表载波侦听多路访问/冲突检测。 它的主要重点是管理对共享媒体/总线的访问,在该共享媒体/总线上,在给定的时间点只能传输一个主机。 CSMA / CD算法: 1. 在发送帧之前,它会检查其他主机是否已经在发送帧。 2. 如果没有人发送,它将开始发送帧。 3. 如果两个主机同时传输,则发生冲突。 4. 双方主机均停止发送帧,并向每个人发送“干扰信号”,通知每个人发生冲突 5. 他们正在等待随机时间,然后再次发送 6. 一旦每个主机等待一段随机时间,他们就会尝试再次发送帧
描述以下网络设备及其之间的区别: * 路由器 * 交换机 * 集线器
什么是 NAT?
TCP 和 UDP 两者之间有那些区别?
TCP 是怎样工作的? 什么是 3 次握手?
什么是 ARP? 它是怎么工作的?
什么是 TTL?
什么是DHCP? 它是怎么工作的?
什么是SSL 隧道? 它是怎么工作的?
什么是套接字? 在哪里可以看到系统中的套接字列表?
什么是IPv6? 如果我们拥有IPv4,为什么要考虑使用它?
什么是VLAN?
什么是MTU?
什么是SDN?
什么是ICMP? 它有什么用途?
什么是NAT? 它是怎么工作的?
#### 高级
解释一下生成树协议 (STP)
什么是链路聚合? 为什么使用它?
什么是非对称路由? 怎样处理它?
你熟悉哪些叠加(隧道)协议?
什么是GRE? 它是怎么工作的?
什么是VXLAN? 它是怎么工作的?
什么是SNAT?
解释一下 OSPF
解释一下 Spine & Leaf
使用海明码, 100111010001101 会编码成什么码?
00110011110100011101
## Linux #### 初级
你有那些 Linux 经验? 当你可以在多个操作系统上设置应用程序时,你希望在哪个操作系统上进行设置以及为什么?
解释以下每个命令的作用,并举例说明如何使用它 * ls * rm * rmdir (你能使用 rm完成同样的结果吗?) * grep * wc * curl * touch * man * nslookup or dig * df
运行命令 df 你会得到 "找不到命令". 可能出现什么问题以及如何修复它?
如何确保服务将在你选择的操作系统上启动?
你如何定期安排任务?
你可以使用 cron 和 at 命令。对于 cron,使用 `分 时 日 月 周 命令` 的格式安排任务。任务存储在 cron 文件(crontab)中。
你过去是否安排了任务? 什么样的任务?
通常,你将安排批处理作业。
##### 权限
怎样改变一个文件的权限?
使用 `chmod` 命令.
下面的权限意味着什么?: * 777 * 644 * 750
777 - 所有人有读和写和可执行权限(意味着你很懒) 644 - 拥有者有读和写的权限、其他人只有读权限 750 - 拥有者有所有权限, 组成员可以读和执行权限、其他人没有权限
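八进制权限与 `ls -l` 显示的 rwx 字符串之间的对应关系,可以用 Python 标准库的 stat.filemode 快速验证(0o100000 表示普通文件的类型位):

```python
import stat

# 将八进制权限转换为 ls 风格的字符串
for mode in (0o777, 0o644, 0o750):
    print(oct(mode), stat.filemode(0o100000 | mode))
# 0o777 -> -rwxrwxrwx
# 0o644 -> -rw-r--r--
# 0o750 -> -rwxr-x---
```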
解释一下什么是setgid, setuid 和 sticky bit
如何在不向其提供登录系统功能的情况下将新用户添加到系统?
* adduser user_name --shell=/bin/false --no-create-home
在使用systemd的系统上,如何显示日志? * journalctl
##### 调试
你正在使用什么进行故障排除和调试 网络 问题?
* `dstat -t` 非常适合辨别网络和磁盘问题。
* `netstat -tnlaup` 可用于查看哪些进程在哪些端口上运行。
* `lsof -i -P` 可以用于与 netstat 相同的目的。
* `ngrep -d any metafilter` 用于将正则表达式与数据包的载荷相匹配。
* `tcpdump` 用于捕获数据包。
* `wireshark` 与 tcpdump 概念相同,但带有 GUI(可选)。
你正在使用什么进行故障排除和调试 磁盘 & 文件系统 问题?
* `dstat -t` 非常适合辨别网络和磁盘问题。
* `opensnoop` 可以用来查看系统上正在打开哪些文件(实时)。
你正在使用什么进行故障排除和调试 进程 问题?
strace 非常适合了解你的程序的功能。 它打印你的程序执行的每个系统调用。
你正在使用什么来调试CPU相关问题?
* `top` 显示每个进程消耗多少 CPU 占比
* `perf` 是采样分析器的理想选择,通常用于找出哪些 CPU 周期被"浪费"了
* `flamegraphs` 非常适合 CPU 消耗可视化(http://www.brendangregg.com/flamegraphs.html)
你收到一个电话,说“我的系统运行缓慢” - 你将如何处理?
1. 使用 top 检查是否有任何资源消耗你的 CPU 或 RAM。
2. 运行 `dstat -t` 来检查它是否与磁盘或网络有关。
3. 使用 iostat 检查 I/O 统计信息
什么是Linux内核模块以及如何加载新模块?
什么是KVM?
SSH和SSL之间的区别是什么?
SSH端口转发是什么?
解释重定向
什么是通配符? 你能举一个使用它们的例子吗?
我们在以下每个命令中使用grep做什么? * grep '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' some_file * grep -E "error|failure" some_file * grep '[0-9]$' some_file
1. 一个 IP 地址 2. 单词 "error" 或 "failure" 3. 以数字结尾的行
告诉我你了解所有有关Linux启动过程的知识
什么是退出码? 你熟悉那些退出码?
退出码(或返回码)表示子进程返回其父进程的码。 0是退出码,表示成功,而大于1的码表示错误。 每个数字都有不同的含义,具体取决于应用程序的开发方式。 我认为这是一篇可以了解更多的好博客:https://shapeshed.com/unix-exit-codes
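可以用 Python 的 subprocess 模块直观地观察退出码(这里用 Python 解释器自身模拟成功与失败的子进程,仅作演示):

```python
import subprocess
import sys

# 退出码 0 表示成功
ok = subprocess.run([sys.executable, "-c", "raise SystemExit(0)"])
# 非 0 退出码表示错误,具体含义由应用程序自行定义
fail = subprocess.run([sys.executable, "-c", "raise SystemExit(3)"])

print(ok.returncode)    # 0
print(fail.returncode)  # 3
```

在 shell 中,等价的检查方式是运行命令后查看 `$?`。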
软链接和硬链接之间的区别是什么?
硬链接是使用相同inode的相同文件。 软链接是使用不同inode的另一个文件的快捷方式。 可以在不同的文件系统之间创建软链接,而硬链接只能在同一文件系统内创建。
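inode 的差异可以直接观察到:硬链接与原文件的 inode 编号相同,软链接则有自己的 inode。下面是一个小示例(在临时目录中创建演示文件,文件名均为假设):

```python
import os
import tempfile

d = tempfile.mkdtemp()
target = os.path.join(d, "original.txt")
with open(target, "w") as f:
    f.write("data")

hard = os.path.join(d, "hardlink.txt")
soft = os.path.join(d, "softlink.txt")
os.link(target, hard)     # 硬链接:与原文件共享同一个 inode
os.symlink(target, soft)  # 软链接:指向路径的独立文件,inode 不同

same_inode = os.stat(target).st_ino == os.stat(hard).st_ino
# lstat 查看软链接本身(而不是它指向的目标)的 inode
soft_inode_differs = os.lstat(soft).st_ino != os.stat(target).st_ino
print(same_inode, soft_inode_differs)  # True True
```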
什么是交换分区? 它用来做什么的?
你试图创建一个新文件,但显示“文件系统已满”。 你使用df检查是否有可用空间,你看到还有20%的空间。 可能是什么问题?
你对LVM有什么了解?
解释以下关于LVM: * PV * VG * LV
RAID用于什么用途? 你能否解释RAID 0、1、5和10之间的区别?
什么是懒卸载?
修复以下命令: * sed "s/1/2/g' /tmp/myFile * find . -iname \*.yaml -exec sed -i "s/1/2/g" {} ;
解释以下每个路径中存储的内容以及是否有一些独特之处
* /tmp * /var/log * /bin * /proc * /usr/local
你能在 /etc/services 找到什么
什么是 chroot?
##### 进程
如何在后台运行进程?为什么要这样做?
你可以通过在命令末尾加上 & 来实现。至于为什么:因为有些命令/进程需要很长时间才能执行完,或者会一直运行。
你如何查找特定进程占用的内存量?
运行 "kill" 时默认使用什么信号?
默认信号为SIGTERM(15)。 该信号可以优雅地终止进程,这意味着它可以保存当前状态配置。
你熟悉哪些信号?
SIGTERM - 终止进程的默认信号 SIGHUP - 常用用法是重新加载配置 SIGKILL - 不能捕获或忽略的信号 运行 `kill -l` 查看所有可用的信号
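信号的捕获可以用 Python 的 signal 模块演示(假设运行在 Linux/Unix 上;注意 SIGKILL 无法被这样捕获):

```python
import os
import signal

received = []

def handler(signum, frame):
    # 进程收到信号时,记录信号编号
    received.append(signum)

# 为 SIGUSR1 注册处理函数,然后给自己发送该信号
signal.signal(signal.SIGUSR1, handler)
os.kill(os.getpid(), signal.SIGUSR1)

print(received)  # [信号 SIGUSR1 的编号]
```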
什么是 trap?
当你按下Ctrl + C会发生什么?
什么是守护程序?
Linux中进程的可能状态是什么?
Running(运行态) Waiting (等待态)) Stopped(暂停态) Terminated(终止态) Zombie(假死态)
什么是僵尸进程? 你是如何避免的?
什么是初始进程?
如何更改进程的优先级? 你为什么想这么做?
你能解释一下网络进程/连接如何建立以及如何终止?
什么是系统调用? 你熟悉哪些系统调用?
strace 做什么的?
查找所有以 ".yml" 结尾的文件,并把每个文件中的数字 1 替换为 2
find /some_dir -iname \*.yml -print0 | xargs -0 -r sed -i "s/1/2/g"
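同样的"递归查找 + 替换"也可以用 Python 的 pathlib 完成,作为与上面 find/xargs/sed 管道等价的一个示例(目录和文件内容均为演示用的假设):

```python
import tempfile
from pathlib import Path

# 准备演示数据:两个位于不同层级的 .yml 文件
root = Path(tempfile.mkdtemp())
(root / "a.yml").write_text("version: 1")
(root / "sub").mkdir()
(root / "sub" / "b.yml").write_text("replicas: 1")

# 与 find ... | xargs sed -i "s/1/2/g" 等价:递归匹配并原地替换
for path in root.rglob("*.yml"):
    path.write_text(path.read_text().replace("1", "2"))

print((root / "a.yml").read_text())  # version: 2
```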
如何查看系统有多少可用内存? 如何检查每个进程的内存消耗?
你可以使用 top 和 free 命令
你如何将一个50行的文件拆分为两个25行的文件?
你可以使用 split 命令,就像这样:`split -l 25 some_file`
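`split -l 25` 做的事情也可以用几行 Python 表达,有助于理解其行为(临时目录与文件内容均为演示用的假设):

```python
import os
import tempfile

# 准备一个 50 行的演示文件
d = tempfile.mkdtemp()
src = os.path.join(d, "some_file")
with open(src, "w") as f:
    f.writelines(f"line {i}\n" for i in range(50))

# 与 split -l 25 some_file 等价:每 25 行写入一个新文件
with open(src) as f:
    lines = f.readlines()
for idx, start in enumerate(range(0, len(lines), 25)):
    with open(os.path.join(d, f"part_{idx}"), "w") as out:
        out.writelines(lines[start:start + 25])
```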
什么是文件描述符? 你熟悉那些文件描述符?
文件描述符,也称为文件句柄,是一个唯一的编号,用于标识操作系统中打开的文件。 在 Linux(和 Unix)中,前三个文件描述符是: * 0 - 标准输入的默认数据流 * 1 - 标准输出的默认数据流 * 2 - 标准错误输出的默认数据流 这有一篇关于这个主题的好文章: https://www.computerhope.com/jargon/f/file-descriptor.htm
什么是 inode?
Linux中的每个文件(和目录)都有一个索引节点,即与文件相关的存储元数据信息的数据结构 ,例如文件的大小,所有者,权限等。
如何列出活动的网络连接?
什么是NTP? 它是用来干什么的?
什么是SELinux?
什么是Kerberos?
什么是nftables?
firewalld守护程序负责什么?
##### Network
什么是网络名称空间? 它用来干什么的?
你如何将Linux服务器变成路由器?
什么是路由表? 你是怎样查看它的?
什么是数据包嗅探器? 你过去曾经使用过吗? 如果是,你使用了哪些数据包嗅探器以及用于什么目的?
##### DNS
文件 /etc/resolv.conf 用来做什么的? 它包含那些内容?
什么是 "A record"?
什么是 PTR 记录?
A记录将域名指向IP地址,而PTR记录则相反,并将IP地址解析为域名。
什么是 MX 记录?
DNS是使用TCP还是UDP?
##### Packaging
你有打包经验吗? 你能解释一下它是怎么工作的
RPM: 解释特定格式(应包括什么内容)
你如何列出包内容?
#### 高级
当你执行 ls发生了什么? 提供一个详细的答案
你能描述流程的创建方式吗?
以下块做什么?: ``` open("/my/file") = 5 read(5, "file content") ```
这组系统调用先用 open 打开文件 /my/file,返回文件描述符 5;随后 read 使用该文件描述符读取文件内容。
进程和线程的区别是什么?
##### Network
当你运行 ip a 你看到一个设备叫做 'lo'. 它是什么以及为什么我们需要它?
traceroute 命令做什么的? 它是怎么工作的?
什么是网络绑定? 你熟悉什么类型?
如何链接两个单独的网络名称空间,以便你可以从另一个命名空间ping一个命名空间上的接口?
什么是cgroup? 在什么情况下你会使用它们?
如何创建一定大小的文件?
这有一些方式去做: * dd if=/dev/urandom of=new_file.txt bs=2MB count=1 * truncate -s 2M new_file.txt * fallocate -l 2097152 new_file.txt
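上面 truncate/fallocate 的做法在 Python 里对应文件对象的 truncate() 方法,它同样会创建一个指定大小的(稀疏)文件:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "new_file.bin")

# 与 truncate -s 2M new_file.bin 类似:把文件扩展到 2 MiB
with open(path, "wb") as f:
    f.truncate(2 * 1024 * 1024)

print(os.path.getsize(path))  # 2097152
```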
以下系统调用之间有什么区别?: exec(), fork(), vfork() and clone()?
解释流程描述符和任务结构
线程和进程之间有什么区别?
解释内核线程
使用套接字系统调用时会发生什么?
这有一篇好的文章关于这个主题的: https://ops.tips/blog/how-linux-creates-sockets
## Ansible #### 初级
在Ansible中描述以下每个组件,包括它们之间的关系: * Task * Module * Play * Playbook * Role
任务 – 调用特定的Ansible模块 模块 – Ansible在你自己的主机或远程主机上执行的实际代码单元。 模块按类别(数据库,文件,网络等)编制索引,也称为任务插件。 Play – 在给定主机上执行的一个或多个任务 Playbook – 一个或多个Play。 每个Play可以在相同或不同的主机上执行 角色 – Ansible角色使你可以基于某些功能/服务对资源进行分组,以便可以轻松地重用它们。 在角色中,你具有变量,默认值,文件,模板,处理程序,任务和元数据的目录。 然后,你只需在剧本中指定角色即可使用该角色。
你熟悉哪些Ansible最佳做法? 至少列出 3 条
什么是清单文件以及如何定义一个?
清单文件定义了在其上执行Ansible任务的主机和/或主机组。 一个清单文件的例子 192.168.1.2 192.168.1.3 192.168.1.4 [web_servers] 190.40.2.20 190.40.2.21 190.40.2.22
什么是动态清单文件? 什么时候使用?

动态清单文件可跟踪来自一个或多个来源(例如云提供商和CMDB系统)的主机。 应该使用当使用外部源时,尤其是在环境中的主机正在自动启动和关闭,而无需跟踪这些源中的所有更改。
你只想在特定的次要操作系统上运行Ansible Play,你将如何实现?
写任务创建目录 ‘/tmp/new_directory’
```
- name: Create a new directory
  file:
    path: "/tmp/new_directory"
    state: directory
```
接下来的Play会有什么结果?
```
---
- name: Print information about my host
  hosts: localhost
  gather_facts: 'no'
  tasks:
    - name: Print hostname
      debug:
        msg: "It's me, {{ ansible_hostname }}"
```
提供代码后,请始终进行彻底检查。 如果你的回答是"这将失败",那么你是对的。 我们正在使用一个事实(ansible_hostname),这是从我们正在运行的主机上收集到的信息。 但是在这种情况下,我们禁用了事实收集(gather_facts: no),因此该变量将是未定义的,这将导致失败。
如果系统上存在文件 "/tmp/mario",则编写 playbook 以在所有主机上安装 "zlib" 和 "vim" .
```
---
- hosts: all
  vars:
    mario_file: /tmp/mario
    package_list:
      - 'zlib'
      - 'vim'
  tasks:
    - name: Check for mario file
      stat:
        path: "{{ mario_file }}"
      register: mario_f
    - name: Install zlib and vim if mario file exists
      become: "yes"
      package:
        name: "{{ item }}"
        state: present
      with_items: "{{ package_list }}"
      when: mario_f.stat.exists
```
编写一个 playbook ,将文件 "/tmp/system_info" 部署到除控制器组之外的所有主机上,并具有以下内容:
```
我是 <主机名>,我的操作系统是 <操作系统>
```
将 <主机名> 和 <操作系统> 替换为正在运行的特定主机的实际数据。

部署 system_info 文件的 playbook:

```
---
- name: Deploy /tmp/system_info file
  hosts: all:!controllers
  tasks:
    - name: Deploy /tmp/system_info
      template:
        src: system_info.j2
        dest: /tmp/system_info
```

system_info.j2 模板的内容:

```
# {{ ansible_managed }}
I'm {{ ansible_hostname }} and my operating system is {{ ansible_distribution }}
```
变量 "whoami" 在以下位置定义: * 角色默认设置 -> whoami: mario   * 额外的变量(使用 -e 传递给Ansible CLI的变量)-> whoami: toad   * 托管事实 -> whoami: luigi   * 广告资源变量(与哪种类型无关)-> whoami: browser 根据可变优先级,将使用哪个?
正确的答案是 ‘toad’。 变量优先级是关于变量在不同位置设置时如何相互覆盖的。 如果你到目前为止还没有体验过,我相信你会在某个时候确定的,这使它成为一个有用的话题。 在我们的问题上下文中,顺序将是额外的var(始终覆盖任何其他变量)-> 主机事实 -> 库存变量 -> 角色默认值(最弱)。 完整的列表可以在上面的链接中找到。 另外,请注意Ansible 1.x和2.x之间存在显着差异。
对于以下每个语句,确定对还是错: * 模块是任务的集合 * 最好使用 shell 或 command 而不是特定的模块 * 主机事实会覆盖 play 变量 * 角色可能包括以下内容:vars、meta 和 handlers * 动态清单是通过从外部来源提取信息来生成的 * 最佳做法是使用 2 个空格而不是 4 个空格缩进 * "notify" 用来触发处理程序 * "hosts: all:!controllers" 表示 "仅在控制器组主机上运行"
什么是ansible-pull? 与ansible-playbook相比有何不同?
#### 高级
什么是过滤器? 你有写过滤器的经验吗?
编写过滤器来转化字符串大写
`def cap(self, string): return string.capitalize()`
你如何测试基于Ansible的项目?
什么是回调插件? 使用回调插件可以实现什么?
## Terraform #### 初级
你能解释一下什么是Terraform? 它是怎么工作的?
读 [这里](https://www.terraform.io/intro/index.html#what-is-terraform-)
什么使基础架构代码受益?
- 供应,修改和删除基础架构的全自动过程 - 基础结构的版本控制,可让你快速回滚到以前的版本 - 通过自动化测试和代码审查来验证基础架构的质量和稳定性 - 减少基础架构任务的重复性
为什么选择Terraform,而不选择其他技术? (例如,Ansible,Puppet,CloudFormation)
常见的错误答案是说 Ansible 和 Puppet 是配置管理工具,而 Terraform 是置备工具。 尽管从技术上讲是正确的,但这并不意味着 Ansible 和 Puppet 不能用于配置基础架构。 另外,这根本没有解释为什么应该在 CloudFormation 之上选择 Terraform。 Terraform 与其他工具相比的优势: * 它遵循不可变基础架构方法,该方法具有避免配置随时间漂移的优势 * Ansible 和 Puppet 更偏过程式(你描述每个步骤要执行的操作),而 Terraform 是声明式的,你描述的是总体期望状态,而不是每个资源或任务的状态。 可以举一个在每个工具中从 1 台服务器扩展到 2 台服务器的例子:在 Terraform 中,你直接指定 2;在 Ansible 和 Puppet 中,你需要明确地只再配置 1 台额外的服务器。
解释什么是"Terraform configuration"
解释以下每个: * Provider * Resource * Provisioner
terraform.tfstate 文件用来做什么?
它跟踪创建的资源的ID,以便Terraform知道它正在管理什么。
解释以下命令的作用: * terraform init * terraform plan * terraform validate * terraform apply
terraform init 扫描你的代码以查明你正在使用哪些提供程序并下载它们。 terraform plan 可以让你在实际执行操作之前先查看 terraform 即将执行的操作。 terraform validate 检查配置文件的语法是否有效、内部是否一致。 terraform apply 将置备 .tf 文件中指定的资源。
如何定义一个可以被外部源或在 terraform apply 期间改变的变量?
你可以这样定义: `variable "my_var" {}`
举例说明几种Terraform最佳实践
解释一下隐式和显式依赖项在Terraform中如何工作
什么是local-exec and remote-exec in the context of provisioners?
什么是"tainted 资源"?
这是成功创建的资源,但在配置期间失败。 Terraform将失败,并将该资源标记为“tainted”。
terraform taint 做了什么?
Terraform支持哪些类型的变量?
Terraform 支持的变量类型包括:string(字符串)、number(数字)、bool(布尔值)、list(列表)、map(映射)等。
什么是输出变量以及 terraform output 做了什么?
解释 Modules
什么是 Terraform Registry?
#### 高级
解释 "Remote State". 什么时候使用它以及如何使用它?
解释 "State Locking"
## Docker #### 初级
什么是Docker? 你用它做什么?
容器与VM有何不同?
容器和虚拟机之间的主要区别是:容器让你在同一个操作系统上虚拟化多个工作负载,而虚拟机虚拟化的是硬件,每台虚拟机都运行自己完整的操作系统。
在哪种情况下,你将使用容器,而在哪种情况下,则更喜欢使用虚拟机?
在以下情况下,你应该选择虚拟机:    * 你需要运行一个需要操作系统所有资源和功能的应用程序    * 你需要完全隔离和安全 在以下情况下,你应该选择容器:    * 你需要快速启动的轻量级解决方案    * 运行单个应用程序的多个版本或实例
解释一下 Docker 架构
详细描述一下当运行`docker run hello-world`时背后发生了什么?
Docker CLI 将你的请求传递给Docker守护程序。 Docker 守护程序从 Docker Hub 下载映像 Docker 守护程序使用下载的映像创建一个新容器 Docker 守护程序将输出从容器重定向到 Docker CLI,后者将其重定向到标准输出
你怎样运行容器?
你熟悉那些与容器相关的最佳实践?
`docker commit` 干什么的? 什么时候需要使用它?
你如何将数据从一个容器转移到另一个容器?
容器存在时容器的数据会发生什么?
解释以下每个命令的作用 * docker run * docker rm * docker ps * docker build * docker commit
如何删除未运行的旧容器?
##### Dockerfile
什么是 Dockerfile
Dockerfile中 ADD 和 COPY 之间的区别是什么?
Dockerfile中 CMD 和 RUN 之间的区别是什么?
解释一下什么是 Docker compose 以及它用来做什么
Docker compose,Docker swarm 和 Kubernetes 有什么区别?
解释 Docker interlock
Docker Hub 和 Docker Cloud 之间的区别是什么?
Docker Hub 是一个 Docker 注册表服务,可让你运行 pull 和 push 命令以从 Docker Hub 安装和部署 Docker 镜像。 Docker Cloud 构建在 Docker Hub 之上,因此与 Docker Hub 相比,Docker Cloud 提供了更多的选项/功能。 一个例子是 Swarm 管理,这意味着你可以在 Docker Cloud 中创建新的 Swarm 集群。
存储 Docker 镜像的位置在哪里?
解释一下镜像层
#### 高级
你如何在Docker中管理持久性存储?
如何从容器内部连接到容器运行所在的主机的本地主机?
如何将文件从Docker容器复制到主机,反之亦然?
## Kubernetes #### 初级
什么是Kubernetes?
为什么Docker还不够? 为什么我们需要Kubernetes?
描述一下 Kubernetes 的架构
你是怎样监控你的 Kubernetes 的?
什么是kubectl? 你如何使用它?
什么是 kubeconfig? 你用它来做什么?
##### Users
你如何创建用户? 用户信息的存储位置?
你知道如何不使用 adduser/useradd 命令创建新用户吗?
## Coding #### 初级
你更喜欢将哪种编程语言用于与DevOps相关的任务? 为什么要专门这个?
什么是面向对象编程? 它为什么如此重要?
解释一下递归
解释一下什么是设计模式,并详细描述其中的三个
解释 big O 符号
##### Strings
用你想要的任何语言,编写一个函数来确定给定的字符串是否是回文串
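一个最简单的示例解法(这里假设区分大小写、不忽略空格,利用 Python 的切片反转):

```python
def is_palindrome(s: str) -> bool:
    """判断字符串正读与反读是否相同"""
    return s == s[::-1]

print(is_palindrome("level"))  # True
print(is_palindrome("pizza"))  # False
```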
#### 高级
给定3种设计模式。 你知道如何以你选择的任何语言实现(提供示例)这些设计模式?
## Python #### 初级
Python编程语言的一些特点是什么?
```
1. 这是一种由 Guido van Rossum 于 1991 年创建的高级通用编程语言。
2. 该语言是解释型的,CPython(用 C 语言编写)是最常用/维护最好的实现。
3. 它是强类型的。类型系统是鸭子类型和渐进式的。
4. Python 注重可读性,并使用空格/缩进代替括号 {}
5. Python 的包管理器称为 pip("pip install 包名"),有超过 200,000 个可用的软件包。
6. Python 自带 pip 和一个大的标准库,为程序员提供了许多预置的解决方案。
7. 在 Python 中,"一切"都是对象。
还有许多其他特性,但这是每个 Python 程序员都应该知道的主要特性。
```
Python支持哪些数据类型,哪些是可变的? 如何显示某个数据类型是可变的?
可变数据类型是: List Dictionary Set 不可变数据类型是: Numbers (int, float, ...) String Bool Tuple Frozenset 通常,你可以使用函数hash()来检查对象的可变性,如果它是可哈希的,则是不可变的,尽管由于用户定义的对象可能是可变的且可哈希的,所以它并不总是按预期工作
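上面提到的 hash() 检查方法可以这样演示:不可变对象可哈希,而对可变对象调用 hash() 会抛出 TypeError:

```python
# 不可变(可哈希)的对象可以作为字典的键
print(hash("abc") == hash("abc"))  # True
print(hash((1, 2)))                # 元组不可变,可哈希

# 可变对象(如列表)不可哈希,调用 hash() 会抛出 TypeError
try:
    hash([1, 2])
except TypeError as e:
    error = str(e)
print(error)  # unhashable type: 'list'
```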
什么是PEP8? 举例说明3种风格指南
PEP8是Python的编码约定和样式指南的列表 5 种样式指南: 1. 将所有行限制为最多79个字符。 2. 用两个空行包围顶级函数和类定义。 3. 制作一个元素的元组时使用逗号 4. 使用空格(而不是制表符)进行缩进 5. 每个缩进级别使用4个空格
解释一下继承以及如何在Python中使用它
``` 根据定义,继承是一种机制,其中一个对象充当另一个对象的基础,并保留其所有对象属性。 因此,如果B类继承自A类,那么A类的每个特征也将在B类中提供。A类将是“基类”,B类将是“派生类”。 当你有几个共享相同功能的类时,这很方便。 基本语法: class Base: pass class Derived(Base): pass A more forged example: class Animal: def __init__(self): print("and I'm alive!") def eat(self, food): print("ñom ñom ñom", food) class Human(Animal): def __init__(self, name): print('My name is ', name) super().__init__() def write_poem(self): print('Foo bar bar foo foo bar!') class Dog(Animal): def __init__(self, name): print('My name is', name) super().__init__() def bark(self): print('woof woof') michael = Human('Michael') michael.eat('Spam') michael.write_poem() bruno = Dog('Bruno') bruno.eat('bone') bruno.bark() >>> My name is Michael >>> and I'm alive! >>> ñom ñom ñom Spam >>> Foo bar bar foo foo bar! >>> My name is Bruno >>> and I'm alive! >>> ñom ñom ñom bone >>> woof woof 调用super()会调用Base方法,因此,调用super().__init__() 就是调用 Animal__init__。 有一个称为 MetaClasses 的更高级的python功能,可帮助程序员直接控制类的创建。 ```
什么是一个错误? 什么是一个异常? 你熟悉哪些异常类型?
``` # 请注意,你通常不需要了解编译过程,而只需知道一切都来自哪里 # 并给出完整的答案表明你真正知道你在说什么。 通常,每个编译过程都有两个步骤。 - 分析 - 产生代码. Analysis can be broken into: 1. 词法分析 (标记源代码) 2. 语法分析 (如果语法正确,请检查标记是否合法,tldr) for i in 'foo' ^ SyntaxError: invalid syntax We missed ':' 3. 语义分析 (上下文分析,合法语法仍然会触发错误,你是否尝试过除以0,哈希可变对象或使用未声明的函数?) 1/0 ZeroDivisionError: division by zero 这三个分析步骤负责错误处理。          第二步将负责错误,主要是语法错误,这是最常见的错误。     第三步将负责异常。          如我们所见,异常是语义错误,有许多内置的异常: ImportError ValueError KeyError FileNotFoundError IndentationError IndexError ... 你还可以具有用户定义的异常,这些异常必须直接或间接地从Exception类继承。 常见例子: class DividedBy2Error(Exception): def __init__(self, message): self.message = message def division(dividend,divisor): if divisor == 2: raise DividedBy2Error('I dont want you to divide by 2!') return dividend / divisor division(100, 2) >>> __main__.DividedBy2Error: I dont want you to divide by 2! ```
解释 异常处理以及如何在Python中使用它
编写一个可以反转字符串的程序(例如,pizza -> azzip)
```
最简单的是 str[::-1],但不是效率最高的。

"经典" 方式:

foo = ''
for char in 'pizza':
    foo = char + foo

>> 'azzip'
```
编写一个函数以返回一个或多个数字的和。 用户将决定要使用多少个数字
首先,你询问用户要使用的数字量。 使用while循环,每个循环将amount_of_numbers减1,直到amount_of_numbers变为0。 在while循环中,你想询问用户一个数字,该数字将在每次循环运行时添加一个变量。 ``` def return_sum(): amount_of_numbers = int(input("How many numbers? ")) total_sum = 0 while amount_of_numbers != 0: num = int(input("Input a number. ")) total_sum += num amount_of_numbers -= 1 return total_sum ```
如何将两个排序列表合并为一个排序列表?
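两种常见做法的示例:标准库 heapq.merge 的惰性归并,以及手写的双指针归并(示例数据为假设):

```python
import heapq

a = [1, 3, 5]
b = [2, 4, 6]

# 方式一:heapq.merge 对已排序的输入做惰性归并
merged = list(heapq.merge(a, b))

# 方式二:手写双指针归并
def merge(left, right):
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i]); i += 1
        else:
            result.append(right[j]); j += 1
    result.extend(left[i:])   # 追加剩余元素
    result.extend(right[j:])
    return result

print(merged)                 # [1, 2, 3, 4, 5, 6]
print(merge(a, b) == merged)  # True
```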
_ 在 Python 中用于什么?
1. i18n中的翻译查询 2. 将最后执行的表达式或语句的结果保存在交互式解释器中。 3. 作为通用“可丢弃”变量名。 例如:x,y,_ = get_data()(使用了x和y,但是由于我们不关心第三个变量,因此我们将其“扔掉了”)。
##### Algorithms Implementation
你可以在Python中实现“二分法搜索”吗?
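一个迭代版二分查找的示例实现(假设输入列表已排序;找不到时返回 -1):

```python
def binary_search(arr, target):
    """在已排序列表 arr 中查找 target,返回其下标,找不到返回 -1"""
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1   # 目标在右半部分
        else:
            high = mid - 1  # 目标在左半部分
    return -1

print(binary_search([1, 3, 5, 7, 9], 7))  # 3
print(binary_search([1, 3, 5, 7, 9], 4))  # -1
```

时间复杂度为 O(log n)。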
##### Files
如何写文件?
``` with open('file.txt', 'w') as file: file.write("My insightful comment") ```
如何反转文件?
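按行反转文件的一个简单示例(文件路径与内容均为演示用的假设):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w") as f:
    f.write("first\nsecond\nthird\n")

# 读入所有行,然后用切片反转顺序
with open(path) as f:
    reversed_lines = f.readlines()[::-1]

print("".join(reversed_lines))
```

注意,对于无法整个装入内存的大文件,需要分块从文件末尾向前读取。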
#### Regex
如何在Python中执行与正则表达式相关的操作? (匹配模式,替代字符串等)
使用 re 模块
如何用 "blue" 替换字符串 "green"?
如何找到一个变量中的所有IP地址? 如何在文件中找到它们?
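用 re.findall 配合一个简化的 IPv4 模式即可(下面的模式不校验每段是否小于等于 255,示例文本为假设):

```python
import re

text = "host1 10.0.0.5 failed, retry from 192.168.1.7"
# 简化的 IPv4 模式:四段 1-3 位数字,以点分隔
ip_pattern = r"\b(?:\d{1,3}\.){3}\d{1,3}\b"

ips = re.findall(ip_pattern, text)
print(ips)  # ['10.0.0.5', '192.168.1.7']

# 在文件中查找:逐行读取后用同一个模式匹配
# with open("some_file") as f:
#     ips_in_file = [ip for line in f for ip in re.findall(ip_pattern, line)]
```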
按每个嵌套列表的第二项对列表列表进行排序
```
li = [[1, 4], [2, 1], [3, 9], [4, 2], [4, 5]]
sorted(li, key=lambda l: l[1])
```
你可以编写一个函数来打印给定目录中的所有文件吗? 包括子目录
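用 os.walk 递归遍历目录即可,一个示例(演示目录结构为临时创建的假设数据):

```python
import os
import tempfile

# 准备演示目录:根目录一个文件,子目录一个文件
root = tempfile.mkdtemp()
open(os.path.join(root, "a.txt"), "w").close()
os.mkdir(os.path.join(root, "sub"))
open(os.path.join(root, "sub", "b.txt"), "w").close()

def list_files(directory):
    """递归打印并返回目录(含子目录)下的所有文件路径"""
    found = []
    for dirpath, dirnames, filenames in os.walk(directory):
        for name in filenames:
            path = os.path.join(dirpath, name)
            print(path)
            found.append(path)
    return found

files = list_files(root)
```

pathlib 的 `Path(directory).rglob("*")` 也能达到同样效果。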
你有下面的列表: [{'name': 'Mario', 'food': ['mushrooms', 'goombas']}, {'name': 'Luigi', 'food': ['mushrooms', 'turtles']}] 获取所有的食物类型,最后输出: {'mushrooms', 'goombas', 'turtles'}
```
brothers_menu = \
[{'name': 'Mario', 'food': ['mushrooms', 'goombas']}, {'name': 'Luigi', 'food': ['mushrooms', 'turtles']}]

# "经典" 方式
def get_food(brothers_menu) -> set:
    temp = []
    for brother in brothers_menu:
        for food in brother['food']:
            temp.append(food)
    return set(temp)

# 单行方式(使用列表推导式)
set([food for bro in brothers_menu for food in bro['food']])
```
什么是列表推导式(List Comprehension)? 它比典型的循环更好吗? 为什么? 你能演示如何使用它吗?
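一个对比列表推导式与普通循环的小示例(过滤出偶数并求平方,示例数据为假设):

```python
nums = [1, 2, 3, 4, 5]

# 列表推导式:一行完成"过滤 + 变换"
squares_of_even = [n * n for n in nums if n % 2 == 0]

# 等价的普通循环写法
squares_loop = []
for n in nums:
    if n % 2 == 0:
        squares_loop.append(n * n)

print(squares_of_even)                  # [4, 16]
print(squares_of_even == squares_loop)  # True
```

列表推导式通常更简洁、更符合 Python 习惯,对简单的过滤/变换也往往更快;但逻辑复杂时普通循环可读性更好。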
怎样反转 string?
最简短的方式是: my_string[::-1] 但是这不是效率最高的.
经典方式是: ``` def reverse_string(string): temp = "" for char in string: temp = char + temp return temp ```
如何按值对字典排序?
如何按键对字典排序?
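按键和按值排序这两个问题都可以用 sorted() 配合 dict() 解决(dict 自 Python 3.7 起保持插入顺序,示例数据为假设):

```python
d = {"banana": 3, "apple": 1, "cherry": 2}

# 按键排序
by_key = dict(sorted(d.items()))

# 按值排序:key 函数取出每个 (键, 值) 对中的值
by_value = dict(sorted(d.items(), key=lambda kv: kv[1]))

print(by_key)    # {'apple': 1, 'banana': 3, 'cherry': 2}
print(by_value)  # {'apple': 1, 'cherry': 2, 'banana': 3}
```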
解释数据序列化以及如何使用Python执行
你如何在Python中处理参数解析?
解释什么是GIL
什么是迭代器? 为什么使用迭代器?
解释以下方法的类型以及如何使用它们: * Static method * Class method * instance method
怎样反转 list?
空的 return 返回什么?
##### Time Complexity
描述操作的时间复杂度access, search insert and remove 下面的数据结构:
* Stack * Queue * Linked List * Binary Search Tree
以下算法的最好,最差和平均情况的复杂度是什么?: * Quicksort * Mergesort * Bucket Sort * Radix Sort
#### 高级
解释什么是装饰器
你能展示如何编写和使用装饰器吗?
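一个统计函数调用次数的装饰器示例(装饰器名和被装饰函数均为演示用的假设;functools.wraps 用于保留原函数的名字和文档):

```python
import functools

def count_calls(func):
    """记录函数被调用次数的简单装饰器"""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls
def add(a, b):
    return a + b

print(add(1, 2))   # 3
print(add(3, 4))   # 7
print(add.calls)   # 2
```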
编写脚本来确定给定端口上是否可以访问给定主机
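一种思路的示例:尝试与目标主机的端口建立 TCP 连接,能连上就认为可达(这里为了可离线演示,先在本机启动一个监听套接字,端口号由系统随机分配):

```python
import socket

def is_port_open(host, port, timeout=3.0):
    """尝试与 host:port 建立 TCP 连接,可达返回 True,否则返回 False"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# 演示:在本机回环地址上监听一个随机端口,再检查它是否可达
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
open_port = server.getsockname()[1]

result = is_port_open("127.0.0.1", open_port)
print(result)  # True
server.close()
```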
你熟悉数据类(dataclasses)吗? 你能解释一下它们是用来干什么的吗?
解释一下上下文管理
解释一下缓冲协议
解释一下描述符
你有抓取网络(爬虫)的经验吗? 你能描述一下你用过什么以及用什么?
你可以在Python中实现链接链表吗?
你已经创建了一个网页,用户可以在其中上传文档。 但是,根据文档大小,读取上传文件的功能会运行很长时间,并且用户必须等待读取操作完成才能继续使用该网站。 你怎么能解决这个问题?
## Prometheus #### 初级
什么是Prometheus? Prometheus的主要特点是什么?
描述 Prometheus 架构和组件
你能否将 Prometheus 与其他解决方案(例如InfluxDB)进行比较?
什么是an Alert?
描述以下Prometheus组件: * Prometheus server * Push Gateway * Alert Manager
Prometheus server 负责抓取和存储数据;Push Gateway 用于短期作业的指标推送;Alert Manager 负责处理警报 ;)
什么是一个实例? 什么是一个作业?
Prometheus支持哪些核心指标类型?
什么是一个 exporter? 它用来做什么?
你熟悉哪些Prometheus最佳做法? 至少命名三个
如何在给定时间内获得总请求?
#### 高级
你如何加入两个指标?
如何编写返回标签值的查询?
如何将cpu_user_seconds转换为cpu使用率(百分比)?
## Git #### 初级
git pull 和 git fetch 的区别是什么?
简单来说, git pull = git fetch + git merge 当你运行git pull时,它会从远程或中央获取所有更改 存储库,并将其附加到本地存储库中的相应分支。 git fetch从远程存储库获取所有更改,将更改存储在 本地存储库中的单独分支
解释以下概念: git 目录、工作目录和暂存区
Git目录是Git存储项目的元数据和对象数据库的地方。 这是Git最重要的部分,当你从另一台计算机克隆存储库时,它就是复制的。 工作目录是项目一个版本的单个签出。 这些文件将从Git目录中的压缩数据库中拉出,并放置在磁盘上供你使用或修改。 暂存区是一个简单文件,通常包含在你的Git目录中,用于存储有关下一次提交的内容的信息。 有时称为索引,但将其称为暂存区已成为标准。 答案来自 [git-scm.com](https://git-scm.com/book/en/v1/Getting-Started-Git-Basics#_the_three_states)
怎么解决 git merge 冲突?

首先,打开有冲突的文件,然后确定有什么冲突。 接下来,根据你的公司或团队接受的是什么,你可以与自己的 同事解决冲突或自行解决 解决冲突后,使用 git add 添加文件。 最后,运行`git rebase --continue`。

git resetgit revert区别是什么?

`git revert` 创建一个新的提交,撤消上一次提交的更改。 `git reset` 根据使用情况,可以修改索引或更改分支头当前指向的提交。

你想将提交移至顶部。 你将如何实现?
使用 git rebase 命令
那种情形你会使用 git rebase?
你熟悉哪些合并策略?
提及两个或三个就足够了,最好提到“递归”作为默认值。 recursive resolve ours theirs 这篇文章解释是最好的: https://git-scm.com/docs/merge-strategies
在提交更改之前,如何查看已完成的更改?
git diff
如何将特定文件还原为先前的提交?
``` git checkout HEAD~1 -- /path/of/the/file ```
#### 高级
解释 Git octopus merge
值得一提的是,它: * 对于合并多个分支的情况(也是此类用例的默认方式)非常有用 * 主要用于将多个主题分支捆绑在一起 这有一篇关于 Octopus merge 的文章: http://www.freblogg.com/2016/12/git-octopus-merge.html
## Go #### 初级
Go编程语言有哪些特点?
* 强类型和静态类型 - 变量的类型不能随时间更改,必须在编译时进行定义   * 简单   * 快速编译时间   * 内置并发   * 垃圾回收   * 平台无关   * 编译为独立的二进制文件 - 你运行应用程序所需的所有内容都将被编译为一个二进制文件。 对于运行时的版本管理非常有用。 Go 而且有一个很好的社区.
var x int = 2 和 x := 2 的区别是什么?
结果相同,变量的值都为 2。 使用 var x int = 2 时,我们显式地把变量类型设置为整数;而使用 x := 2 时,我们让 Go 自行推断类型。
对还是错? 在Go中,我们可以重新声明变量,并且一旦声明就必须使用它。
错。我们不能重新声明变量,但必须使用已声明的变量。
你使用了哪些Go库?
应该根据你的使用情况回答此问题,一些示例是: * fmt - formatted I/O
下面代码块有什么问题? 怎么解决? ```go func main() { var x float32 = 13.5 var y int y = x } ```
下面的代码块尝试将整数101转换为字符串,但相反,我们得到“ e”。 这是为什么? 怎么解决? ```go package main import "fmt" func main() { var x int = 101 var y string y = string(x) fmt.Println(y) } ```
string(x) 会把 x 当作 Unicode 码位,查找码位 101 对应的字符并用它构造字符串,所以得到 "e"。 如果要获取 "101",则应使用 "strconv" 软件包,将 y = string(x) 替换为 y = strconv.Itoa(x)
以下代码块什么是错的?: ``` package main func main() { var x = 2 var y = 3 const someConst = x + y } ```
以下代码块的输出是什么?: ```go package main import "fmt" const ( x = iota y = iota ) const z = iota func main() { fmt.Printf("%v\n", x) fmt.Printf("%v\n", y) fmt.Printf("%v\n", z) } ```
_ 在 Go 中的用途是什么?
以下代码块的输出是什么?: ```go package main import "fmt" const ( _ = iota + 3 x ) func main() { fmt.Printf("%v\n", x) } ```
## Mongo #### 初级
MongoDB有什么优势? 换句话说,为什么选择 MongoDB 而不选择 NoSQL 的其他实现?
SQL和NoSQL之间的区别是什么?
主要区别在于SQL数据库是结构化的(数据以带有行和列的表格-像是Excel电子表格表格),而NoSQL是 非结构化的,并且数据存储会根据NoSQL DB的设置方式而有所不同,例如 作为键值对,面向文档等
在哪种情况下,你会希望使用 NoSQL/Mongo 而不是 SQL?
* 经常变化的异构数据   * 数据一致性和完整性不是重中之重   * 最好,如果数据库需要快速扩展
什么是一个文档? 什么是一个集合?
什么是一个聚合?
哪个更好? 嵌入文档还是引用?
##### Queries
解释这个查询: db.books.find({"name": /abc/})
解释这个查询: db.books.find().sort({x:1})
## OpenShift #### 初级
什么是OpenShift? 你用过吗? 如果有,是怎样使用的?
你能解释一下 OpenShift 和 Kubernetes 之间的区别吗?
定义 Pods 以及解释什么是有状态的 pods
你熟悉哪些类型的构建策略?
解释标签是什么以及它们的用途
解释什么是注释以及它们与标签的区别
解释什么是Downward API
## Shell 脚本 #### 初级
告诉我你使用Shell脚本的经验
脚本中的这一行是什么意思?: #!/bin/bash
你倾向于在编写的每个脚本中包含什么?
对还是错?: 当某个命令行失败时,默认情况下,该脚本将退出并且不会继续运行
取决于所使用的语言和设置,例如在Bash中,默认情况下,脚本将继续运行。
今天,我们拥有Ansible之类的工具和技术。 为什么还会有人使用Shell脚本?
说出下面每个命令的结果是什么: * echo $0 * echo $? * echo $$ * echo $@ * echo $#
你如何调试Shell脚本?
如何在Shell脚本中从用户获得输入?
解释一下条件语句以及如何使用它们
什么是循环? 你熟悉哪些类型的循环?
解释 continuebreak. 你什么时候使用它们?
如何将命令的输出存储在变量中?
你如何检查可变长度?
单引号和双引号之间的区别是什么?
#### 高级
解释以下代码: :(){ :|:& };:
你能举一些Bash最佳实践的例子吗?
什么是三元运算符? 你如何在bash中使用它?
使用 if/else 的一种简短方法。 一个例子: [[ $a = 1 ]] && b="yes, equal" || b="nope"
## SQL #### 初级
SQL 代表什么?
Structured Query Language(结构化查询语言)
SQL 和 NoSQL 有那些不同
主要区别在于SQL数据库是结构化的(数据以 带有行和列的表格-像是Excel电子表格表格),而NoSQL是 非结构化的,并且数据存储会根据NoSQL DB的设置方式而有所不同,例如 作为键值对,面向文档等
数据库符合ACID的含义是什么?
ACID代表原子性,一致性,隔离性,耐久性。为了符合ACID,数据库必须满足四个标准中的每个标准 **原子性** - 数据库发生更改时,它整体上应该成功或失败。 例如,如果你要更新表,则更新应完全执行。如果仅部分执行,则 更新被视为整体失败,并且不会通过-数据库将恢复为原始状态 更新发生之前的状态。还应该提到的是,原子性确保每个 事务以其自身的独立“单元”完成 - 如果任何部分失败,则整个语句都会失败。 **一致性** - 对数据库所做的任何更改都应将其从一种有效状态转变为另一种有效状态。 例如,如果你对数据库进行了更改,则不应破坏它。通过检查和约束来保持一致性 在数据库中预定义。例如,如果你尝试将列的值从字符串更改为int 应该是数据类型字符串,一致的数据库将不允许该事务通过,并且该操作将 不执行 **隔离** - 确保数据库不会被“更新中”-因为多个事务正在运行 同时,它仍应保持数据库处于与按顺序运行事务相同的状态。 例如,假设有20个人同时对数据库进行了更改。在 当你执行查询时,已完成20项更改中的15项,但仍有5项正在进行中。你应该 仅看到已完成的15个更改 - 随着更改的进行,你将看不到数据库的更新中。 **耐用性** - 更改一旦提交,无论发生什么情况都将保持提交状态 (电源故障,系统崩溃等)。这意味着所有已完成的交易 必须记录在非挥发性内存中。 请注意,SQL本质上符合ACID。某些NoSQL DB可能符合ACID,具体取决于 它们的工作方式,但是根据一般经验,NoSQL DB不被视为符合ACID
什么时候最好使用SQL/NoSQL?
SQL - 当数据完整性至关重要时,最适合使用。 由于符合ACID,SQL通常由许多业务实现特别是金融领域。 NoSQL - 非常适合你需要快速扩展的情况。 请记住NoSQL是为Web应用程序设计的 ,如果你需要快速将相同信息散布到多台服务器,它将会很好的用 此外,由于 NoSQL 不遵守具有列和行结构的严格表 关系数据库所要求的,你可以将不同的数据类型存储在一起。
什么是笛卡尔积?
笛卡尔积是指第一个表中的所有行都与第二个表中的所有行连接在一起时的结果 表。 这可以通过不定义要联接的键来隐式完成,也可以通过以下方式显式地完成: 在两个表上调用CROSS JOIN,如下所示: Select * from customers **CROSS JOIN** orders; 请注意,笛卡尔积也可能是一件坏事 - 执行联接时 在两个都没有唯一键的表上,这可能会导致返回信息 是不正确的。
##### SQL Specific Questions

对于这些问题,我们将使用下面显示的 Customers 和 Orders 表:

**Customers**

Customer_ID | Customer_Name | Items_in_cart | Cash_spent_to_Date
------------ | ------------- | ------------- | -------------
100204 | John Smith | 0 | 20.00
100205 | Jane Smith | 3 | 40.00
100206 | Bobby Frank | 1 | 100.20

**ORDERS**

Customer_ID | Order_ID | Item | Price | Date_sold
------------ | ------------- | ------------- | ------------- | -------------
100206 | A123 | Rubber Ducky | 2.20 | 2019-09-18
100206 | A123 | Bubble Bath | 8.00 | 2019-09-18
100206 | Q987 | 80-Pack TP | 90.00 | 2019-09-20
100205 | Z001 | Cat Food - Tuna Fish | 10.00 | 2019-08-05
100205 | Z001 | Cat Food - Chicken | 10.00 | 2019-08-05
100205 | Z001 | Cat Food - Beef | 10.00 | 2019-08-05
100205 | Z001 | Cat Food - Kitty quesadilla | 10.00 | 2019-08-05
100204 | X202 | Coffee | 20.00 | 2019-04-29
我如何从该表中选择所有字段?
Select *
From Customers;
约翰的购物车中有几件?
Select Items_in_cart
From Customers
Where Customer_Name = "John Smith";
所有客户花费的所有现金的总和是多少?
Select SUM(Cash_spent_to_Date) as SUM_CASH
From Customers;
在购物车有商品的有多少人?
Select count(1) as Number_of_People_w_items
From Customers
where Items_in_cart > 0;
How would you join the Customers table to the Orders table?

You would join them on their unique key. In this case, the unique key is Customer_ID, in both the Customers table and the Orders table.

How would you show which customers ordered which items?
Select c.Customer_Name, o.Item
From Customers c
Left Join Orders o
On c.Customer_ID = o.Customer_ID;
#### Advanced

Using a with statement, how would you show who ordered cat food and the total amount spent?
with cat_food as (
Select Customer_ID, SUM(Price) as TOTAL_PRICE
From Orders
Where Item like "%Cat Food%"
Group by Customer_ID
)
Select Customer_name, TOTAL_PRICE
From Customers c
Inner JOIN cat_food f
ON c.Customer_ID = f.Customer_ID
where c.Customer_ID in (Select Customer_ID from cat_food);

Although this was a simple statement, the "with" clause really shines when a complex query needs to be run on a table before joining it to another table. With statements are nice because, instead of creating a whole new table, you create a pseudo-temporary table while running your query.

The total sum of cat food could not be obtained directly, so we used a with statement to create a pseudo table retrieving the sum of the price spent by each customer, then joined the table normally.
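The query above can be reproduced end to end with Python's sqlite3 module. This sketch loads only the relevant subset of the Customers and Orders tables, and uses single-quoted SQL string literals, which SQLite prefers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Customer_ID INTEGER, Customer_Name TEXT)")
conn.execute("CREATE TABLE Orders (Customer_ID INTEGER, Item TEXT, Price REAL)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [(100204, "John Smith"), (100205, "Jane Smith"), (100206, "Bobby Frank")])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                 [(100205, "Cat Food - Tuna Fish", 10.00),
                  (100205, "Cat Food - Chicken", 10.00),
                  (100205, "Cat Food - Beef", 10.00),
                  (100205, "Cat Food - Kitty quesadilla", 10.00),
                  (100204, "Coffee", 20.00)])

query = """
WITH cat_food AS (
    SELECT Customer_ID, SUM(Price) AS TOTAL_PRICE
    FROM Orders
    WHERE Item LIKE '%Cat Food%'
    GROUP BY Customer_ID
)
SELECT c.Customer_Name, f.TOTAL_PRICE
FROM Customers c
INNER JOIN cat_food f ON c.Customer_ID = f.Customer_ID
"""
print(conn.execute(query).fetchall())  # [('Jane Smith', 40.0)]
```

Only Jane Smith ordered cat food (four items at 10.00 each), so the CTE aggregates her rows to 40.0 before the join.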
## Azure

#### Beginner

Explain availability sets and availability zones

What is Azure Resource Manager? Can you describe the format of an ARM template?

Explain Azure managed disks

## GCP

#### Beginner

What are the main components and services of GCP?

What GCP management tools are you familiar with?

Tell me what you know about GCP networking

## OpenStack

#### Beginner

Tell me about your experience with OpenStack. What do you think are the advantages and disadvantages of OpenStack?

Which OpenStack components/projects are you familiar with?

Can you tell me what each of the following components/projects is responsible for?:

* Nova
* Neutron
* Cinder
* Glance
* Keystone

Describe in detail how you bring up an instance with an IP that is accessible from outside the cloud

You get a call from a customer saying: "I can ping my instance but I can't connect (ssh) to it". What could be the problem?

What types of networks does OpenStack support?

How do you debug OpenStack storage issues? (tools, logs, etc.)

How do you debug OpenStack compute issues? (tools, logs, etc.)

Are you familiar with TripleO? What are its advantages?

##### Networking

What is a provider network?

Which components and services exist for L2 and L3?

What is the ML2 plug-in? Explain its architecture

What is the L2 agent? How does it work and what is it responsible for?

What is the L3 agent? How does it work and what is it responsible for?

Explain how the metadata agent works and what it is responsible for

How do you debug OpenStack networking issues? (tools, logs, etc.)

#### Intermediate

##### Networking

Explain BGP dynamic routing

## Security

#### Beginner

Can you describe the core principles of DevSecOps?

What DevOps security best practices are you familiar with?

What security technologies are you familiar with?

How do you manage secrets across different tools and platforms?

How do you identify and manage vulnerabilities?

What is privilege restriction?

## Puppet

#### Beginner

What is Puppet? How does it work?

Explain Puppet architecture

Can you compare Puppet to other configuration management tools? Why did you choose Puppet?

Explain the following:

* Module
* Manifest
* Node

Explain Facter

What is MCollective?

#### Intermediate

Do you have experience writing modules? Which modules have you created and for what purpose?

Explain what Hiera is

## Scenarios

Scenarios are questions without a verbal answer, which require you to do one of the following:

* Set up an environment
* Write a script
* Design and/or develop an infrastructure project

They are usually given as a home task for candidates and can combine several topics together. Below you can find several scenario questions:

* [Elasticsearch & Kibana on AWS](scenarios/elk_kibana_aws.md)
* [Ansible, Minikube and Docker](scenarios/ansible_minikube_docker.md)
* [Cloud Slack bot](scenarios/cloud_slack_bot.md)
* [Writing Jenkins Scripts](scenarios/jenkins_scripts.md)
* [Writing Jenkins Pipelines](scenarios/jenkins_pipelines.md)

================================================
FILE: README.md
================================================

:information_source:  This repo contains questions and exercises on various technical topics, sometimes related to DevOps and SRE

:bar_chart:  There are currently **2624** exercises and questions

:warning:  You can use these for preparing for an interview but most of the questions and exercises don't represent an actual interview. Please read the [FAQ page](faq.md) for more details

:stop_sign:  If you are interested in pursuing a career as a DevOps engineer, learning some of the concepts mentioned here would be useful, but you should know it's not about learning all the topics and technologies mentioned in this repository

:pencil:  You can add more exercises by submitting pull requests :) Read about contribution guidelines [here](CONTRIBUTING.md)

****
* DevOps
* Git
* Network
* Hardware
* Kubernetes
* Software Development
* Python
* Go
* Perl
* Regex
* Cloud
* AWS
* Azure
* Google Cloud Platform
* OpenStack
* Operating System
* Linux
* Virtualization
* DNS
* Shell Scripting
* Databases
* SQL
* Mongo
* Testing
* Big Data
* CI/CD
* Certificates
* Containers
* OpenShift
* Storage
* Terraform
* Puppet
* Distributed
* Questions you can ask
* Ansible
* Observability
* Prometheus
* Circle CI
* DataDog
* Grafana
* Argo
* Soft Skills
* Security
* System Design
* Chaos Engineering
* Misc
* Elastic
* Kafka
* NodeJs
## DevOps Applications
* KubePrep
* Linux Master
* System Design Hero
## Network
In general, what do you need in order to communicate?
- A common language (for the two ends to understand)
- A way to address who you want to communicate with
- A connection (so the content of the communication can reach the recipients)
What is TCP/IP?
A set of protocols that define how two or more devices can communicate with each other. To learn more about TCP/IP, read [here](http://www.penguintutor.com/linux/basic-network-reference)
What is Ethernet?
Ethernet simply refers to the most common type of Local Area Network (LAN) used today. A LAN—in contrast to a WAN (Wide Area Network), which spans a larger geographical area—is a connected network of computers in a small area, like your office, college campus, or even home.
What is a MAC address? What is it used for?
A MAC address is a unique identification number or code used to identify individual devices on the network. Packets that are sent on the ethernet are always coming from a MAC address and sent to a MAC address. If a network adapter is receiving a packet, it is comparing the packet’s destination MAC address to the adapter’s own MAC address.
When is this MAC address used?: ff:ff:ff:ff:ff:ff
When a device sends a packet to the broadcast MAC address (ff:ff:ff:ff:ff:ff), it is delivered to all stations on the local network. Ethernet broadcasts are used to resolve IP addresses to MAC addresses (by ARP) at the data link layer.
What is an IP address?
An Internet Protocol address (IP address) is a numerical label assigned to each device connected to a computer network that uses the Internet Protocol for communication. An IP address serves two main functions: host or network interface identification and location addressing.
Explain the subnet mask and give an example
A Subnet mask is a 32-bit number that masks an IP address and divides the IP addresses into network addresses and host addresses. Subnet Mask is made by setting network bits to all "1"s and setting host bits to all "0"s. Within a given network, out of the total usable host addresses, two are always reserved for specific purposes and cannot be allocated to any host. These are the first address, which is reserved as a network address (a.k.a network ID), and the last address used for network broadcast. [Example](https://github.com/philemonnwanne/projects/tree/main/exercises/exe-09)
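Python's ipaddress module makes this arithmetic concrete. For example, the /24 network below has the familiar 255.255.255.0 mask, 256 total addresses, and the first/last addresses reserved as the network ID and broadcast address:

```python
import ipaddress

# 192.168.1.0/24: 24 network bits set to 1, 8 host bits set to 0
net = ipaddress.ip_network("192.168.1.0/24")
print(net.netmask)            # 255.255.255.0
print(net.num_addresses)      # 256 total; 254 usable hosts
print(net.network_address)    # 192.168.1.0   (network ID, reserved)
print(net.broadcast_address)  # 192.168.1.255 (broadcast, reserved)
```

Subtracting the two reserved addresses from 256 gives the 254 usable host addresses of a /24.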
What is a private IP address? In which scenarios/system designs, one should use it?
Private IP addresses are assigned to the hosts in the same network to communicate with one another. As the name "private" suggests, the devices having the private IP addresses assigned can't be reached by the devices from any external network. For example, if I am living in a hostel and I want my hostel mates to join the game server I have hosted, I will ask them to join via my server's private IP address, since the network is local to the hostel.
What is a public IP address? In which scenarios/system designs, one should use it?
A public IP address is a public-facing IP address. In the event that you were hosting a game server that you want your friends to join, you will give your friends your public IP address to allow their computers to identify and locate your network and server in order for the connection to take place. One time that you would not need to use a public-facing IP address is in the event that you were playing with friends who were connected to the same network as you, in that case, you would use a private IP address. In order for someone to be able to connect to your server that is located internally, you will have to set up a port forward to tell your router to allow traffic from the public domain into your network and vice versa.
Explain the OSI model. What layers there are? What each layer is responsible for?
- Application: user end (HTTP is here)
- Presentation: establishes context between application-layer entities (encryption is here)
- Session: establishes, manages, and terminates the connections
- Transport: transfers variable-length data sequences from a source to a destination host (TCP & UDP are here)
- Network: transfers datagrams from one network to another (IP is here)
- Data link: provides a link between two directly connected nodes (MAC is here)
- Physical: the electrical and physical spec of the data connection (bits are here)

You can read more about the OSI model in [penguintutor.com](http://www.penguintutor.com/linux/basic-network-reference)
For each of the following, determine to which OSI layer it belongs:

* Error correction
* Packets routing
* Cables and electrical signals
* MAC address
* IP address
* Terminate connections
* 3-way handshake

* Error correction - Data link
* Packets routing - Network
* Cables and electrical signals - Physical
* MAC address - Data link
* IP address - Network
* Terminate connections - Session
* 3-way handshake - Transport
What delivery schemes are you familiar with?
Unicast: One-to-one communication where there is one sender and one receiver.

Broadcast: Sending a message to everyone in the network. The address ff:ff:ff:ff:ff:ff is used for broadcasting. Two common protocols which use broadcast are ARP and DHCP.

Multicast: Sending a message to a group of subscribers. It can be one-to-many or many-to-many.
What is CSMA/CD? Is it used in modern ethernet networks?
CSMA/CD stands for Carrier Sense Multiple Access / Collision Detection. Its primary focus is to manage access to a shared medium/bus where only one host can transmit at a given point in time.

CSMA/CD algorithm:

1. Before sending a frame, a host checks whether another host is already transmitting a frame.
2. If no one is transmitting, it starts transmitting the frame.
3. If two hosts transmit at the same time, we have a collision.
4. Both hosts stop sending the frame and send everyone a 'jam signal', notifying everyone that a collision occurred.
5. Each host waits a random amount of time before trying to send again.
6. Once each host has waited a random time, they try to send the frame again and the cycle starts over.

In modern switched, full-duplex Ethernet networks, each link is a dedicated collision-free channel, so CSMA/CD is essentially no longer used.
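The random wait in steps 5-6 is classically implemented as binary exponential backoff: after the n-th collision a host picks a random number of slot times in the range [0, 2^n - 1], with n capped (a cap of 10 is used below, as in classic Ethernet). A minimal sketch:

```python
import random

def backoff_slots(collision_count, cap=10):
    """Pick a random number of slot times to wait after the n-th collision.
    The range doubles with each collision, up to the cap."""
    n = min(collision_count, cap)
    return random.randint(0, 2 ** n - 1)

# After the 1st collision the wait is 0 or 1 slots; after the 3rd, 0..7 slots
print(backoff_slots(3))
```

Doubling the range on each collision spreads retransmissions out, so repeated collisions between the same hosts become increasingly unlikely.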
Describe the following network devices and the difference between them:

* router
* switch
* hub

A router, switch, and hub are all network devices used to connect devices in a local area network (LAN). However, each device operates differently and has its specific use cases. Here is a brief description of each device and the differences between them:

1. Router: a network device that connects multiple network segments together. It operates at the network layer (Layer 3) of the OSI model and uses routing protocols to direct data between networks. Routers use IP addresses to identify devices and route data packets to the correct destination.
2. Switch: a network device that connects multiple devices on a LAN. It operates at the data link layer (Layer 2) of the OSI model and uses MAC addresses to identify devices and direct data packets to the correct destination. Switches allow devices on the same network to communicate with each other more efficiently and can prevent data collisions that can occur when multiple devices send data simultaneously.
3. Hub: a network device that connects multiple devices through a single cable and is used to connect multiple devices without segmenting a network. However, unlike a switch, it operates at the physical layer (Layer 1) of the OSI model and simply broadcasts data packets to all devices connected to it, regardless of whether the device is the intended recipient or not. This means that data collisions can occur, and the network's efficiency can suffer as a result. Hubs are generally not used in modern network setups, as switches are more efficient and provide better network performance.
What is a "Collision Domain"?
A collision domain is a network segment in which devices can potentially interfere with each other by attempting to transmit data at the same time. When two devices transmit data at the same time, it can cause a collision, resulting in lost or corrupted data. In a collision domain, all devices share the same bandwidth, and any device can potentially interfere with the transmission of data by other devices.
What is a "Broadcast Domain"?
A broadcast domain is a network segment in which all devices can communicate with each other by sending broadcast messages. A broadcast message is a message that is sent to all devices in a network rather than a specific device. In a broadcast domain, all devices can receive and process broadcast messages, regardless of whether the message was intended for them or not.
Three computers are connected to a switch. How many collision domains are there? How many broadcast domains?
Three collision domains and one broadcast domain
How does a router work?
A router is a physical or virtual appliance that passes information between two or more packet-switched computer networks. A router inspects a given data packet's destination Internet Protocol address (IP address), calculates the best way for it to reach its destination, and then forwards it accordingly.
What is NAT?
Network Address Translation (NAT) is a process in which one or more local IP addresses are translated into one or more Global IP address and vice versa in order to provide Internet access to the local hosts.
What is a proxy? How does it work? What do we need it for?
A proxy server acts as a gateway between you and the internet. It’s an intermediary server separating end users from the websites they browse. If you’re using a proxy server, internet traffic flows through the proxy server on its way to the address you requested. The request then comes back through that same proxy server (there are exceptions to this rule), and then the proxy server forwards the data received from the website to you. Proxy servers provide varying levels of functionality, security, and privacy depending on your use case, needs, or company policy.
What is TCP? How does it work? What is the 3-way handshake?
TCP 3-way handshake or three-way handshake is a process that is used in a TCP/IP network to make a connection between server and client. A three-way handshake is primarily used to create a TCP socket connection. It works as follows:

- A client node sends a SYN data packet over an IP network to a server on the same or an external network. The objective of this packet is to ask/infer if the server is open for new connections.
- The target server must have open ports that can accept and initiate new connections. When the server receives the SYN packet from the client node, it responds and returns a confirmation receipt - the ACK packet or SYN/ACK packet.
- The client node receives the SYN/ACK from the server and responds with an ACK packet.
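The handshake happens implicitly whenever a TCP connection is established: calling connect() on a socket triggers the SYN, SYN/ACK, ACK exchange under the hood. A minimal loopback sketch in Python (the throwaway server exists only to accept the connection):

```python
import socket
import threading

# A throwaway server on an ephemeral loopback port (for illustration only)
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

def accept_once():
    conn, _ = server.accept()  # returns once the 3-way handshake completes
    conn.close()

t = threading.Thread(target=accept_once)
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))  # SYN -> SYN/ACK -> ACK happens here
print("connected to port", port)
client.close()
t.join()
server.close()
```

By the time connect() returns, the kernel has already completed the full exchange; the application never sees the individual SYN/ACK packets.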
What is round-trip delay or round-trip time?
From [wikipedia](https://en.wikipedia.org/wiki/Round-trip_delay): "the length of time it takes for a signal to be sent plus the length of time it takes for an acknowledgment of that signal to be received" Bonus question: what is the RTT of LAN?
How does an SSL handshake work?
An SSL/TLS handshake is the process that establishes a secure connection between a client and a server. With the classic RSA key exchange it works roughly as follows:

1. The client sends a Client Hello message, which includes the client's SSL/TLS protocol version, a list of the cryptographic algorithms it supports, and a random value.
2. The server responds with a Server Hello message, which includes the server's protocol version, the chosen cipher suite, a random value, and a session ID.
3. The server sends a Certificate message, which contains the server's certificate (including its public key).
4. The server sends a Server Hello Done message, indicating that it is done sending messages for the hello phase.
5. The client sends a Client Key Exchange message, which contains a pre-master secret encrypted with the server's public key. Both sides derive the session keys from this secret.
6. The client sends a Change Cipher Spec message, notifying the server that subsequent messages will be encrypted with the negotiated keys.
7. The client sends its Finished message (an encrypted handshake message), allowing the server to verify that the handshake was not tampered with.
8. The server sends its own Change Cipher Spec message, notifying the client that it is switching to the negotiated keys as well.
9. The server sends its Finished message.
10. The client and server can now exchange application data over the encrypted channel.
What is the difference between TCP and UDP?
TCP establishes a connection between the client and the server to guarantee the order of the packets; UDP, on the other hand, does not establish a connection between client and server and doesn't handle packet order. This makes UDP more lightweight than TCP and a perfect candidate for services like streaming.

[Penguintutor.com](http://www.penguintutor.com/linux/basic-network-reference) provides a good explanation.
What TCP/IP protocols are you familiar with?
Explain the "default gateway"
A default gateway serves as an access point or IP router that a networked computer uses to send information to a computer in another network or the internet.
What is ARP? How does it work?
ARP stands for Address Resolution Protocol. When you try to ping an IP address on your local network, say 192.168.1.1, your system has to turn the IP address 192.168.1.1 into a MAC address. This involves using ARP to resolve the address, hence its name. Systems keep an ARP look-up table where they store information about what IP addresses are associated with what MAC addresses. When trying to send a packet to an IP address, the system will first consult this table to see if it already knows the MAC address. If there is a value cached, ARP is not used.
What is TTL? What does it help to prevent?
TTL (Time to Live) is a value in an IP (Internet Protocol) packet that determines how many hops or routers a packet can travel before it is discarded. Each time a packet is forwarded by a router, the TTL value is decreased by one. When the TTL value reaches zero, the packet is dropped, and an ICMP (Internet Control Message Protocol) message is sent back to the sender indicating that the packet has expired.

- TTL is used to prevent packets from circulating indefinitely in the network, which can cause congestion and degrade network performance.
- It also helps to prevent packets from being trapped in routing loops, where packets continuously travel between the same set of routers without ever reaching their destination.
- In addition, TTL can be used to help detect and prevent IP spoofing attacks, where an attacker attempts to impersonate another device on the network by using a false or fake IP address. By limiting the number of hops that a packet can travel, TTL can help prevent packets from being routed to destinations that are not legitimate.
What is DHCP? How does it work?
It stands for Dynamic Host Configuration Protocol and allocates IP addresses, subnet masks, and gateways to hosts. This is how it works:

* A host, upon entering a network, broadcasts a message in search of a DHCP server (DHCP DISCOVER)
* An offer message is sent back by the DHCP server as a packet containing lease time, subnet mask, IP addresses, etc. (DHCP OFFER)
* Depending on which offer is accepted, the client sends back a reply broadcast letting all DHCP servers know (DHCP REQUEST)
* The server sends an acknowledgment (DHCP ACK)

Read more [here](https://linuxjourney.com/lesson/dhcp-overview)
Can you have two DHCP servers on the same network? How does it work?
It is possible to have two DHCP servers on the same network; however, it is not recommended, and it is important to configure them carefully to prevent conflicts and configuration problems.

- When two DHCP servers are configured on the same network, there is a risk that both servers will assign IP addresses and other network configuration settings to the same device, which can cause conflicts and connectivity issues. Additionally, if the DHCP servers are configured with different network settings or options, devices on the network may receive conflicting or inconsistent configuration settings.
- However, in some cases it may be necessary to have two DHCP servers on the same network, such as in large networks where one DHCP server may not be able to handle all the requests. In such cases, DHCP servers can be configured to serve different IP address ranges or different subnets, so they do not interfere with each other.
What is SSL tunneling? How does it work?
SSL (Secure Sockets Layer) tunneling is a technique used to establish a secure, encrypted connection between two endpoints over an insecure network, such as the Internet. The SSL tunnel is created by encapsulating the traffic within an SSL connection, which provides confidentiality, integrity, and authentication.

Here's how SSL tunneling works:

1. A client initiates an SSL connection to a server, which involves a handshake process to establish the SSL session.
2. Once the SSL session is established, the client and server negotiate encryption parameters, such as the encryption algorithm and key length, then exchange digital certificates to authenticate each other.
3. The client then sends traffic through the SSL tunnel to the server, which decrypts the traffic and forwards it to its destination.
4. The server sends traffic back through the SSL tunnel to the client, which decrypts the traffic and forwards it to the application.
What is a socket? Where can you see the list of sockets in your system?
A socket is a software endpoint that enables two-way communication between processes over a network. Sockets provide a standardized interface for network communication, allowing applications to send and receive data across a network.

To view the list of open sockets on a Linux system, run `netstat -an` (or `ss -an` on modern distributions). The output lists all open sockets, along with their protocol, local address, foreign address, and state.
What is IPv6? Why should we consider using it if we have IPv4?
IPv6 (Internet Protocol version 6) is the latest version of the Internet Protocol (IP), which is used to identify and communicate with devices on a network. IPv6 addresses are 128-bit addresses and are expressed in hexadecimal notation, such as 2001:0db8:85a3:0000:0000:8a2e:0370:7334.

There are several reasons why we should consider using IPv6 over IPv4:

1. Address space: IPv4 has a limited address space, which has been exhausted in many parts of the world. IPv6 provides a much larger address space, allowing for trillions of unique IP addresses.
2. Security: IPv6 includes built-in support for IPsec, which provides end-to-end encryption and authentication for network traffic.
3. Performance: IPv6 includes features that can help to improve network performance, such as multicast routing, which allows a single packet to be sent to multiple destinations simultaneously.
4. Simplified network configuration: IPv6 includes features that can simplify network configuration, such as stateless autoconfiguration, which allows devices to automatically configure their own IPv6 addresses without the need for a DHCP server.
5. Better mobility support: IPv6 includes features that can improve mobility support, such as Mobile IPv6, which allows devices to maintain their IPv6 addresses as they move between different networks.
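Python's ipaddress module can be used to play with IPv6 notation and the size of the address space:

```python
import ipaddress

addr = ipaddress.ip_address("2001:0db8:85a3:0000:0000:8a2e:0370:7334")
print(addr)          # 2001:db8:85a3::8a2e:370:7334 (canonical compressed form)
print(addr.version)  # 6

# Address space: 2^128 IPv6 addresses vs 2^32 IPv4 addresses
print(2 ** 128 // 2 ** 32)  # the IPv6 space is 2^96 times larger
```

Note how the module normalizes the address: leading zeros are dropped and the longest run of zero groups collapses to `::`.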
What is VLAN?
- A VLAN (Virtual Local Area Network) is a logical network that groups together a set of devices on a physical network, regardless of their physical location. VLANs are created by configuring network switches to assign a specific VLAN ID to frames sent by devices connected to a specific port or group of ports on the switch.
What is MTU?
MTU stands for Maximum Transmission Unit. It's the size of the largest PDU (Protocol Data Unit) that can be sent in a single transaction.
What happens if you send a packet that is bigger than the MTU?
With IPv4, a router can fragment the packet into pieces that fit the MTU and forward the fragments separately. With IPv6, routers do not fragment: the packet is dropped and an ICMPv6 "Packet Too Big" message is sent back to the sender, which is then expected to use a smaller packet size (Path MTU Discovery).
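As a rough sketch of the IPv4 case: each fragment carries its own 20-byte IP header, and every fragment's payload except the last must be a multiple of 8 bytes. For the classic textbook example of a 4000-byte packet crossing a 1500-byte MTU link, that yields payloads of 1480, 1480, and 1020 bytes (the helper below is illustrative and ignores IP options):

```python
def fragment_sizes(packet_len, mtu, ip_header=20):
    """Split an IPv4 packet's payload into fragment payloads that fit the MTU.
    Every fragment's payload except the last must be a multiple of 8 bytes."""
    payload = packet_len - ip_header
    max_payload = (mtu - ip_header) // 8 * 8  # 1480 for a 1500-byte MTU
    sizes = []
    while payload > 0:
        chunk = min(payload, max_payload)
        sizes.append(chunk)
        payload -= chunk
    return sizes

# Classic textbook example: 4000-byte packet over a 1500-byte MTU link
print(fragment_sizes(4000, 1500))  # [1480, 1480, 1020]
```

The 8-byte alignment exists because the IP header's fragment offset field counts in units of 8 bytes.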
True or False? Ping is using UDP because it doesn't care about reliable connection
False. Ping is actually using ICMP (Internet Control Message Protocol) which is a network protocol used to send diagnostic messages and control messages related to network communication.
What is SDN?
SDN stands for Software-Defined Networking. It is an approach to network management that emphasizes the centralization of network control, enabling administrators to manage network behavior through a software abstraction.

In a traditional network, network devices such as routers, switches, and firewalls are configured and managed individually, using specialized software or command-line interfaces. In contrast, SDN separates the network control plane from the data plane, allowing administrators to manage network behavior through a centralized software controller.
What is ICMP? What is it used for?
ICMP stands for Internet Control Message Protocol. It is a protocol used for diagnostic and control purposes in IP networks. It is a part of the Internet Protocol suite, operating at the network layer.

ICMP messages are used for a variety of purposes, including:

1. Error reporting: ICMP messages are used to report errors that occur in the network, such as a packet that could not be delivered to its destination.
2. Ping: ICMP is used to send ping messages, which are used to test whether a host or network is reachable and to measure the round-trip time for packets.
3. Path MTU discovery: ICMP is used to discover the Maximum Transmission Unit (MTU) of a path, which is the largest packet size that can be transmitted without fragmentation.
4. Traceroute: ICMP is used by the traceroute utility to trace the path that packets take through the network.
5. Router discovery: ICMP is used to discover the routers in a network.
What is NAT? How does it work?
NAT stands for Network Address Translation. It’s a way to map multiple local private addresses to a public one before transferring the information. Organizations that want multiple devices to employ a single IP address use NAT, as do most home routers. For example, your computer's private IP could be 192.168.1.100, but your router maps the traffic to its public IP (e.g. 1.1.1.1). Any device on the internet would see the traffic coming from your public IP (1.1.1.1) instead of your private IP (192.168.1.100).
Which port number is used in each of the following protocols?:

* SSH
* SMTP
* HTTP
* DNS
* HTTPS
* FTP
* SFTP

* SSH - 22
* SMTP - 25
* HTTP - 80
* DNS - 53
* HTTPS - 443
* FTP - 21
* SFTP - 22
Which factors affect network performance?
Several factors can affect network performance, including:

1. Bandwidth: The available bandwidth of a network connection can significantly impact its performance. Networks with limited bandwidth can experience slow data transfer rates, high latency, and poor responsiveness.
2. Latency: Latency refers to the delay that occurs when data is transmitted from one point in a network to another. High latency can result in slow network performance, especially for real-time applications like video conferencing and online gaming.
3. Network congestion: When too many devices are using a network at the same time, network congestion can occur, leading to slow data transfer rates and poor network performance.
4. Packet loss: Packet loss occurs when packets of data are dropped during transmission. This can result in slower network speeds and lower overall network performance.
5. Network topology: The physical layout of a network, including the placement of switches, routers, and other network devices, can impact network performance.
6. Network protocol: Different network protocols have different performance characteristics, which can impact network performance. For example, TCP is a reliable protocol that can guarantee the delivery of data, but it can also result in slower performance due to the overhead required for error checking and retransmission.
7. Network security: Security measures such as firewalls and encryption can impact network performance, especially if they require significant processing power or introduce additional latency.
8. Distance: The physical distance between devices on a network can impact network performance, especially for wireless networks where signal strength and interference can affect connectivity and data transfer rates.
What is APIPA?
APIPA (Automatic Private IP Addressing) is a mechanism by which devices assign themselves an IP address from a reserved range when the main DHCP server is not reachable.
What IP range does APIPA use?
APIPA uses the IP range: 169.254.0.1 - 169.254.255.254.
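This range is the IPv4 link-local block defined in RFC 3927, which Python's ipaddress module can confirm:

```python
import ipaddress

# The APIPA range 169.254.0.0/16 is the IPv4 link-local block (RFC 3927)
print(ipaddress.ip_address("169.254.10.20").is_link_local)  # True
print(ipaddress.ip_address("192.168.1.10").is_link_local)   # False
print(ipaddress.ip_address("169.254.10.20") in ipaddress.ip_network("169.254.0.0/16"))  # True
```

Seeing a 169.254.x.x address on a host is therefore a quick diagnostic sign that DHCP failed.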
#### Control Plane and Data Plane
What does "control plane" refer to?
The control plane is a part of the network that decides how to route and forward packets to a different location.
What does "data plane" refer to?
The data plane is a part of the network that actually forwards the data/packets.
What does "management plane" refer to?
It refers to monitoring and management functions.
To which plane (data, control, ...) does creating routing tables belong to?
Control Plane.
Explain Spanning Tree Protocol (STP).
What is link aggregation? Why is it used?
What is Asymmetric Routing? How to deal with it?
What overlay (tunnel) protocols are you familiar with?
What is GRE? How does it work?
What is VXLAN? How does it work?
What is SNAT?
Explain OSPF.
OSPF (Open Shortest Path First) is a link-state routing protocol supported on most modern routers, including those from vendors such as Cisco, Juniper, and Huawei. It is designed to work with IP-based networks, both IPv4 and IPv6.

OSPF uses a hierarchical network design in which routers are grouped into areas, with each area having its own topology map and routing table. This design reduces the amount of routing information that needs to be exchanged between routers and improves network scalability.

OSPF defines four types of routers:

* Internal Router
* Area Border Router
* Autonomous System Boundary Router
* Backbone Router

Learn more about OSPF router types: https://www.educba.com/ospf-router-types/
What is latency?
Latency is the time taken for information to reach its destination from the source.
What is bandwidth?
Bandwidth is the capacity of a communication channel: how much data it can handle over a specific time period. More bandwidth implies more traffic handling and thus more data transfer.
What is throughput?
Throughput refers to the measurement of the real amount of data transferred over a certain period of time across any transmission channel.
When performing a search query, what is more important, latency or throughput? And how to ensure that we manage global infrastructure?
Latency. To have good latency, a search query should be forwarded to the closest data center.
When uploading a video, what is more important, latency or throughput? And how to assure that?
Throughput. To have good throughput, the upload stream should be routed to an underutilized link.
What other considerations (except latency and throughput) are there when forwarding requests?
* Keep caches updated (which means the request could be forwarded not to the closest data center)
Explain Spine & Leaf
"Spine & Leaf" is a networking topology commonly used in data center environments to connect multiple switches and manage network traffic efficiently. It is also known as "spine-leaf" architecture or "leaf-spine" topology. This design provides high bandwidth, low latency, and scalability, making it ideal for modern data centers handling large volumes of data and traffic. Within a Spine & Leaf network there are two main tipology of switches: * Spine Switches: Spine switches are high-performance switches arranged in a spine layer. These switches act as the core of the network and are typically interconnected with each leaf switch. Each spine switch is connected to all the leaf switches in the data center. * Leaf Switches: Leaf switches are connected to end devices like servers, storage arrays, and other networking equipment. Each leaf switch is connected to every spine switch in the data center. This creates a non-blocking, full-mesh connectivity between leaf and spine switches, ensuring any leaf switch can communicate with any other leaf switch with maximum throughput. The Spine & Leaf architecture has become increasingly popular in data centers due to its ability to handle the demands of modern cloud computing, virtualization, and big data applications, providing a scalable, high-performance, and reliable network infrastructure
What is Network Congestion? What can cause it?
Network congestion occurs when there is too much data to transmit on a network and it doesn't have enough capacity to handle the demand.
This can lead to increased latency and packet loss. The causes can be multiple, such as high network usage, large file transfers, malware, hardware issues, or network design problems.
To prevent network congestion, it's important to monitor your network usage and implement strategies to limit or manage the demand.
What can you tell me about the UDP packet format? What about the TCP packet format? How is it different?
The UDP header is fixed at 8 bytes: source port, destination port, length, and checksum. The TCP header is 20-60 bytes: source and destination ports, sequence number, acknowledgment number, header length, flags (SYN, ACK, FIN, ...), window size, checksum, urgent pointer, and optional fields. TCP's larger header carries the state needed for reliable, ordered, connection-oriented delivery, while UDP's minimal header reflects its connectionless, best-effort design.
What is the exponential backoff algorithm? Where is it used?
Using Hamming code, what would be the code word for the following data word 100111010001101?
00110011110100011101
Give examples of protocols found in the application layer
* Hypertext Transfer Protocol (HTTP) - used for the webpages on the internet
* Simple Mail Transfer Protocol (SMTP) - email transmission
* Telnet (Telecommunications Network) - terminal emulation that allows a client to access a telnet server
* File Transfer Protocol (FTP) - facilitates the transfer of files between any two machines
* Domain Name System (DNS) - domain name translation
* Dynamic Host Configuration Protocol (DHCP) - allocates IP addresses, subnet masks, and gateways to hosts
* Simple Network Management Protocol (SNMP) - gathers data on devices on the network
Give examples of protocols found in the Network Layer
* Internet Protocol (IP) - assists in routing packets from one machine to another
* Internet Control Message Protocol (ICMP) - lets one know what is going on, such as error messages and debugging information
What is HSTS?
HTTP Strict Transport Security (HSTS) is a web server directive, delivered through a response header, that tells user agents and web browsers to only connect to the site over HTTPS. It forces connections over HTTPS encryption, disregarding any script's call to load any resource in that domain over HTTP. Read more [here](https://www.globalsign.com/en/blog/what-is-hsts-and-how-do-i-use-it#:~:text=HTTP%20Strict%20Transport%20Security%20(HSTS,and%20back%20to%20the%20browser.)
#### Network - Misc
What is the Internet? Is it the same as the World Wide Web?
The internet refers to a network of networks, transferring huge amounts of data around the globe.
The World Wide Web is an application running on millions of servers, on top of the internet, accessed through what is known as the web browser
What is the ISP?
An ISP (Internet Service Provider) is the company that provides you with access to the internet.
## Operating System

### Operating System Exercises

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Fork 101 | Fork | [Link](topics/os/fork_101.md) | [Link](topics/os/solutions/fork_101_solution.md) | |
| Fork 102 | Fork | [Link](topics/os/fork_102.md) | [Link](topics/os/solutions/fork_102_solution.md) | |

### Operating System - Self Assessment
What is an operating system?
From the book "Operating Systems: Three Easy Pieces": "responsible for making it easy to run programs (even allowing you to seemingly run many at the same time), allowing programs to share memory, enabling programs to interact with devices, and other fun stuff like that".
#### Operating System - Process
Can you explain what is a process?
A process is a running program. A program is one or more instructions and the program (or process) is executed by the operating system.
If you had to design an API for processes in an operating system, what would this API look like?
It would support the following operations:

* Create - create new processes
* Delete - remove/destroy processes
* State - check the state of a process: running, stopped, waiting, etc.
* Stop - stop a running process
How is a process created?
* The OS reads the program's code and any additional relevant data
* The program's code is loaded into memory, or more specifically, into the address space of the process
* Memory is allocated for the program's stack (aka run-time stack). The stack is also initialized by the OS with data like argv, argc and the parameters of main()
* Memory is allocated for the program's heap, which is required for dynamically allocated data structures like linked lists and hash tables
* I/O initialization tasks are performed; in Unix/Linux based systems, for example, each process starts with 3 file descriptors (standard input, output and error)
* The OS runs the program, starting from main()
True or False? The loading of the program into the memory is done eagerly (all at once)
False. It was true in the past but today's operating systems perform lazy loading, which means only the relevant pieces required for the process to run are loaded first.
What are different states of a process?
* Running - it's executing instructions
* Ready - it's ready to run, but for different reasons it's on hold
* Blocked - it's waiting for some operation to complete, for example an I/O disk request
What are some reasons for a process to become blocked?
- I/O operations (e.g. reading from a disk)
- Waiting for a packet from a network
What is Inter Process Communication (IPC)?
Inter-process communication (IPC) refers to the mechanisms provided by an operating system that allow processes to manage shared data.
What is "time sharing"?
Even on a system with one physical CPU, it's possible to allow multiple users to work on it and run programs. This is possible with time sharing, where computing resources are shared so that it seems to each user as if the system has multiple CPUs, when in fact it's a single CPU shared through multiprogramming and multitasking.
What is "space sharing"?
Somewhat the opposite of time sharing. While in time sharing a resource is used for a while by one entity and then the same resource can be used by another entity, in space sharing the space is divided among multiple entities without being transferred between them.
It's used by one entity, until this entity decides to get rid of it. Take for example storage. In storage, a file is yours, until you decide to delete it.
What component determines which process runs at a given moment in time?
CPU scheduler
#### Operating System - Memory
What is "virtual memory" and what purpose does serve?
Virtual memory combines your computer's RAM with temporary space on your hard disk. When RAM runs low, virtual memory moves data from RAM to a space called a paging file. Moving data to the paging file frees up RAM so your computer can complete its work. In general, the more RAM your computer has, the faster programs run. Read more [here](https://www.minitool.com/lib/virtual-memory.html)
What is demand paging?
Demand paging is a memory management technique where pages are loaded into physical memory only when accessed by a process. It optimizes memory usage by loading pages on demand, reducing startup latency and space overhead. However, it introduces some latency when accessing pages for the first time. Overall, it’s a cost-effective approach for managing memory resources in operating systems.
What is copy-on-write?
Copy-on-write (COW) is a resource management concept whose goal is to reduce unnecessary copying of information. It is implemented, for instance, within the POSIX fork syscall, which creates a duplicate of the calling process. The idea:

1. If resources are shared between 2 or more entities (for example shared memory segments between 2 processes), the resources don't need to be copied for every entity; instead, every entity gets READ access to the shared resource (the shared segments are marked as read-only). Think of every entity holding a pointer to the location of the shared resource, which can be dereferenced to read its value.
2. If one entity performed a WRITE operation on a shared resource, a problem would arise, since the resource would be permanently changed for ALL other entities sharing it. Think of a process modifying some variables on the stack, or allocating some data dynamically on the heap; these changes to the shared resource would also apply to ALL other processes, which is definitely undesirable behaviour.
3. As a solution, only when a WRITE operation is about to be performed on a shared resource does the resource get COPIED first, and then the changes are applied.
What is a kernel, and what does it do?
The kernel is part of the operating system and is responsible for tasks like:

* Allocating memory
* Scheduling processes
* Controlling the CPU
True or False? Some pieces of the code in the kernel are loaded into protected areas of the memory so applications can't overwrite them.
True
What is POSIX?
POSIX (Portable Operating System Interface) is a set of standards that define the interface between a Unix-like operating system and application programs.
Explain what a semaphore is and what its role is in operating systems.
A semaphore is a synchronization primitive used in operating systems and concurrent programming to control access to shared resources. It's a variable or abstract data type that acts as a counter or a signaling mechanism for managing access to resources by multiple processes or threads.
What is cache? What is buffer?
* Cache: usually used when processes are reading from and writing to the disk, to make the process faster by making similar data used by different programs easily accessible.
* Buffer: a reserved place in RAM used to hold data temporarily.
## Virtualization
What is Virtualization?
Virtualization uses software to create an abstraction layer over computer hardware, that allows the hardware elements of a single computer - processors, memory, storage and more - to be divided into multiple virtual computers, commonly called virtual machines (VMs).
What is a hypervisor?
Red Hat: "A hypervisor is software that creates and runs virtual machines (VMs). A hypervisor, sometimes called a virtual machine monitor (VMM), isolates the hypervisor operating system and resources from the virtual machines and enables the creation and management of those VMs." Read more [here](https://www.redhat.com/en/topics/virtualization/what-is-a-hypervisor)
What types of hypervisors are there?
Hosted hypervisors and bare-metal hypervisors.
What are the advantages and disadvantages of bare-metal hypervisor over a hosted hypervisor?
Due to having its own drivers and direct access to hardware components, a bare-metal hypervisor will often have better performance, stability and scalability. On the other hand, there will probably be some limitations regarding loading (any) drivers, so a hosted hypervisor will usually benefit from better hardware compatibility.
What types of virtualization are there?
* Operating system virtualization
* Network functions virtualization
* Desktop virtualization
Is containerization a type of Virtualization?
Yes, it's an operating-system-level virtualization, where the kernel is shared and allows the use of multiple isolated user-space instances.
How did the introduction of virtual machines change the industry and the way applications were deployed?
The introduction of virtual machines allowed companies to deploy multiple business applications on the same hardware, while keeping each application separated from the others in a secure way, with each running on its own separate operating system.
#### Virtual Machines
Do we need virtual machines in the age of containers? Are they still relevant?
Yes, virtual machines are still relevant even in the age of containers. While containers provide a lightweight and portable alternative to virtual machines, they do have certain limitations; for example, they share the host kernel. Virtual machines still matter because they offer isolation and security, can run different operating systems, and are good for legacy apps.
## Prometheus
What is Prometheus? What are some of Prometheus's main features?
Prometheus is a popular open-source systems monitoring and alerting toolkit, originally developed at SoundCloud. It is designed to collect and store time-series data, and to allow for querying and analysis of that data using a powerful query language called PromQL. Prometheus is frequently used to monitor cloud-native applications, microservices, and other modern infrastructure.

Some of the main features of Prometheus include:

1. Data model: Prometheus uses a flexible data model that allows users to organize and label their time-series data in a way that makes sense for their particular use case. Labels are used to identify different dimensions of the data, such as the source of the data or the environment in which it was collected.
2. Pull-based architecture: Prometheus uses a pull-based model to collect data from targets, meaning that the Prometheus server actively queries its targets for metrics data at regular intervals. This architecture is more scalable and reliable than a push-based model, which would require every target to push data to the server.
3. Time-series database: Prometheus stores all of its data in a time-series database, which allows users to perform queries over time ranges and to aggregate and analyze their data in various ways. The database is optimized for write-heavy workloads, and can handle a high volume of data with low latency.
4. Alerting: Prometheus includes a powerful alerting system that allows users to define rules based on their metrics data and to send alerts when certain conditions are met. Alerts can be sent via email, chat, or other channels, and can be customized to include specific details about the problem.
5. Visualization: Prometheus has a built-in expression browser for ad-hoc queries and basic graphing, and is most commonly paired with Grafana for full dashboards (Prometheus's own dashboard tool, PromDash, was deprecated in Grafana's favor).
Overall, Prometheus is a powerful and flexible tool for monitoring and analyzing systems and applications, and is widely used in the industry for cloud-native monitoring and observability.
In what scenarios might it be better to NOT use Prometheus?
From Prometheus documentation: "if you need 100% accuracy, such as for per-request billing".
Describe Prometheus architecture and components
The Prometheus architecture consists of four major components:

1. Prometheus Server: The Prometheus server is responsible for collecting and storing metrics data. It has a simple built-in storage layer that allows it to store time-series data in a time-ordered database.
2. Client Libraries: Prometheus provides a range of client libraries that enable applications to expose their metrics data in a format that can be ingested by the Prometheus server. These libraries are available for a range of programming languages, including Java, Python, and Go.
3. Exporters: Exporters are software components that expose existing metrics from third-party systems and make them available for ingestion by the Prometheus server. Prometheus provides exporters for a range of popular technologies, including MySQL, PostgreSQL, and Apache.
4. Alertmanager: The Alertmanager component is responsible for processing alerts generated by the Prometheus server. It can handle alerts from multiple sources and provides a range of features for deduplicating, grouping, and routing alerts to appropriate channels.

Overall, the Prometheus architecture is designed to be highly scalable and resilient. The server and client libraries can be deployed in a distributed fashion to support monitoring across large-scale, highly dynamic environments.
Can you compare Prometheus to other solutions like InfluxDB for example?
Compared to other monitoring solutions, such as InfluxDB, Prometheus is known for its high performance and scalability. It can handle large volumes of data and can easily be integrated with other tools in the monitoring ecosystem. InfluxDB, on the other hand, is known for its ease of use and simplicity. It has a user-friendly interface and provides easy-to-use APIs for collecting and querying data.

Another popular solution, Nagios, is a more traditional monitoring system that relies on a push-based model for collecting data. Nagios has been around for a long time and is known for its stability and reliability. However, compared to Prometheus, Nagios lacks some of the more advanced features, such as a multi-dimensional data model and a powerful query language.

Overall, the choice of a monitoring solution depends on the specific needs and requirements of the organization. While Prometheus is a great choice for large-scale monitoring and alerting, InfluxDB may be a better fit for smaller environments that require ease of use and simplicity. Nagios remains a solid choice for organizations that prioritize stability and reliability over advanced features.
What is an Alert?
In Prometheus, an alert is a notification triggered when a specific condition or threshold is met. Alerts can be configured to trigger when certain metrics cross a certain threshold or when specific events occur. Once an alert is triggered, it can be routed to various channels, such as email, pager, or chat, to notify relevant teams or individuals to take appropriate action. Alerts are a critical component of any monitoring system, as they allow teams to proactively detect and respond to issues before they impact users or cause system downtime.
What is an Instance? What is a Job?
In Prometheus, an instance refers to a single target that is being monitored. For example, a single server or service. A job is a set of instances that perform the same function, such as a set of web servers serving the same application. Jobs allow you to define and manage a group of targets together. In essence, an instance is an individual target that Prometheus collects metrics from, while a job is a collection of similar instances that can be managed as a group.
What core metric types does Prometheus support?
Prometheus supports several types of metrics, including:

1. Counter: A monotonically increasing value used for tracking counts of events or samples. Examples include the number of requests processed or the total number of errors encountered.
2. Gauge: A value that can go up or down, such as CPU usage or memory usage. Unlike counters, gauge values can be arbitrary, meaning they can go up and down based on changes in the system being monitored.
3. Histogram: A set of observations or events that are divided into buckets based on their value. Histograms help in analyzing the distribution of a metric, such as request latencies or response sizes.
4. Summary: A summary is similar to a histogram, but instead of buckets, it provides a set of quantiles for the observed values. Summaries are useful for monitoring the distribution of request latencies or response sizes over time.

Prometheus also supports various functions and operators for aggregating and manipulating metrics, such as sum, max, min, and rate. These features make it a powerful tool for monitoring and alerting on system metrics.
What is an exporter? What is it used for?
The exporter serves as a bridge between the third-party system or application and Prometheus, making it possible for Prometheus to monitor and collect data from that system or application.

The exporter acts as a server, listening on a specific network port for requests from Prometheus to scrape metrics. It collects metrics from the third-party system or application and transforms them into a format that can be understood by Prometheus. The exporter then exposes these metrics to Prometheus via an HTTP endpoint, making them available for collection and analysis.

Exporters are commonly used to monitor various types of infrastructure components such as databases, web servers, and storage systems. For example, there are exporters available for monitoring popular databases such as MySQL and PostgreSQL, as well as web servers like Apache and Nginx.

Overall, exporters are a critical component of the Prometheus ecosystem, allowing for the monitoring of a wide range of systems and applications, and providing a high degree of flexibility and extensibility to the platform.
What are some Prometheus best practices?
Here are three of them:

1. Label carefully: Careful and consistent labeling of metrics is crucial for effective querying and alerting. Labels should be clear, concise, and include all relevant information about the metric.
2. Keep metrics simple: The metrics exposed by exporters should be simple and focus on a single aspect of the system being monitored. This helps avoid confusion and ensures that the metrics are easily understandable by all members of the team.
3. Use alerting sparingly: While alerting is a powerful feature of Prometheus, it should be used sparingly and only for the most critical issues. Setting up too many alerts can lead to alert fatigue and result in important alerts being ignored. It is recommended to set up only the most important alerts and adjust the thresholds over time based on the actual frequency of alerts.
How to get total requests in a given period of time?
To get the total requests in a given period of time using Prometheus, you can use the *sum* function along with the *increase* function. Here is an example query that will give you the total number of requests in the last hour:

```
sum(increase(http_requests_total[1h]))
```

In this query, *http_requests_total* is the name of the metric that tracks the total number of HTTP requests, and the *increase* function calculates how much the counter grew over the last hour. The *sum* function then adds up the increase across all series to give you the total number of requests in the last hour. You can adjust the time range by changing the duration in the *increase* function. For example, to get the total number of requests in the last day, change it to *increase(http_requests_total[1d])*. (Note that *rate* would give the average per-second rate over the window, not the total.)
What does HA mean in Prometheus?
HA stands for High Availability. This means that the system is designed to be highly reliable and always available, even in the face of failures or other issues. In practice, this typically involves setting up multiple instances of Prometheus and ensuring that they are all synchronized and able to work together seamlessly. This can be achieved through a variety of techniques, such as load balancing, replication, and failover mechanisms. By implementing HA in Prometheus, users can ensure that their monitoring data is always available and up-to-date, even in the face of hardware or software failures, network issues, or other problems that might otherwise cause downtime or data loss.
How do you join two metrics?
In PromQL, "joining" two metrics is done with binary operators and vector matching, using the *on()* / *ignoring()* keywords (and *group_left* / *group_right* for many-to-one matches), rather than a dedicated join function. For example, to compute an error ratio by matching two metrics on their *service* and *instance* labels:

```
rate(error_count_total[5m])
  / on(service, instance)
rate(request_count_total[5m])
```

Here the division operator pairs up samples from the two metrics that have identical *service* and *instance* label values, producing one result series per match.
How to write a query that returns the value of a label?
PromQL itself has no function that returns label values; *label_values* is a Grafana templating function. Within Grafana, to list all values of the *method* label on the *http_requests_total* metric you can use:

```
label_values(http_requests_total, method)
```

In plain Prometheus, you can get the distinct values of a label either through the HTTP API endpoint */api/v1/label/method/values* or with a query that groups by the label, such as *count by (method) (http_requests_total)*. You can then use the resulting list in further queries or to filter your data.
How do you convert cpu_user_seconds to cpu usage in percentage?
To convert *cpu_user_seconds* to CPU usage in percentage, take the per-second rate, divide it by the number of CPU cores, and multiply by 100. The formula is as follows:

```
100 * sum(rate(process_cpu_user_seconds_total{job="<job_name>"}[<duration>])) by (instance) / <number_of_cores>
```

Here, *<job_name>* is the name of the job you want to query, *<duration>* is the time range you want to query (e.g. *5m*, *1h*), and *<number_of_cores>* is the number of CPU cores on the machine you are querying. Since *rate* already yields CPU seconds consumed per second of wall-clock time, dividing by the core count normalizes the result to a 0-100% scale. For example, to get the CPU usage in percentage over the last 5 minutes for a job named *my-job* running on a machine with 4 CPU cores, you can use the following query:

```
100 * sum(rate(process_cpu_user_seconds_total{job="my-job"}[5m])) by (instance) / 4
```
## Go
What are some characteristics of the Go programming language?
* Strong and static typing - the type of the variables can't be changed over time and they have to be defined at compile time
* Simplicity
* Fast compile times
* Built-in concurrency
* Garbage collected
* Platform independent
* Compiles to a standalone binary - anything you need to run your app is compiled into one binary. Very useful for version management at run-time.

Go also has a good community.
What is the difference between var x int = 2 and x := 2?
The result is the same: a variable with the value 2. With `var x int = 2` we are setting the variable type explicitly to integer, while with `x := 2` we are letting Go infer the type by itself.
True or False? In Go we can redeclare variables, and once declared, we must use them.

False. We can't redeclare variables, but yes, we must use declared variables.
What libraries of Go have you used?
This should be answered based on your usage but some examples are: * fmt - formatted I/O
What is the problem with the following block of code? How to fix it?

```go
func main() {
    var x float32 = 13.5
    var y int
    y = x
}
```
The following block of code tries to convert the integer 101 to a string but instead we get "e". Why is that? How to fix it?

```go
package main

import "fmt"

func main() {
    var x int = 101
    var y string
    y = string(x)
    fmt.Println(y)
}
```
It looks up which Unicode code point is at 101 ('e') and uses it when converting the integer to a string. If you want to get "101", you should use the package `strconv` and replace `y = string(x)` with `y = strconv.Itoa(x)`.
What is wrong with the following code?:

```go
package main

func main() {
    var x = 2
    var y = 3
    const someConst = x + y
}
```
Constants in Go can only be declared using constant expressions. But `x`, `y` and their sum is variable.
const initializer x + y is not a constant
What will be the output of the following block of code?:

```go
package main

import "fmt"

const (
    x = iota
    y = iota
)

const z = iota

func main() {
    fmt.Printf("%v\n", x)
    fmt.Printf("%v\n", y)
    fmt.Printf("%v\n", z)
}
```
Go's iota identifier is used in const declarations to simplify definitions of incrementing numbers. Because it can be used in expressions, it provides a generality beyond that of simple enumerations.
The output is `0`, `1`, `0`: `x` and `y` are in the first iota group, `z` is in the second (iota resets to 0 at the start of each const block).
[Iota page in Go Wiki](https://github.com/golang/go/wiki/Iota)
What _ is used for in Go?
It avoids having to declare all the variables for the return values. It is called the [blank identifier](https://golang.org/doc/effective_go.html#blank).
[answer in SO](https://stackoverflow.com/questions/27764421/what-is-underscore-comma-in-a-go-declaration#answer-27764432)
What will be the output of the following block of code?:

```go
package main

import "fmt"

const (
    _ = iota + 3
    x
)

func main() {
    fmt.Printf("%v\n", x)
}
```
Since the first constant in the group is declared with the value `3` (`iota + 3`, where iota is 0), the next one (`x`) has the value `4`.
What will be the output of the following block of code?:

```go
package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    var wg sync.WaitGroup

    wg.Add(1)
    go func() {
        time.Sleep(time.Second * 2)
        fmt.Println("1")
        wg.Done()
    }()

    go func() {
        fmt.Println("2")
    }()

    wg.Wait()
    fmt.Println("3")
}
```
Output: 2 1 3

[Article about sync/waitgroup](https://tutorialedge.net/golang/go-waitgroup-tutorial/)

[Golang package sync](https://golang.org/pkg/sync/)
What will be the output of the following block of code?:

```go
package main

import (
    "fmt"
)

func mod1(a []int) {
    for i := range a {
        a[i] = 5
    }
    fmt.Println("1:", a)
}

func mod2(a []int) {
    a = append(a, 125) // !
    for i := range a {
        a[i] = 5
    }
    fmt.Println("2:", a)
}

func main() {
    s1 := []int{1, 2, 3, 4}
    mod1(s1)
    fmt.Println("1:", s1)

    s2 := []int{1, 2, 3, 4}
    mod2(s2)
    fmt.Println("2:", s2)
}
```
Output:

1: [5 5 5 5]
1: [5 5 5 5]
2: [5 5 5 5 5]
2: [1 2 3 4]
In `mod1`, `a` refers to the same underlying array as `s1`, so when we assign via `a[i]` we're changing `s1`'s values too. But in `mod2`, `append` creates a new slice (with a new backing array), so we're changing only `a`'s values, not `s2`'s. [Article about arrays](https://golangbot.com/arrays-and-slices/), [Blog post about `append`](https://blog.golang.org/slices)
What will be the output of the following block of code?:

```go
package main

import (
    "container/heap"
    "fmt"
)

// An IntHeap is a min-heap of ints.
type IntHeap []int

func (h IntHeap) Len() int           { return len(h) }
func (h IntHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h IntHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }

func (h *IntHeap) Push(x interface{}) {
    // Push and Pop use pointer receivers because they modify the slice's length,
    // not just its contents.
    *h = append(*h, x.(int))
}

func (h *IntHeap) Pop() interface{} {
    old := *h
    n := len(old)
    x := old[n-1]
    *h = old[0 : n-1]
    return x
}

func main() {
    h := &IntHeap{4, 8, 3, 6}
    heap.Init(h)
    heap.Push(h, 7)
    fmt.Println((*h)[0])
}
```
Output: 3

[Golang container/heap package](https://golang.org/pkg/container/heap/)
## Mongo
What are the advantages of MongoDB? Or in other words, why choosing MongoDB and not other implementation of NoSQL?
MongoDB advantages are as follows:

- Schemaless
- Easy to scale-out
- No complex joins
- Structure of a single object is clear
What is the difference between SQL and NoSQL?
The main difference is that SQL databases are structured (data is stored in the form of tables with rows and columns - like an excel spreadsheet table) while NoSQL is unstructured, and the data storage can vary depending on how the NoSQL DB is set up, such as key-value pair, document-oriented, etc.
In what scenarios would you prefer to use NoSQL/Mongo over SQL?
* Heterogeneous data which changes often
* Data consistency and integrity is not top priority
* Best if the database needs to scale rapidly
What is a document? What is a collection?
* A document is a record in MongoDB, which is stored in BSON (Binary JSON) format and is the basic unit of data in MongoDB. * A collection is a group of related documents stored in a single database in MongoDB.
What is an aggregator?
* An aggregator is a framework in MongoDB that performs operations on a set of data to return a single computed result.
What is better? Embedded documents or referenced?
* There is no definitive answer as to which is better; it depends on the specific use case and requirements. Embedded documents provide atomic updates, while referenced documents allow for better normalization.
Have you performed data retrieval optimizations in Mongo? If not, can you think about ways to optimize a slow data retrieval?
* Some ways to optimize data retrieval in MongoDB are: indexing, proper schema design, query optimization and database load balancing.
##### Queries
Explain this query: `db.books.find({"name": /abc/})`
It returns every document in the `books` collection whose `name` field matches the regular expression `/abc/`, i.e. names containing the substring "abc".
Explain this query: `db.books.find().sort({x:1})`
It returns all documents in the `books` collection, sorted by the `x` field in ascending order (`1` means ascending, `-1` would mean descending).
What is the difference between find() and find_one()?
* `find()` returns all documents that match the query conditions.
* `find_one()` returns only one document that matches the query conditions (or null if no match is found).
How can you export data from Mongo DB?
* mongoexport
* Programming language drivers (e.g. querying the data and writing it out yourself)
## SQL

### SQL Exercises

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Functions vs. Comparisons | Query Improvements | [Exercise](topics/sql/improve_query.md) | [Solution](topics/sql/solutions/improve_query.md) | |

### SQL Self Assessment
What is SQL?
SQL (Structured Query Language) is a standard language for relational databases (like MySQL, MariaDB, ...).
It's used for reading, updating, removing and creating data in a relational database.
How is SQL different from NoSQL?
The main difference is that SQL databases are structured (data is stored in the form of tables with rows and columns, like an Excel spreadsheet) while NoSQL is unstructured, and the data storage can vary depending on how the NoSQL DB is set up, such as key-value pair, document-oriented, etc.
When is it best to use SQL? NoSQL?
SQL - Best used when data integrity is crucial. SQL is typically implemented in many businesses and areas within the finance field due to its ACID compliance.

NoSQL - Great if you need to scale things quickly. NoSQL was designed with web applications in mind, so it works great if you need to quickly spread the same information around to multiple servers. Additionally, since NoSQL does not adhere to the strict table structure with columns and rows that relational databases require, you can store different data types together.
##### Practical SQL - Basics

For these questions, we will be using the Customers and Orders tables shown below:

**Customers**

Customer_ID | Customer_Name | Items_in_cart | Cash_spent_to_Date
------------ | ------------- | ------------- | -------------
100204 | John Smith | 0 | 20.00
100205 | Jane Smith | 3 | 40.00
100206 | Bobby Frank | 1 | 100.20

**ORDERS**

Customer_ID | Order_ID | Item | Price | Date_sold
------------ | ------------- | ------------- | ------------- | -------------
100206 | A123 | Rubber Ducky | 2.20 | 2019-09-18
100206 | A123 | Bubble Bath | 8.00 | 2019-09-18
100206 | Q987 | 80-Pack TP | 90.00 | 2019-09-20
100205 | Z001 | Cat Food - Tuna Fish | 10.00 | 2019-08-05
100205 | Z001 | Cat Food - Chicken | 10.00 | 2019-08-05
100205 | Z001 | Cat Food - Beef | 10.00 | 2019-08-05
100205 | Z001 | Cat Food - Kitty quesadilla | 10.00 | 2019-08-05
100204 | X202 | Coffee | 20.00 | 2019-04-29
How would I select all fields from this table?
Select *
From Customers;
How many items are in John's cart?
Select Items_in_cart
From Customers
Where Customer_Name = 'John Smith';
What is the sum of all the cash spent across all customers?
Select SUM(Cash_spent_to_Date) as SUM_CASH
From Customers;
How many people have items in their cart?
Select count(1) as Number_of_People_w_items
From Customers
where Items_in_cart > 0;
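The queries above can be tried end to end with Python's built-in `sqlite3` module. A small self-contained sketch, with the sample Customers data copied in:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Customers (Customer_ID INTEGER, Customer_Name TEXT, "
            "Items_in_cart INTEGER, Cash_spent_to_Date REAL)")
cur.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?)", [
    (100204, "John Smith", 0, 20.00),
    (100205, "Jane Smith", 3, 40.00),
    (100206, "Bobby Frank", 1, 100.20),
])

# Items in John's cart (standard SQL string literals use single quotes)
print(cur.execute("SELECT Items_in_cart FROM Customers "
                  "WHERE Customer_Name = 'John Smith'").fetchone()[0])  # 0

# Total cash spent across all customers
print(cur.execute("SELECT SUM(Cash_spent_to_Date) FROM Customers").fetchone()[0])  # ~160.2

# How many customers have items in their cart
print(cur.execute("SELECT COUNT(1) FROM Customers "
                  "WHERE Items_in_cart > 0").fetchone()[0])  # 2
```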
How would you join the customer table to the order table?
You would join them on the unique key. In this case, the unique key is Customer_ID in both the Customers table and Orders table
How would you show which customer ordered which items?
Select c.Customer_Name, o.Item
From Customers c
Left Join Orders o
On c.Customer_ID = o.Customer_ID;
Using a with statement, how would you show who ordered cat food, and the total amount of money spent?
with cat_food as (
Select Customer_ID, SUM(Price) as TOTAL_PRICE
From Orders
Where Item like '%Cat Food%'
Group by Customer_ID
)
Select Customer_Name, TOTAL_PRICE
From Customers c
Inner JOIN cat_food f
ON c.Customer_ID = f.Customer_ID
where c.Customer_ID in (Select Customer_ID from cat_food);

Although this was a simple statement, the "with" clause really shines when a complex query needs to be run on a table before joining to another. With statements are nice because you create a pseudo temp table when running your query, instead of creating a whole new table. The sum of all the cat food purchases was not readily available, so we used a with statement to create a pseudo table holding the sum of the prices spent by each customer, then joined that table normally.
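The same `WITH` pattern can be run with Python's built-in `sqlite3` module. A sketch using a trimmed copy of the sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Customers (Customer_ID INTEGER, Customer_Name TEXT)")
cur.execute("CREATE TABLE Orders (Customer_ID INTEGER, Item TEXT, Price REAL)")
cur.executemany("INSERT INTO Customers VALUES (?, ?)",
                [(100205, "Jane Smith"), (100206, "Bobby Frank")])
cur.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    (100205, "Cat Food - Tuna Fish", 10.00),
    (100205, "Cat Food - Chicken", 10.00),
    (100206, "Rubber Ducky", 2.20),
])

# The CTE sums cat food purchases per customer before the join
rows = cur.execute("""
    WITH cat_food AS (
        SELECT Customer_ID, SUM(Price) AS TOTAL_PRICE
        FROM Orders
        WHERE Item LIKE '%Cat Food%'
        GROUP BY Customer_ID
    )
    SELECT c.Customer_Name, f.TOTAL_PRICE
    FROM Customers c
    INNER JOIN cat_food f ON c.Customer_ID = f.Customer_ID
""").fetchall()
print(rows)  # [('Jane Smith', 20.0)]
```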
Which of the following queries would you use?

```
SELECT count(*)                      SELECT count(*)
FROM shawarma_purchases              FROM shawarma_purchases
WHERE                           vs.  WHERE
YEAR(purchased_at) == '2017'         purchased_at >= '2017-01-01'
                                     AND purchased_at <= '2017-12-31'
```
```
SELECT count(*)
FROM shawarma_purchases
WHERE purchased_at >= '2017-01-01'
AND purchased_at <= '2017-12-31'
```

When you use a function (`YEAR(purchased_at)`), the database has to evaluate it for every row, scanning the whole table, as opposed to using an index on the column as it is, in its natural state.
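The index-usage difference can be observed with SQLite's `EXPLAIN QUERY PLAN` (a sketch: SQLite has no `YEAR()` function, so `strftime` stands in for it):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE shawarma_purchases (purchased_at TEXT)")
cur.execute("CREATE INDEX idx_purchased_at ON shawarma_purchases (purchased_at)")
cur.executemany("INSERT INTO shawarma_purchases VALUES (?)",
                [("2017-03-01",), ("2018-06-02",)])

# Range predicate on the raw column: the planner can seek the index
range_plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM shawarma_purchases "
    "WHERE purchased_at >= '2017-01-01' AND purchased_at <= '2017-12-31'"
).fetchall()

# Function wrapped around the column: the index can't be used for filtering
func_plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM shawarma_purchases "
    "WHERE strftime('%Y', purchased_at) = '2017'"
).fetchall()

print(range_plan)  # SEARCH using the index
print(func_plan)   # full SCAN of every row
```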
## OpenStack
What components/projects of OpenStack are you familiar with?
I'm most familiar with several core OpenStack components:
- Nova for compute resource provisioning, including VM lifecycle management.
- Neutron for networking, focusing on creating and managing networks, subnets, and routers.
- Cinder for block storage, used to attach and manage storage volumes.
- Keystone for identity services, handling authentication and authorization.

I've implemented these in past projects, configuring them for scalability and security to support multi-tenant environments.
Can you tell me what each of the following services/projects is responsible for?: - Nova - Neutron - Cinder - Glance - Keystone
* Nova - Manage virtual instances
* Neutron - Manage networking by providing Network as a service (NaaS)
* Cinder - Block Storage
* Glance - Manage images for virtual machines and containers (search, get and register)
* Keystone - Authentication service across the cloud
Identify the service/project used for each of the following: * Copy or snapshot instances * GUI for viewing and modifying resources * Block Storage * Manage virtual instances
* Glance - Image Service. Also used for copying or snapshotting instances
* Horizon - GUI for viewing and modifying resources
* Cinder - Block Storage
* Nova - Manage virtual instances
What is a tenant/project?
In OpenStack, a project (formerly known as a tenant) is a fundamental unit of ownership and isolation for resources like virtual machines, storage volumes, and networks. Each project is owned by a specific user or group of users and provides a way to manage and segregate resources within a shared cloud environment. This ensures that one project's resources are not accessible to another unless explicitly shared.
Determine true or false: * OpenStack is free to use * The service responsible for networking is Glance * The purpose of tenant/project is to share resources between different projects and users of OpenStack
* OpenStack is free to use - **True**. OpenStack is open-source software released under the Apache 2.0 license.
* The service responsible for networking is Glance - **False**. Neutron is the service responsible for networking. Glance is the image service.
* The purpose of tenant/project is to share resources between different projects and users of OpenStack - **False**. The primary purpose is to isolate resources.
Describe in detail how you bring up an instance with a floating IP
To launch an instance with a floating IP, you would follow these steps:
1. **Create a Network and Subnet:** First, ensure you have a private network and subnet for your instances.
2. **Create a Router:** Create a router and connect it to the public (external) network and your private subnet.
3. **Launch an Instance:** Launch a new instance, attaching it to your private network. It will receive a private IP address from the subnet.
4. **Allocate a Floating IP:** Allocate a new floating IP address from the public network pool to your project.
5. **Associate the Floating IP:** Associate the allocated floating IP with the private IP address of your instance. This allows the instance to be accessible from the internet.
You get a call from a customer saying: "I can ping my instance but can't connect (ssh) it". What might be the problem?
If you can ping an instance but cannot SSH into it, the issue is likely related to one of the following:
* **Security Group Rules:** The security group attached to the instance may not have a rule allowing inbound traffic on TCP port 22 (the default SSH port).
* **Firewall on the Instance:** A firewall running on the instance itself (like `iptables` or `firewalld`) might be blocking the SSH port.
* **SSH Service:** The SSH daemon (`sshd`) on the instance might not be running or could be misconfigured.
* **Incorrect SSH Key:** You might be using the wrong private key to connect to the instance.
What types of networks OpenStack supports?
OpenStack Neutron supports several network types:
* **Local:** A local network is isolated to a single compute node and cannot be shared between multiple nodes.
* **Flat:** A flat network is a simple, non-VLAN-tagged network that is shared across all compute nodes.
* **VLAN:** A VLAN network uses 802.1q tagging to create isolated layer-2 broadcast domains.
* **VXLAN:** VXLAN (Virtual Extensible LAN) is an overlay network technology that encapsulates layer-2 frames in UDP packets, allowing for a large number of isolated networks.
* **GRE:** GRE (Generic Routing Encapsulation) is another overlay network technology that can be used to create private networks over a public network.
How do you debug OpenStack storage issues? (tools, logs, ...)
To debug storage issues in OpenStack (Cinder), you can use the following:
* **Logs:** Check the Cinder service logs (e.g., `/var/log/cinder/cinder-volume.log`, `/var/log/cinder/cinder-api.log`) for error messages.
* **Cinder CLI:** Use the `cinder` command-line tool to check the status of volumes, snapshots, and storage backends.
* **Database:** Inspect the Cinder database to check for inconsistencies in volume states or metadata.
* **Backend Storage:** Check the logs and status of the underlying storage system (e.g., LVM, Ceph, NFS) to identify issues with the storage itself.
How do you debug OpenStack compute issues? (tools, logs, ...)
To debug compute issues in OpenStack (Nova), you can use the following:
* **Logs:** Check the Nova service logs (e.g., `/var/log/nova/nova-compute.log`, `/var/log/nova/nova-api.log`, `/var/log/nova/nova-scheduler.log`) for error messages.
* **Nova CLI:** Use the `nova` command-line tool to check the status of instances, hosts, and services.
* **Instance Console Log:** View the console log of a specific instance to see boot-up messages and other output.
* **Hypervisor:** Check the logs and status of the underlying hypervisor (e.g., KVM, QEMU) to identify issues with virtualization.
#### OpenStack Deployment & TripleO
Have you deployed OpenStack in the past? If yes, can you describe how you did it?
There are several ways to deploy OpenStack, depending on the scale and complexity of the environment. Some common methods include:
* **DevStack:** A script-based installer designed for development and testing purposes. It deploys OpenStack from the latest source code.
* **Packstack:** A utility that uses Puppet modules to deploy OpenStack on CentOS or RHEL. It is suitable for proof-of-concept and small-scale production environments.
* **Kolla-Ansible:** A set of Ansible playbooks that deploy OpenStack services as Docker containers. This method is highly scalable and recommended for production deployments.
* **OpenStack-Ansible:** A collection of Ansible playbooks that deploy OpenStack services directly on bare metal or virtual machines.
Are you familiar with TripleO? How is it different from Devstack or Packstack?
You can read about TripleO right [here](https://docs.openstack.org/tripleo-docs/latest)
#### OpenStack Compute
Can you describe Nova in detail?
* Used to provision and manage virtual instances
* It supports multi-tenancy at different levels - logging, end-user control, auditing, etc.
* Highly scalable
* Authentication can be done using an internal system or LDAP
* Supports multiple types of block storage
* Tries to be hardware and hypervisor agnostic
What do you know about Nova architecture and components?
* nova-api - the server which serves metadata and compute APIs
* The different Nova components communicate by using a queue (usually RabbitMQ) and a database
* A request for creating an instance is inspected by nova-scheduler, which determines where the instance will be created and run
* nova-compute is the component responsible for communicating with the hypervisor to create the instance and manage its lifecycle
#### OpenStack Networking (Neutron)
Explain Neutron in detail
* One of the core components of OpenStack and a standalone project
* Neutron focuses on delivering networking as a service
* With Neutron, users can set up networks in the cloud and configure and manage a variety of network services
* Neutron interacts with:
  * Keystone - authorize API calls
  * Nova - Nova communicates with Neutron to plug NICs into a network
  * Horizon - supports networking entities in the dashboard and also provides a topology view which includes networking details
Explain each of the following components: - neutron-dhcp-agent - neutron-l3-agent - neutron-metering-agent - neutron-*-agent - neutron-server
* neutron-l3-agent - L3/NAT forwarding (for example, provides external network access for VMs)
* neutron-dhcp-agent - DHCP services
* neutron-metering-agent - L3 traffic metering
* neutron-*-agent - manages the local vSwitch configuration on each compute node (based on the chosen plugin)
* neutron-server - exposes the networking API and passes requests to other plugins if required
Explain these network types: - Management Network - Guest Network - API Network - External Network
* Management Network - used for internal communication between OpenStack components. Any IP address in this network is accessible only within the datacenter
* Guest Network - used for communication between instances/VMs
* API Network - used for services API communication. Any IP address in this network is publicly accessible
* External Network - used for public communication. Any IP address in this network is accessible by anyone on the internet
In which order should you remove the following entities: * Network * Port * Router * Subnet
1. Port
2. Subnet
3. Router
4. Network

There are many reasons for that order. One example: you can't remove a router if there are active ports assigned to it.
What is a provider network?
A provider network is a network that is created by an OpenStack administrator and maps directly to an existing physical network in the data center. It allows for direct layer-2 connectivity to instances and is typically used for providing external network access or for connecting to specific physical networks.
What components and services exist for L2 and L3?
* **L2 (Layer 2):** The primary L2 component is the `neutron-openvswitch-agent` (or a similar agent for other plugins), which runs on each compute node and manages the local virtual switch (e.g., Open vSwitch). It is responsible for connecting instances to virtual networks and enforcing security group rules.
* **L3 (Layer 3):** The `neutron-l3-agent` is responsible for providing L3 services like routing and floating IPs. It manages virtual routers that connect private networks to external networks.
What is the ML2 plug-in? Explain its architecture
ML2 (Modular Layer 2) is a framework that allows OpenStack to simultaneously utilize a variety of layer-2 networking technologies. It replaces the monolithic plugins for individual network types and provides a more flexible and extensible architecture. ML2 uses a combination of `Type` drivers (for network types like VLAN, VXLAN, etc.) and `Mechanism` drivers (for connecting to different network mechanisms like Open vSwitch, Linux Bridge, etc.).
What is the L2 agent? How does it work and what is it responsible for?
The L2 agent is a service that runs on each compute node and is responsible for wiring virtual networks to instances. It communicates with the Neutron server to get the network topology and then configures the local virtual switch (e.g., Open vSwitch) to connect instances to the correct networks. It also enforces security group rules by configuring the virtual switch.
What is the L3 agent? How does it work and what is it responsible for?
The L3 agent is responsible for providing layer-3 networking services, such as routing and floating IPs. It runs on network nodes and manages virtual routers that connect private networks to external networks. The L3 agent creates network namespaces for each router to provide isolation and then configures routing rules and NAT to enable traffic to flow between networks.
Explain what the Metadata agent is responsible for
The Metadata agent is responsible for providing metadata (e.g., instance ID, hostname, public keys) to instances. It runs on network nodes and acts as a proxy between instances and the Nova metadata service. When an instance requests metadata, the request is forwarded to the Metadata agent, which then retrieves the information from Nova and returns it to the instance.
What networking entities does Neutron support?
Neutron supports a variety of networking entities, including:
* **Network:** An isolated layer-2 broadcast domain.
* **Subnet:** A block of IP addresses that can be assigned to instances.
* **Port:** A connection point for attaching a single device, such as an instance, to a virtual network.
* **Router:** A logical entity that connects multiple layer-2 networks.
* **Floating IP:** A public IP address that can be associated with an instance to provide external connectivity.
* **Security Group:** A collection of firewall rules that control inbound and outbound traffic to instances.
How do you debug OpenStack networking issues? (tools, logs, ...)
To debug networking issues in OpenStack (Neutron), you can use the following:
* **Logs:** Check the Neutron service logs (e.g., `/var/log/neutron/neutron-server.log`, `/var/log/neutron/openvswitch-agent.log`, `/var/log/neutron/l3-agent.log`) for error messages.
* **Neutron CLI:** Use the `neutron` command-line tool to check the status of networks, subnets, ports, routers, and other networking entities.
* **`ip netns`:** Use the `ip netns` command to inspect network namespaces and the network configurations within them.
* **`ovs-vsctl` and `ovs-ofctl`:** Use these tools to inspect the configuration and flow tables of Open vSwitch bridges.
* **`tcpdump`:** Use `tcpdump` to capture and analyze network traffic on various interfaces to identify connectivity issues.
#### OpenStack - Glance
Explain Glance in detail
* Glance is the OpenStack image service
* It handles requests related to instance disks and images
* Glance is also used for creating snapshots for quick instance backups
* Users can use Glance to create new images or upload existing ones
Describe Glance architecture
* glance-api - responsible for handling image API calls such as retrieval and storage. It consists of two APIs:
  1. registry-api - responsible for internal requests
  2. user API - can be accessed publicly
* glance-registry - responsible for handling image metadata requests (e.g. size, type, etc.). This component is private, which means it's not available publicly
* metadata definition service - API for custom metadata
* database - for storing image metadata
* image repository - for storing images. This can be a filesystem, Swift object storage, HTTP, etc.
#### OpenStack - Swift
Explain Swift in detail
* Swift is the Object Store service: a highly available, distributed and consistent store designed for storing a lot of data
* Swift distributes data across multiple servers while writing it to multiple disks
* One can choose to add additional servers to scale the cluster, all while Swift maintains the integrity of the information and the data replications
Can users store by default an object of 100GB in size?
Not by default. Object Storage API limits the maximum to 5GB per object but it can be adjusted.
Explain the following in regards to Swift: * Container * Account * Object
- Container - Defines a namespace for objects
- Account - Defines a namespace for containers
- Object - Data content (e.g. image, document, ...)
True or False? There can be two objects with the same name in the same container but not in two different containers
False. Two objects can have the same name if they are in different containers.
#### OpenStack - Cinder
Explain Cinder in detail
* Cinder is the OpenStack Block Storage service
* It basically provides users with storage resources they can consume with other services such as Nova
* One of the most used implementations of storage supported by Cinder is LVM
* From the user's perspective this is transparent, which means the user doesn't know where, behind the scenes, the storage is located or what type of storage is used
Describe Cinder's components
* cinder-api - receives API requests
* cinder-volume - manages attached block devices
* cinder-scheduler - selects the optimal storage node on which to create the volume
#### OpenStack - Keystone
Can you describe the following concepts in regards to Keystone? - Role - Tenant/Project - Service - Endpoint - Token
- Role - A list of rights and privileges determining what a user or a project can perform
- Tenant/Project - Logical representation of a group of resources isolated from other groups of resources. It can be an account, organization, ...
- Service - An endpoint which the user can use for accessing different resources
- Endpoint - A network address which can be used to access a certain OpenStack service
- Token - Used for accessing resources while describing which resources can be accessed by using a scope
What are the properties of a service? In other words, how a service is identified?
Using:
- Name
- ID number
- Type
- Description
Explain the following: - PublicURL - InternalURL - AdminURL
- PublicURL - Publicly accessible through the public internet
- InternalURL - Used for communication between services
- AdminURL - Used for administrative management
What is a service catalog?
A list of services and their endpoints
#### OpenStack Advanced - Services
Describe each of the following services * Swift * Sahara * Ironic * Trove * Aodh * Ceilometer
* Swift - highly available, distributed, eventually consistent object/blob store
* Sahara - Manage Hadoop Clusters
* Ironic - Bare Metal Provisioning
* Trove - Database as a service that runs on OpenStack
* Aodh - Alarms Service
* Ceilometer - Track and monitor usage
Identify the service/project used for each of the following: * Database as a service which runs on OpenStack * Bare Metal Provisioning * Track and monitor usage * Alarms Service * Manage Hadoop Clusters * highly available, distributed, eventually consistent object/blob store
* Database as a service which runs on OpenStack - Trove
* Bare Metal Provisioning - Ironic
* Track and monitor usage - Ceilometer
* Alarms Service - Aodh
* Manage Hadoop Clusters - Sahara
* highly available, distributed, eventually consistent object/blob store - Swift
#### OpenStack Advanced - Keystone
Can you describe Keystone service in detail?
* You can't have OpenStack deployed without Keystone
* It provides identity, policy and token services
* The authentication provided is for both users and services
* The authorization supported is token-based and user-based
* There is a policy defined based on RBAC, stored in a JSON file, where each line in that file defines the level of access to apply
Describe Keystone architecture
* There is a service API and an admin API through which Keystone gets requests
* Keystone has four backends:
  * Token Backend - temporary tokens for users and services
  * Policy Backend - rules management and authorization
  * Identity Backend - users and groups (either a standalone DB, LDAP, ...)
  * Catalog Backend - endpoints
* It has a pluggable environment where you can integrate with:
  * LDAP
  * KVS (Key Value Store)
  * SQL
  * PAM
  * Memcached
Describe the Keystone authentication process
* Keystone gets a call/request and checks whether it's from an authorized user, using username, password and authURL
* Once confirmed, Keystone provides a token
* A token contains a list of the user's projects, so there is no need to authenticate every time and the token can be submitted instead
#### OpenStack Advanced - Compute (Nova)
What does each of the following do?: * nova-api * nova-compute * nova-conductor * nova-cert * nova-consoleauth * nova-scheduler
* nova-api - responsible for managing requests/calls
* nova-compute - responsible for managing instance lifecycle
* nova-conductor - Mediates between nova-compute and the database so nova-compute doesn't access it directly
* nova-cert - Manages X509 certificates for secure communication
* nova-consoleauth - Authorizes tokens for users to access instance consoles
* nova-scheduler - Determines which compute host an instance should be launched on based on a set of filters and weights
What types of Nova proxies are you familiar with?
* nova-novncproxy - Access through VNC connections (browser-based)
* nova-spicehtml5proxy - Access through SPICE
* nova-xvpvncproxy - Access through VNC connections using a Java client
#### OpenStack Advanced - Networking (Neutron)
Explain BGP dynamic routing
BGP (Border Gateway Protocol) is a standardized exterior gateway protocol used to exchange routing and reachability information among autonomous systems on the internet. In OpenStack, BGP can be used to dynamically advertise floating IP addresses and project networks to physical routers, eliminating the need for static routes and enabling more scalable and resilient network architectures.
What is the role of network namespaces in OpenStack?
Network namespaces are a Linux kernel feature that provides isolated network stacks for different processes. In OpenStack, network namespaces are used to isolate the network resources of different virtual routers and other networking services. This ensures that each router has its own set of interfaces, routing tables, and firewall rules, preventing conflicts and providing a secure multi-tenant environment.
#### OpenStack Advanced - Horizon
Can you describe Horizon in detail?
* Django-based project focusing on providing an OpenStack dashboard and the ability to create additional customized dashboards
* You can use it to access the different OpenStack services resources - instances, images, networks, ...
* By accessing the dashboard, users can use it to list, create, remove and modify the different resources
* It's also highly customizable and you can modify or add to it based on your needs
What can you tell about Horizon architecture?
* API is backward compatible
* There are three types of dashboards: user, system and settings
* It provides core support for all OpenStack core projects such as Neutron, Nova, etc. (out of the box, no need to install extra packages or plugins)
* Anyone can extend the dashboards and add new components
* Horizon provides templates and core classes from which one can build one's own dashboard
## Puppet
What is Puppet? How does it work?
* Puppet is a configuration management tool ensuring that all systems are configured to a desired and predictable state.
Explain Puppet architecture
* Puppet has a primary-secondary (server-agent) architecture. The clients (agents) are distributed across the network and communicate with the primary server, where the Puppet modules are present. The client agent sends a certificate with its ID to the server; the server then signs that certificate and sends it back to the client. This authentication allows for secure and verifiable communication between the agent and the server.
Can you compare Puppet to other configuration management tools? Why did you choose to use Puppet?
* Puppet is often compared to other configuration management tools like Chef, Ansible, SaltStack, and CFEngine. The choice to use Puppet often depends on an organization's needs, such as ease of use, scalability, and community support.
Explain the following: * Module * Manifest * Node
* Module - a collection of manifests, templates, and files
* Manifest - the actual code for configuring the clients
* Node - allows you to assign specific configurations to specific nodes
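A minimal illustrative manifest tying these together (the node name and package are hypothetical, not from any real module):

```puppet
# site.pp — a node definition selecting configuration for one host
node 'web01.example.com' {
  # resources below would normally live in a module's manifest
  package { 'nginx':
    ensure => installed,
  }
  service { 'nginx':
    ensure  => running,
    enable  => true,
    require => Package['nginx'],
  }
}
```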
Explain Facter
* Facter is a standalone tool in Puppet that collects information about a system and its configuration, such as the operating system, IP addresses, memory, and network interfaces. Facter is integrated into Puppet, and its facts can be used within Puppet manifests to make decisions about resource management and to customize Puppet's behavior based on the characteristics of the system.
What is MCollective?
* MCollective is a middleware system that integrates with Puppet to provide orchestration, remote execution, and parallel job execution capabilities.
Do you have experience with writing modules? Which module have you created and for what?
Explain what is Hiera
* Hiera is a hierarchical data store in Puppet that is used to separate data from code, allowing data to be more easily separated, managed, and reused.
## Elastic
What is the Elastic Stack?
The Elastic Stack consists of:
* Elasticsearch
* Kibana
* Logstash
* Beats
* Elastic Hadoop
* APM Server

Elasticsearch, Logstash and Kibana are also known as the ELK stack.
Explain what is Elasticsearch
From the official [docs](https://www.elastic.co/guide/en/elasticsearch/reference/current/documents-indices.html): "Elasticsearch is a distributed document store. Instead of storing information as rows of columnar data, Elasticsearch stores complex data structures that have been serialized as JSON documents"
What is Logstash?
From the [blog](https://logit.io/blog/post/the-top-50-elk-stack-and-elasticsearch-interview-questions): "Logstash is a powerful, flexible pipeline that collects, enriches and transports data. It works as an extract, transform & load (ETL) tool for collecting log messages."
Explain what beats are
Beats are lightweight data shippers. These data shippers are installed on the client where the data resides. Examples of beats: Filebeat, Metricbeat, Auditbeat. There are many more.
What is Kibana?
From the official docs: "Kibana is an open source analytics and visualization platform designed to work with Elasticsearch. You use Kibana to search, view, and interact with data stored in Elasticsearch indices. You can easily perform advanced data analysis and visualize your data in a variety of charts, tables, and maps."
Describe what happens from the moment an app logged some information until it's displayed to the user in a dashboard when the Elastic stack is used
The process may vary based on the chosen architecture and the processing you may want to apply to the logs. One possible workflow is:
1. The data logged by the application is picked up by Filebeat and sent to Logstash
2. Logstash processes the log based on the defined filters. Once done, the output is sent to Elasticsearch
3. Elasticsearch stores the document it got, and the document is indexed for quick future access
4. The user creates visualizations in Kibana based on the indexed data
5. The user creates a dashboard composed of the visualizations created in the previous step
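The Logstash step in this flow is essentially "parse a raw line into a structured document". A rough Python sketch of such a filter stage (the log format and field names are made up for illustration, not a real grok pattern):

```python
import json
import re

# Parse "<timestamp> <level> <message>" into a structured document,
# the way a Logstash filter would before the doc is indexed
LOG_PATTERN = re.compile(r"(?P<ts>\S+) (?P<level>\w+) (?P<msg>.*)")

def logstash_like_filter(line):
    m = LOG_PATTERN.match(line)
    # on parse failure, tag the event instead of dropping it
    return m.groupdict() if m else {"msg": line, "tags": ["_parsefailure"]}

doc = logstash_like_filter("2023-01-01T10:00:00 ERROR disk is full")
print(json.dumps(doc))
# {"ts": "2023-01-01T10:00:00", "level": "ERROR", "msg": "disk is full"}
```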
##### Elasticsearch
What is a data node?
This is where data is stored and also where different processing takes place (e.g. when you search for data).
What is a master node?
Part of a master node's responsibilities:
* Track the status of all the nodes in the cluster
* Verify replicas are working and the data is available from every data node
* Make sure there are no hot nodes (a data node that works much harder than the other nodes)

While there can be multiple master-eligible nodes, in reality only one of them is the elected master node.
What is an ingest node?
A node which is responsible for processing the data according to an ingest pipeline. In case you don't need to use Logstash, this node can receive data from Beats and process it, similarly to how it can be processed in Logstash.
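Conceptually, an ingest pipeline is an ordered list of processors applied to every incoming document. A small Python sketch of that idea (the processor names below are illustrative, not the Elasticsearch API):

```python
# Two toy "processors", applied in order like pipeline stages
def lowercase_level(doc):
    doc["level"] = doc["level"].lower()
    return doc

def add_tag(doc):
    doc.setdefault("tags", []).append("ingested")
    return doc

pipeline = [lowercase_level, add_tag]

doc = {"level": "ERROR", "msg": "disk full"}
for processor in pipeline:
    doc = processor(doc)
print(doc)  # {'level': 'error', 'msg': 'disk full', 'tags': ['ingested']}
```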
What is Coordinating only node?
From the official docs: Coordinating only nodes can benefit large clusters by offloading the coordinating node role from data and master-eligible nodes. They join the cluster and receive the full cluster state, like every other node, and they use the cluster state to route requests directly to the appropriate place(s).
How data is stored in Elasticsearch?
* Data is stored in an index * The index is spread across the cluster using shards
What is an Index?
Index in Elasticsearch is in most cases compared to a whole database from the SQL/NoSQL world.
You can choose to have one index to hold all the data of your app or have multiple indices where each index holds a different type of your app's data (e.g. an index for each service your app is running). The official docs also offer a great explanation (in general, it's really good documentation, as every project should have): "An index can be thought of as an optimized collection of documents and each document is a collection of fields, which are the key-value pairs that contain your data"
Explain Shards
An index is split into shards and documents are hashed to a particular shard. Each shard may be on a different node in the cluster and each one of the shards is a self-contained index.
This allows Elasticsearch to scale to an entire cluster of servers.
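The routing idea can be sketched in a few lines of Python. Note this is only an illustration: Elasticsearch actually uses the murmur3 hash of the routing value, not md5.

```python
import hashlib

def route_to_shard(doc_id: str, num_primary_shards: int) -> int:
    """Map a document ID to a shard number.

    Same idea as Elasticsearch routing: shard = hash(routing) % num_shards.
    (Elasticsearch uses murmur3; md5 here is just for illustration.)
    """
    digest = hashlib.md5(doc_id.encode()).hexdigest()
    return int(digest, 16) % num_primary_shards

# Documents spread deterministically across shards:
for doc_id in ["user-1", "user-2", "user-3"]:
    print(doc_id, "-> shard", route_to_shard(doc_id, 3))
```

Because the shard count is part of the hash formula, changing the number of primary shards would remap every document, which is why it's fixed at index creation time.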
What is an Inverted Index?
From the official docs: "An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in."
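A toy version of an inverted index is only a few lines of Python, mapping each word to the documents it occurs in:

```python
from collections import defaultdict

def build_inverted_index(docs: dict) -> dict:
    """Map every unique word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

docs = {"d1": "the quick brown fox", "d2": "the lazy dog"}
index = build_inverted_index(docs)
print(index["the"])  # both documents contain "the"
```

Looking up a term is then a dictionary access instead of scanning every document, which is what makes full-text search fast.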
What is a Document?
Continuing with the comparison to SQL/NoSQL, a Document in Elasticsearch is a row in a table in the case of SQL, or a document in a collection in the case of NoSQL. As in NoSQL, a document is a JSON object which holds data on a unit in your app. What this unit is depends on your app. If your app is related to books, then each document describes a book. If your app is about shirts, then each document is a shirt.
You check the health of your elasticsearch cluster and it's red. What does it mean? What can cause the status to be yellow instead of green?
Red means some data is unavailable in your cluster: one or more primary shards are unassigned. Yellow means all primary shards are assigned but some replica shards are unassigned. You can be in this state if you have a single node and your indices have replicas. Green means that all shards in the cluster are assigned to nodes and your cluster is healthy.
True or False? Elasticsearch indexes all data in every field and each indexed field has the same data structure for unified and quick query ability
False. From the official docs: "Each indexed field has a dedicated, optimized data structure. For example, text fields are stored in inverted indices, and numeric and geo fields are stored in BKD trees."
What reserved fields a document has?
* _index * _id * _type
Explain Mapping
What are the advantages of defining your own mapping? (or: when would you use your own mapping?)
* You can optimize fields for partial matching * You can define custom formats of known fields (e.g. date) * You can perform language-specific analysis
Explain Replicas
In a network/cloud environment where failures can be expected any time, it is very useful and highly recommended to have a failover mechanism in case a shard/node somehow goes offline or disappears for whatever reason. To this end, Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for short.
Can you explain Term Frequency & Document Frequency?
Term Frequency is how often a term appears in a given document and Document Frequency is how often a term appears across all documents. They are both used for determining the relevance of a term: a term is more relevant the higher its Term Frequency and the lower its Document Frequency (this is the idea behind TF-IDF scoring).
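A simplified sketch of these two statistics in Python (real Elasticsearch scoring uses BM25, which is more involved than plain TF-IDF):

```python
import math

docs = [
    "the cat sat on the mat",
    "the dog sat",
    "cats and dogs",
]

def term_frequency(term: str, doc: str) -> float:
    """How often a term appears in one document (normalized by length)."""
    words = doc.split()
    return words.count(term) / len(words)

def document_frequency(term: str, docs: list) -> int:
    """In how many documents the term appears."""
    return sum(1 for d in docs if term in d.split())

def tf_idf(term: str, doc: str, docs: list) -> float:
    """Rare terms in a document score higher than common ones."""
    df = document_frequency(term, docs)
    if df == 0:
        return 0.0
    return term_frequency(term, doc) * math.log(len(docs) / df)
```

For example, "cat" (appearing in one document) ends up with a higher score in the first document than the ubiquitous "the".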
You check "Current Phase" under "Index lifecycle management" and you see it's set to "hot". What does it mean?
"The index is actively being written to". More about the phases [here](https://www.elastic.co/guide/en/elasticsearch/reference/7.6/ilm-policy-definition.html)
What this command does? curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d'{ "name": "John Doe" }'
It creates the customer index if it doesn't exist and adds a new document with the field "name" set to "John Doe". The document gets the ID 1, as specified in the URL.
What will happen if you run the previous command twice? What about running it 100 times?
1. If the name value was different, it would update "name" to the new value 2. In any case, it increments the _version field by one
What is the Bulk API? What would you use it for?
The Bulk API is used when you need to index multiple documents. For a high number of documents, it is significantly faster than individual requests since it requires far fewer network roundtrips.
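The bulk request body is newline-delimited JSON: an action line followed by a source line per document. A sketch of assembling such a body in Python (index name and documents are made up; actually sending it over the network is left out):

```python
import json

def build_bulk_body(index: str, documents: list) -> str:
    """Build the newline-delimited JSON body expected by the _bulk
    endpoint: an action line, then a source line, for each document."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the body must end with a newline

body = build_bulk_body("customer", [{"name": "John Doe"}, {"name": "Jane Doe"}])
# POST this body to localhost:9200/_bulk
# with the header Content-Type: application/x-ndjson
```

One request carrying N documents replaces N separate index requests, which is where the speedup comes from.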
##### Query DSL
Explain Elasticsearch query syntax (Booleans, Fields, Ranges)
Explain what is Relevance Score
Explain Query Context and Filter Context
From the official docs: "In the query context, a query clause answers the question “How well does this document match this query clause?” Besides deciding whether or not the document matches, the query clause also calculates a relevance score in the _score meta-field." "In a filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple Yes or No — no scores are calculated. Filter context is mostly used for filtering structured data"
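For illustration, a sketch of a bool query combining both contexts (field names are made up): the match clause runs in query context and affects _score, while the term and range clauses run in filter context and only include or exclude documents:

```
GET /_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "elasticsearch" } }
      ],
      "filter": [
        { "term": { "status": "published" } },
        { "range": { "publish_date": { "gte": "2020-01-01" } } }
      ]
    }
  }
}
```

Since filter clauses don't compute scores, their results can be cached, which is why structured yes/no conditions belong in filter context.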
Describe how would an architecture of production environment with large amounts of data would be different from a small-scale environment
There are several possible answers to this question. One of them is as follows: A small-scale Elastic architecture will consist of the Elastic Stack as it is. This means we will have Beats, Logstash, Elasticsearch and Kibana.
A production environment with large amounts of data can include some kind of buffering component (e.g. Redis or RabbitMQ) and also a security component such as Nginx.
##### Logstash
What are Logstash plugins? What plugins types are there?
* Input Plugins - how to collect data from different sources * Filter Plugins - processing data * Output Plugins - push data to different outputs/services/platforms
What is grok?
A Logstash filter plugin which parses unstructured log data into structured, queryable fields, using a library of predefined (and extensible) patterns.
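Under the hood, grok patterns compile down to regular expressions with named captures. A rough Python equivalent of the grok pattern `%{IP:client} %{WORD:method} %{URIPATHPARAM:request}` (simplified for illustration, not the real grok implementation):

```python
import re

# Each grok pattern (IP, WORD, ...) is a named regular expression.
# Below is a simplified hand-written equivalent:
LOG_PATTERN = re.compile(
    r"(?P<client>\d{1,3}(?:\.\d{1,3}){3})\s+"   # %{IP:client}
    r"(?P<method>\w+)\s+"                        # %{WORD:method}
    r"(?P<request>\S+)"                          # %{URIPATHPARAM:request}
)

line = "55.3.244.1 GET /index.html"
match = LOG_PATTERN.match(line)
print(match.groupdict())
# {'client': '55.3.244.1', 'method': 'GET', 'request': '/index.html'}
```

Grok's value is that you reuse the pattern library by name instead of hand-writing (and debugging) regexes like these.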
How grok works?
What grok patterns are you familiar with?
What is `_grokparsefailure?`
How do you test or debug grok patterns?
What are Logstash Codecs? What codecs are there?
##### Kibana
What can you find under "Discover" in Kibana?
The raw data as it is stored in the index. You can search and filter it.
You see in Kibana, after clicking on Discover, "561 hits". What does it mean?
The total number of documents matching the search query. If no query is used, then it's simply the total number of documents.
What can you find under "Visualize"?
"Visualize" is where you can create visual representations for your data (pie charts, graphs, ...)
What visualization types are supported/included in Kibana?
What visualization type would you use for statistical outliers
Describe in detail how do you create a dashboard in Kibana
#### Filebeat
What is Filebeat?
Filebeat is used to monitor logging directories inside VMs, or mounted as a sidecar when exporting logs from containers, and then forwards these logs onward for further processing, usually to Logstash.
If one is using ELK, is it a must to also use filebeat? In what scenarios it's useful to use filebeat?
Filebeat is a typical component of the ELK stack, since it was developed by Elastic to work with the other products (Logstash and Kibana). It's possible to send logs directly to logstash, though this often requires coding changes for the application. Particularly for legacy applications with little test coverage, it might be a better option to use filebeat, since you don't need to make any changes to the application code.
What is a harvester?
Read [here](https://www.elastic.co/guide/en/beats/filebeat/current/how-filebeat-works.html#harvester)
True or False? a single harvester harvest multiple files, according to the limits set in filebeat.yml
False. One harvester harvests one file.
What are filebeat modules?
These are pre-configured modules for specific types of logging locations (eg, Traefik, Fargate, HAProxy) to make it easy to configure forwarding logs using filebeat. They have different configurations based on where you're collecting logs from.
#### Elastic Stack
How do you secure an Elastic Stack?
You can generate certificates with the provided elastic utils and change configuration to enable security using certificates model.
## Distributed
Explain Distributed Computing (or Distributed System)
According to Martin Kleppmann: "Many processes running on many machines...only message-passing via an unreliable network with variable delays, and the system may suffer from partial failures, unreliable clocks, and process pauses." Another definition: "Systems that are physically separated, but logically connected"
What can cause a system to fail?
* Network * CPU * Memory * Disk
Do you know what is "CAP theorem"? (aka as Brewer's theorem)
According to the CAP theorem, it's not possible for a distributed data store to provide more than two of the following at the same time: * Availability: Every request receives a response (it doesn't have to be the most recent data) * Consistency: Every request receives a response with the latest/most recent data * Partition tolerance: The system keeps running even if messages between nodes are dropped or delayed (a network partition)
What are the problems with the following design? How to improve it?

1. The transition can take time. In other words, noticeable downtime. 2. The standby server is a waste of resources - as long as the first application server is running, the standby does nothing
What are the problems with the following design? How to improve it?

Issues: If the load balancer dies, we lose the ability to communicate with the application. Ways to improve: * Add another load balancer * Use a DNS A record for both load balancers * Use a message queue
What is "Shared-Nothing" architecture?
It's an architecture in which data is stored in and retrieved from a single, non-shared source, usually exclusively connected to one node, as opposed to architectures where the request can get to one of many nodes and the data will be retrieved from one shared location (storage, memory, ...).
Explain the Sidecar Pattern (Or sidecar proxy)
## Misc |Name|Topic|Objective & Instructions|Solution|Comments| |--------|--------|------|----|----| | Highly Available "Hello World" | DevOps | [Exercise](topics/devops/ha_hello_world.md) | [Solution](topics/devops/solutions/ha_hello_world.md) | |
What happens when you type in a URL in an address bar in a browser?
1. The browser searches for the record of the domain name IP address in the DNS in the following order: * Browser cache * Operating system cache * The DNS server configured on the user's system (can be ISP DNS, public DNS, ...) 2. If it couldn't find a DNS record locally, a full DNS resolution is started. 3. It connects to the server using the TCP protocol 4. The browser sends an HTTP request to the server 5. The server sends an HTTP response back to the browser 6. The browser renders the response (e.g. HTML) 7. The browser then sends subsequent requests as needed to the server to get the embedded links, javascript, images in the HTML and then steps 3 to 5 are repeated. TODO: add more details!
#### API
Explain what is an API
I like this definition from [blog.christianposta.com](https://blog.christianposta.com/microservices/api-gateways-are-going-through-an-identity-crisis): "An explicitly and purposefully defined interface designed to be invoked over a network that enables software developers to get programmatic access to data and functionality within an organization in a controlled and comfortable way."
What is an API specification?
From [swagger.io](https://swagger.io/resources/articles/difference-between-api-documentation-specification): "An API specification provides a broad understanding of how an API behaves and how the API links with other APIs. It explains how the API functions and the results to expect when using the API"
True or False? API Definition is the same as API Specification
False. From [swagger.io](https://swagger.io/resources/articles/difference-between-api-documentation-specification): "An API definition is similar to an API specification in that it provides an understanding of how an API is organized and how the API functions. But the API definition is aimed at machine consumption instead of human consumption of APIs."
What is an API gateway?
An API gateway is like the gatekeeper that controls how different parts talk to each other and how information is exchanged between them. The API gateway provides a single point of entry for all clients, and it can perform several tasks, including routing requests to the appropriate backend service, load balancing, security and authentication, rate limiting, caching, and monitoring. By using an API gateway, organizations can simplify the management of their APIs, ensure consistent security and governance, and improve the performance and scalability of their backend services. They are also commonly used in microservices architectures, where there are many small, independent services that need to be accessed by different clients.
What are the advantages of using/implementing an API gateway?
Advantages: - Simplifies API management: Provides a single entry point for all requests, which simplifies the management and monitoring of multiple APIs. - Improves security: Able to implement security features like authentication, authorization, and encryption to protect the backend services from unauthorized access. - Enhances scalability: Can handle traffic spikes and distribute requests to backend services in a way that maximizes resource utilization and improves overall system performance. - Enables service composition: Can combine different backend services into a single API, providing more granular control over the services that clients can access. - Facilitates integration with external systems: Can be used to expose internal services to external partners or customers, making it easier to integrate with external systems and enabling new business models.
What is a Payload in API?
What is Automation? How it's related or different from Orchestration?
Automation is the act of automating tasks to reduce human intervention or interaction with IT technology and systems.
While automation focuses on the task level, orchestration is the process of automating processes and/or workflows which consist of multiple tasks, usually spanning multiple systems.
Tell me about interesting bugs you've found and also fixed
What is a Debugger and how it works?
What services an application might have?
* Authorization * Logging * Authentication * Ordering * Front-end * Back-end ...
What is Metadata?
Data about data. Basically, it describes the type of information that the underlying data holds.
You can use one of the following formats: JSON, YAML, XML. Which one would you use? Why?
I can't answer this for you :)
What's KPI?
What's OKR?
What's DSL (Domain Specific Language)?
Domain Specific Languages (DSLs) are used to create a customised language that represents the domain such that domain experts can easily interpret it.
What's the difference between KPI and OKR?
#### YAML
What is YAML?
Data serialization language used by many technologies today like Kubernetes, Ansible, etc.
True or False? Any valid JSON file is also a valid YAML file
True. This is because YAML is a superset of JSON.
What is the format of the following data?
```
{
  applications: [
    {
      name: "my_app",
      language: "python",
      version: 20.17
    }
  ]
}
```
JSON
What is the format of the following data?
```
applications:
  - app: "my_app"
    language: "python"
    version: 20.17
```
YAML
How to write a multi-line string with YAML? What use cases is it good for?
```
someMultiLineString: |
  look mama
  I can write a multi-line string
  I love YAML
```
It's good for use cases like writing a shell script where each line of the script is a different command.
What is the difference between someMultiLineString: | to someMultiLineString: >?
Using `>` will fold the multi-line string into a single line
```
someMultiLineString: >
  This is actually
  a single line
  do not let appearances fool you
```
What are placeholders in YAML?
They allow you to reference values instead of writing them directly. Note that placeholders like the one below are not part of YAML itself - they are processed by a templating engine (e.g. Jinja2) before the YAML is parsed:
```
username: {{ my.user_name }}
```
How can you define multiple YAML components in one file?
Using the `---` separator. For example:
```
document_number: 1
---
document_number: 2
```
#### Firmware
Explain what is a firmware
[Wikipedia](https://en.wikipedia.org/wiki/Firmware): "In computing, firmware is a specific class of computer software that provides the low-level control for a device's specific hardware. Firmware, such as the BIOS of a personal computer, may contain basic functions of a device, and may provide hardware abstraction services to higher-level software such as operating systems."
## Cassandra
When running a Cassandra cluster, how often do you need to run nodetool repair in order to keep the cluster consistent? * Within the columnFamily GC-grace * Once a week * Less than the compacted partition minimum bytes * Depending on the compaction strategy
## HTTP
What is HTTP?
[Avinetworks](https://avinetworks.com/glossary/layer-7/): HTTP stands for Hypertext Transfer Protocol. HTTP uses TCP port 80 to enable internet communication. It is part of the Application Layer (L7) in OSI Model.
Describe HTTP request lifecycle
* Resolve host by request to DNS resolver * Client sends SYN * Server replies with SYN+ACK * Client sends ACK (the TCP handshake is complete) * HTTP request * HTTP response
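After the TCP handshake, the HTTP request itself is just plain text. A minimal sketch of assembling a raw HTTP/1.1 GET request (the host name is made up; actually opening the socket is left out):

```python
def build_get_request(host: str, path: str = "/") -> str:
    """Assemble a minimal raw HTTP/1.1 GET request.

    Lines are separated by CRLF and a blank line marks the end of the
    headers (and of a body-less request).
    """
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )

print(build_get_request("example.com"))
```

Sending exactly these bytes over a TCP connection to port 80 is all an HTTP client does at this step.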
True or False? HTTP is stateful
False. It doesn't maintain state for incoming requests.
How HTTP request looks like?
It consists of: * Request line - request type * Headers - content info like length, encoding, etc. * Body (not always included)
What HTTP method types are there?
* GET * POST * HEAD * PUT * DELETE * CONNECT * OPTIONS * TRACE
What HTTP response codes are there?
* 1xx - informational * 2xx - Success * 3xx - Redirect * 4xx - Error, client fault * 5xx - Error, server fault
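The class of a response code is simply its first digit, which can be sketched as:

```python
def status_class(code: int) -> str:
    """Map an HTTP status code to its response class."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirect",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")

print(status_class(200))  # success
print(status_class(404))  # client error
print(status_class(504))  # server error
```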
What is HTTPS?
HTTPS is a secure version of the HTTP protocol used to transfer data between a web browser and a web server. It encrypts the communication using SSL/TLS encryption to ensure that the data is private and secure. Learn more: https://www.cloudflare.com/learning/ssl/why-is-http-not-secure/
Explain HTTP Cookies
HTTP is stateless. To share state, we can use Cookies. TODO: explain what is actually a Cookie
What is HTTP Pipelining?
You get "504 Gateway Timeout" error from an HTTP server. What does it mean?
The server didn't receive a response from another server it communicates with in a timely manner.
What is a proxy?
A proxy is a server that acts as a middleman between a client device and a destination server. It can help improve privacy, security, and performance by hiding the client's IP address, filtering content, and caching frequently accessed data. - Proxies can be used for load balancing, distributing traffic across multiple servers to help prevent server overload and improve website or application performance. They can also be used for data analysis, as they can log requests and traffic, providing useful insights into user behavior and preferences.
What is a reverse proxy?
A reverse proxy is a type of proxy server that sits between a client and a server, but it is used to manage traffic going in the opposite direction of a traditional forward proxy. In a forward proxy, the client sends requests to the proxy server, which then forwards them to the destination server. However, in a reverse proxy, the client sends requests to the destination server, but the requests are intercepted by the reverse proxy before they reach the server. - They're commonly used to improve web server performance, provide high availability and fault tolerance, and enhance security by preventing direct access to the back-end server. They are often used in large-scale web applications and high-traffic websites to manage and distribute requests to multiple servers, resulting in improved scalability and reliability.
When you publish a project, you usually publish it with a license. What types of licenses are you familiar with and which one do you prefer to use?
Explain what is "X-Forwarded-For"
[Wikipedia](https://en.wikipedia.org/wiki/X-Forwarded-For): "The X-Forwarded-For (XFF) HTTP header field is a common method for identifying the originating IP address of a client connecting to a web server through an HTTP proxy or load balancer."
#### Load Balancers
What is a load balancer?
A load balancer accepts (or denies) incoming network traffic from a client, and based on some criteria (application related, network, etc.) it distributes those communications out to servers (at least one).
Why to use a load balancer?
* Scalability - using a load balancer, you can possibly add more servers in the backend to handle more requests/traffic from the clients, as opposed to using one server. * Redundancy - if one server in the backend dies, the load balancer will keep forwarding the traffic/requests to the second server so users won't even notice one of the servers in the backend is down.
What load balancer techniques/algorithms are you familiar with?
* Round Robin * Weighted Round Robin * Least Connection * Weighted Least Connection * Resource Based * Fixed Weighting * Weighted Response Time * Source IP Hash * URL Hash
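Two of the techniques above can be sketched in a few lines of Python (illustrative only - real load balancers also track server health and load):

```python
import itertools

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Round Robin: each request simply goes to the next server in turn.
rr = itertools.cycle(servers)
round_robin_order = [next(rr) for _ in range(4)]
# the 4th request wraps around to the first server again

# Source IP Hash: the same client IP always lands on the same server,
# which gives a crude form of session affinity.
def pick_by_ip(client_ip: str) -> str:
    return servers[hash(client_ip) % len(servers)]
```

Weighted variants follow the same idea but repeat stronger servers in the rotation (or scale the hash buckets) proportionally to their weight.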
What are the drawbacks of round robin algorithm in load balancing?
* A simple round robin algorithm knows nothing about the load and the spec of each server it forwards the requests to. It is possible that multiple heavy-workload requests will get to the same server while other servers receive only lightweight requests, which will result in one server doing most of the work, maybe even crashing at some point because it is unable to handle all the heavy-workload requests on its own. * Each request from the client creates a whole new session. This might be a problem for certain scenarios where you would like to perform multiple operations and the server has to know about the result of each operation, basically being aware of the history it has with the client. In round robin, the first request might hit server X, while the second request might hit server Y and ask to continue processing the data that was already processed on server X.
What is an Application Load Balancer?
In which scenarios would you use ALB?
At what layers a load balancer can operate?
L4 and L7
Can you perform load balancing without using a dedicated load balancer instance?
Yes, you can use DNS for performing load balancing.
What is DNS load balancing? What its advantages? When would you use it?
#### Load Balancers - Sticky Sessions
What are sticky sessions? What are their pros and cons?
Recommended read: * [Red Hat Article](https://access.redhat.com/solutions/900933) Cons: * Can cause an uneven load on instances (since requests are routed to the same instances) Pros: * Ensures in-proc sessions are not lost when a new request is created
Name one use case for using sticky sessions
You would like to make sure the user doesn't lose the current session data.
What do sticky sessions use for enabling the "stickiness"?
Cookies. There are application based cookies and duration based cookies.
Explain application-based cookies
* Generated by the application and/or the load balancer * Usually allow including custom data
Explain duration-based cookies
* Generated by the load balancer * Session is not sticky anymore once the duration elapsed
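Both cookie types serve the same mechanism. A minimal sketch of a load balancer consulting a stickiness cookie (backend and cookie names are made up, not any specific load balancer's API):

```python
import itertools

backends = ["app-1", "app-2"]
_next = itertools.cycle(backends)

def route(cookies: dict) -> tuple:
    """Route a request: honor the stickiness cookie if present and
    valid, otherwise pick the next backend and set the cookie."""
    if cookies.get("lb_backend") in backends:
        return cookies["lb_backend"], cookies
    chosen = next(_next)
    return chosen, {**cookies, "lb_backend": chosen}

backend, cookies = route({})      # first request: the cookie gets set
backend2, _ = route(cookies)      # follow-up request sticks to the same backend
assert backend == backend2
```

A duration-based cookie would additionally carry an expiry, after which the request falls through to the normal selection path again.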
#### Load Balancers - Load Balancing Algorithms
Explain each of the following load balancing techniques * Round Robin * Weighted Round Robin * Least Connection * Weighted Least Connection * Resource Based * Fixed Weighting * Weighted Response Time * Source IP Hash * URL Hash
Explain use case for connection draining?
To ensure that a Classic Load Balancer stops sending requests to instances that are de-registering or unhealthy, while keeping the existing connections open, use connection draining. This enables the load balancer to complete in-flight requests made to instances that are de-registering or unhealthy. The maximum timeout value can be set between 1 and 3,600 seconds on both GCP and AWS.
#### Licenses
Are you familiar with "Creative Commons"? What do you know about it?
The Creative Commons license is a set of copyright licenses that allow creators to share their work with the public while retaining some control over how it can be used. The license was developed as a response to the restrictive standards of traditional copyright laws, which limited access to creative works. It allows creators to choose the terms under which their works can be shared, distributed, and used by others. There are six main types of Creative Commons licenses, each with different levels of restrictions and permissions: * Attribution (CC BY): Allows others to distribute, remix, and build upon the work, even commercially, as long as they credit the original creator. * Attribution-ShareAlike (CC BY-SA): Allows others to remix and build upon the work, even commercially, as long as they credit the original creator and release any new creations under the same license. * Attribution-NoDerivs (CC BY-ND): Allows others to distribute the work, even commercially, but they cannot remix or change it in any way and must credit the original creator. * Attribution-NonCommercial (CC BY-NC): Allows others to remix and build upon the work, but they cannot use it commercially and must credit the original creator. * Attribution-NonCommercial-ShareAlike (CC BY-NC-SA): Allows others to remix and build upon the work, but they cannot use it commercially, must credit the original creator, and must release any new creations under the same license. * Attribution-NonCommercial-NoDerivs (CC BY-NC-ND): Allows others to download and share the work, but they cannot use it commercially, remix or change it in any way, and must credit the original creator. Simply stated, the Creative Commons licenses are a way for creators to share their work with the public while retaining some control over how it can be used.
The licenses promote creativity, innovation, and collaboration, while respecting the rights of creators and encouraging the responsible use of creative works. More information: https://creativecommons.org/licenses/
Explain the differences between copyleft and permissive licenses
With copyleft licenses, any derivative work must use the same licensing, while with permissive licenses there is no such condition. GPL-3 is an example of a copyleft license while BSD is an example of a permissive license.
#### Random
How a search engine works?
How auto completion works?
What is faster than RAM?
CPU cache. [Source](https://www.enterprisestorageforum.com/hardware/cache-memory/)
What is a memory leak?
A memory leak is a programming error that occurs when a program fails to release memory that is no longer needed, causing the program to consume increasing amounts of memory over time. Memory leaks can lead to a variety of problems, including system crashes, performance degradation, and instability.
What is your favorite protocol?
SSH HTTP DHCP DNS ...
What is Cache API?
What is the C10K problem? Is it relevant today?
https://idiallo.com/blog/c10k-2016
## Storage
What types of storage are there?
* File * Block * Object
Explain Object Storage
- Data is divided to self-contained objects - Objects can contain metadata
What are the pros and cons of object storage?
Pros: - Usually with object storage, you pay for what you use, as opposed to other storage types where you pay for the storage space you allocate - Scalable storage: you can keep adding storage as needed Cons: - Usually performs slower than other types of storage - No granular modification: to change an object, you have to re-create it
What are some use cases for using object storage?
Explain File Storage
- File Storage used for storing data in files, in a hierarchical structure - Some of the devices for file storage: hard drive, flash drive, cloud-based file storage - Files usually organized in directories
What are the pros and cons of File Storage?
Pros: - Users have full control of their own files and can run a variety of operations on them: delete, read, write and move. - Security mechanisms allow users to have better control over things such as file locking
What are some examples of file storage?
* Local filesystem * Dropbox * Google Drive
What types of storage devices are there?
Explain IOPS
Explain storage throughput
What is a filesystem?
A file system is a way for computers and other electronic devices to organize and store data files. It provides a structure that helps organize data into files and directories, making it easier to find and manage information. A file system is crucial for providing a way to store and manage data in an organized manner. Commonly used file systems: Windows: * NTFS * exFAT Mac OS: * HFS+ * APFS
Explain Dark Data
Explain MBR
## Questions you CAN ask A list of questions you, as a candidate, can ask the interviewer during or after the interview. These are only a suggestion; use them carefully. Not every interviewer will be able to answer these (or be happy to), which should perhaps be a red flag warning for you regarding working in such a place, but that's really up to you.
What do you like about working here?
How does the company promote personal growth?
What is the current level of technical debt you are dealing with?
Be careful when asking this question - all companies, regardless of size, have some level of tech debt. Phrase the question in the light that all companies have to deal with this, but you want to see the current pain points they are dealing with
This is a great way to figure out how managers deal with unplanned work, and how good they are at setting expectations with projects.
Why I should NOT join you? (or 'what you don't like about working here?')
What was your favorite project you've worked on?
This can give you insight into some of the cool projects a company is working on, and whether you would enjoy working on projects like these. This is also a good way to see if the managers are allowing employees to learn and grow with projects outside of the normal work you'd do.
If you could change one thing about your day to day, what would it be?
Similar to the tech debt question, this helps you identify any pain points with the company. Additionally, it can be a great way to show how you'd be an asset to the team.
For Example, if they mention they have problem X, and you've solved that in the past, you can show how you'd be able to mitigate that problem.
Let's say that we agree and you hire me to this position, after X months, what do you expect that I have achieved?
Not only will this tell you what is expected from you, it will also provide a big hint on the type of work you are going to do in your first months on the job.
## Testing
Explain white-box testing
Explain black-box testing
What are unit tests?
Unit tests are a software testing technique in which individual units (functions, methods, classes) of a system are tested in isolation. These tests are automated and can be run repeatedly, allowing developers to catch edge cases or bugs quickly while developing. The main objective of unit tests is to verify that each function produces the proper output for a given set of inputs.
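A minimal example using Python's built-in unittest module (the function under test is made up for illustration):

```python
import unittest

def fahrenheit_to_celsius(f: float) -> float:
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) * 5 / 9

class TestConversion(unittest.TestCase):
    def test_freezing_point(self):
        self.assertEqual(fahrenheit_to_celsius(32), 0)

    def test_boiling_point(self):
        self.assertEqual(fahrenheit_to_celsius(212), 100)
```

Running `python -m unittest` against the file discovers and executes the test cases; each test exercises one unit (here, a single function) in isolation.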
What types of tests would you run to test a web application?
Explain test harness?
What is A/B testing?
What is network simulation and how do you perform it?
What types of performances tests are you familiar with?
Explain the following types of tests: * Load Testing * Stress Testing * Capacity Testing * Volume Testing * Endurance Testing
## Regex Given a text file, perform the following exercises #### Extract
Extract all the numbers
- "\d+"
Extract the first word of each line
- "^\w+" Bonus: extract the last word of each line - "\w+(?=\W*$)" (works in most cases, depending on line formatting)
Extract all the IP addresses
- "\b(?:\d{1,3}\ .){3}\d{1,3}\b" IPV4:(This format looks for 1 to 3 digit sequence 3 times)
Extract dates in the format of yyyy-mm-dd or yyyy-dd-mm
Extract email addresses
- "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\ .[A-Za-z]{2,}\b"
#### Replace
Replace tabs with four spaces
Replace 'red' with 'green'
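Both replacements can be sketched in Python; `str.replace` handles the tabs, and `re.sub` with word boundaries avoids touching words like "reduce" (the sample line is made up for illustration):

```python
import re

line = "red\tapples and red\tcherries"

# Replace tabs with four spaces
spaced = line.replace("\t", "    ")

# Replace 'red' with 'green' (word-bounded, so e.g. 'reduce' stays untouched)
recolored = re.sub(r"\bred\b", "green", spaced)

print(recolored)  # green    apples and green    cherries
```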
## System Design
Explain what a "single point of failure" is.
A "single point of failure", in a system or organization, if it were to fail would cause the entire system to fail or significantly disrupt it's operation. In other words, it is a vulnerability where there is no backup in place to compensate for the failure.
What is CDN?
A CDN (Content Delivery Network) is responsible for distributing content geographically. Part of it is what is known as edge locations, aka cache proxies, which allow users to get their content quickly thanks to caching and geographical distribution.
Explain Multi-CDN
With a single CDN, all the content originates from one content delivery network.
In multi-CDN, content is distributed across multiple different CDNs, each might be on a completely different provider/cloud.
What are the benefits of Multi-CDN over a single CDN?
* Resiliency: Relying on one CDN means no redundancy. With multiple CDNs you don't need to worry about a single CDN being down * Flexibility in Costs: Using one CDN locks you into that CDN's rates. With multiple CDNs you can consider using less expensive CDNs to deliver the content * Performance: With Multi-CDN there is bigger potential for choosing better locations that are closer to the client requesting the content * Scale: With multiple CDNs, you can scale services to support more extreme conditions
Explain "3-Tier Architecture" (including pros and cons)
A "3-Tier Architecture" is a pattern used in software development for designing and structuring applications. It divides the application into 3 interconnected layers: Presentation, Business logic and Data storage. PROS: * Scalability * Security * Reusability CONS: * Complexity * Performance overhead * Cost and development time
Explain Mono-repo vs. Multi-repo.What are the cons and pros of each approach?
In a Mono-repo, all the code for an organization is stored in a single, centralized repository. PROS (Mono-repo): * Unified tooling * Code Sharing CONS (Mono-repo): * Increased complexity * Slower cloning In a Multi-repo setup, each component is stored in its own separate repository. Each repository has its own version control history. PROS (Multi-repo): * Simpler to manage * Different teams and developers can work on different parts of the project independently, making parallel development easier. CONS (Multi-repo): * Code duplication * Integration challenges
What are the drawbacks of monolithic architecture?
* Not suitable for frequent code changes and the ability to deploy new features * Not designed for today's infrastructure (like public clouds) * Scaling a team to work on a monolithic architecture is more challenging * If a single component in this architecture fails, the entire application fails.
What are the advantages of microservices architecture over a monolithic architecture?
* Each of the services can fail individually without escalating into an application-wide outage. * Each service can be developed and maintained by a separate team and this team can choose its own tools and coding language
What's a service mesh?
It is a layer that facilitates communication management and control between microservices in a containerized application. It handles tasks such as load balancing, encryption, and monitoring.
Explain "Loose Coupling"
In "Loose Coupling", components of a system communicate with each other with a little understanding of each other's internal workings. This improves scalability and ease of modification in complex systems.
What is a message queue? When is it used?
It is a communication mechanism used in distributed systems to enable asynchronous communication between different components. It is commonly used to decouple producers from consumers, for example between services in a microservices architecture.
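As an in-process stand-in for a real broker (such as RabbitMQ or Kafka), Python's standard-library `queue` module can illustrate the producer/consumer idea:

```python
import queue
import threading

q = queue.Queue()

def producer():
    for i in range(3):
        q.put(f"task-{i}")  # enqueue work without waiting for the consumer
    q.put(None)             # sentinel: no more work

results = []

def consumer():
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item.upper())  # "process" the message

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()

print(results)  # ['TASK-0', 'TASK-1', 'TASK-2']
```

The producer never blocks on the consumer: the queue buffers messages, which is exactly the decoupling a message queue provides between services.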
#### Scalability
Explain Scalability
The ability to easily grow in size and capacity based on demand and usage.
Explain Elasticity
The ability to grow when needed but also to shrink when demand decreases.
Explain Disaster Recovery
Disaster recovery is the process of restoring critical business systems and data after a disruptive event. The goal is to minimize the impact and resume normal business activities quickly. This involves creating a plan, testing it, backing up critical data, and storing it in safe locations. In case of a disaster, the plan is then executed, backups are restored, and systems are hopefully brought back online. The recovery process may take hours or days depending on the damages of infrastructure. This makes business planning important, as a well-designed and tested disaster recovery plan can minimize the impact of a disaster and keep operations going.
Explain Fault Tolerance and High Availability
Fault Tolerance - The ability to self-heal and return to normal capacity. Also the ability to withstand a failure and remain functional. High Availability - Being able to access a resource (in some use cases, using different platforms)
What is the difference between high availability and Disaster Recovery?
[wintellect.com](https://www.wintellect.com/high-availability-vs-disaster-recovery): "High availability, simply put, is eliminating single points of failure and disaster recovery is the process of getting a system back to an operational state when a system is rendered inoperative. In essence, disaster recovery picks up when high availability fails, so HA first."
Explain Vertical Scaling
Vertical Scaling is the process of adding resources to increase power of existing servers. For example, adding more CPUs, adding more RAM, etc.
What are the disadvantages of Vertical Scaling?
With vertical scaling alone, the component still remains a single point of failure. In addition, it has a hardware limit: if there are no more resources to add, you might not be able to scale vertically any further.
Which type of cloud services usually support vertical scaling?
Databases, cache. It's common mostly for non-distributed systems.
Explain Horizontal Scaling
Horizontal Scaling is the process of adding more instances that will be able to handle requests as one unit
What is the disadvantage of Horizontal Scaling? What is often required in order to perform Horizontal Scaling?
A load balancer is usually required. You can add more resources, but something must distribute requests among them so they all take part in the processing. Also, data inconsistency is a concern with horizontal scaling.
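The load balancer's role can be sketched with a simple round-robin rotation over hypothetical server names:

```python
import itertools

servers = ["app-1", "app-2", "app-3"]  # horizontally scaled replicas
round_robin = itertools.cycle(servers)

def route(request):
    """Pick the next server in rotation for an incoming request."""
    return next(round_robin)

assigned = [route(f"req-{i}") for i in range(4)]
print(assigned)  # ['app-1', 'app-2', 'app-3', 'app-1']
```

Real load balancers add health checks, weighting, and session affinity on top of a distribution strategy like this one.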
Explain in which use cases will you use vertical scaling and in which use cases you will use horizontal scaling
Explain Resiliency and what ways are there to make a system more resilient
Explain "Consistent Hashing"
How would you update each of the services in the following drawing without having app (foo.com) downtime?

What is the problem with the following architecture and how would you fix it?

The load on the producers or consumers may be high which will then cause them to hang or crash.
Instead of working in "push mode", the consumers can pull tasks only when they are ready to handle them. It can be fixed by using a streaming platform like Kafka, Kinesis, etc. This platform will make sure to handle the high load/traffic and pass tasks/messages to consumers only when the ready to get them.
Users report that there is huge spike in process time when adding little bit more data to process as an input. What might be the problem?

How would you scale the architecture from the previous question to hundreds of users?
#### Cache
What is "cache"? In which cases would you use it?
What is "distributed cache"?
What is a "cache replacement policy"?
Take a look [here](https://en.wikipedia.org/wiki/Cache_replacement_policies)
Which cache replacement policies are you familiar with?
You can find a list [here](https://en.wikipedia.org/wiki/Cache_replacement_policies)
Explain the following cache policies: * FIFO * LIFO * LRU
Read about it [here](https://en.wikipedia.org/wiki/Cache_replacement_policies)
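As a rough sketch of the LRU (Least Recently Used) policy, a small cache can be built on Python's `collections.OrderedDict` (the capacity of 2 is arbitrary, chosen just for illustration):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" is now the most recently used entry
cache.put("c", 3)      # evicts "b", the least recently used
print(cache.get("b"))  # None
```

FIFO would instead evict in insertion order regardless of access, and LIFO would evict the most recently added entry.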
Why not write everything to a cache instead of a database/datastore?
Caching and databases serve different purposes and are optimized for different use cases. Caching is used to speed up read operations by storing frequently accessed data in memory or on a fast storage medium. By keeping data close to the application, caching reduces the latency and overhead of accessing data from a slower, more distant storage system such as a database or disk. On the other hand, databases are optimized for storing and managing persistent data. Databases are designed to handle concurrent read and write operations, enforce consistency and integrity constraints, and provide features such as indexing and querying.
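A common way to combine the two is the "cache-aside" pattern: read from the cache first and fall back to the datastore on a miss. A minimal sketch, where the dict-backed "database" stands in for a real datastore and the cache dict stands in for something like Redis:

```python
cache = {}  # stand-in for Redis/Memcached
database = {"user:1": {"name": "Alice"}}  # stand-in for a real datastore

def load_user(key):
    # 1. Try the fast path: the cache
    if key in cache:
        return cache[key]
    # 2. Cache miss: read from the authoritative store...
    value = database.get(key)
    # 3. ...and populate the cache for subsequent reads
    if value is not None:
        cache[key] = value
    return value

print(load_user("user:1"))  # first call reads the database and fills the cache
print(load_user("user:1"))  # second call is served from the cache
```

The database stays the source of truth for durability and consistency, while the cache only accelerates repeated reads.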
#### Migrations
How do you prepare for a migration? (or how do you plan a migration?)
You can mention: * Roll-back & roll-forward * Cut over * Dress rehearsals * DNS redirection
Explain "Branch by Abstraction" technique
#### Design a system
Can you design a video streaming website?
Can you design a photo upload website?
How would you build a URL shortener?
#### More System Design Questions Additional exercises can be found in [system-design-notebook repository](https://github.com/bregman-arie/system-design-notebook).

## Hardware
What is a CPU?
A central processing unit (CPU) performs basic arithmetic, logic, controlling, and input/output (I/O) operations specified by the instructions in the program. This contrasts with external components such as main memory and I/O circuitry, and specialized processors such as graphics processing units (GPUs).
What is RAM?
RAM (Random Access Memory) is the hardware in a computing device where the operating system (OS), application programs and data in current use are kept so they can be quickly reached by the device's processor. RAM is the main memory in a computer. It is much faster to read from and write to than other kinds of storage, such as a hard disk drive (HDD), solid-state drive (SSD) or optical drive.
What is a GPU?
A GPU, or Graphics Processing Unit, is a specialized electronic circuit designed to expedite image and video processing for display on a computer screen.
What is an embedded system?
An embedded system is a computer system - a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is embedded as part of a complete device often including electrical or electronic hardware and mechanical parts.
Can you give an example of an embedded system?
A common example of an embedded system is the digital control panel of a microwave oven, which is managed by a microcontroller. A Raspberry Pi can also serve as an embedded system when dedicated to a specific task.
What types of storage are there?
There are several types of storage, including hard disk drives (HDDs), solid-state drives (SSDs), and optical drives (CD/DVD/Blu-ray). Other types of storage include USB flash drives, memory cards, and network-attached storage (NAS).
What are some considerations DevOps teams should keep in mind when selecting hardware for their job?
Choosing the right DevOps hardware is essential for ensuring streamlined CI/CD pipelines, timely feedback loops, and consistent service availability. Here's a distilled guide on what DevOps teams should consider: 1. **Understanding Workloads**: - **CPU**: Consider the need for multi-core or high-frequency CPUs based on your tasks. - **RAM**: Enough memory is vital for activities like large-scale coding or intensive automation. - **Storage**: Evaluate storage speed and capacity. SSDs might be preferable for swift operations. 2. **Expandability**: - **Horizontal Growth**: Check if you can boost capacity by adding more devices. - **Vertical Growth**: Determine if upgrades (like RAM, CPU) to individual machines are feasible. 3. **Connectivity Considerations**: - **Data Transfer**: Ensure high-speed network connections for activities like code retrieval and data transfers. - **Speed**: Aim for low-latency networks, particularly important for distributed tasks. - **Backup Routes**: Think about having backup network routes to avoid downtimes. 4. **Consistent Uptime**: - Plan for hardware backups like RAID configurations, backup power sources, or alternate network connections to ensure continuous service. 5. **System Compatibility**: - Make sure your hardware aligns with your software, operating system, and intended platforms. 6. **Power Efficiency**: - Hardware that uses energy efficiently can reduce costs in long-term, especially in large setups. 7. **Safety Measures**: - Explore hardware-level security features, such as TPM, to enhance protection. 8. **Overseeing & Control**: - Tools like ILOM can be beneficial for remote handling. - Make sure the hardware can be seamlessly monitored for health and performance. 9. **Budgeting**: - Consider both initial expenses and long-term costs when budgeting. 10. **Support & Community**: - Choose hardware from reputable vendors known for reliable support. 
- Check for available drivers, updates, and community discussions around the hardware. 11. **Planning Ahead**: - Opt for hardware that can cater to both present and upcoming requirements. 12. **Operational Environment**: - **Temperature Control**: Ensure cooling systems to manage heat from high-performance units. - **Space Management**: Assess hardware size considering available rack space. - **Reliable Power**: Factor in consistent and backup power sources. 13. **Cloud Coordination**: - If you're leaning towards a hybrid cloud setup, focus on how local hardware will mesh with cloud resources. 14. **Life Span of Hardware**: - Be aware of the hardware's expected duration and when you might need replacements or upgrades. 15. **Optimized for Virtualization**: - If utilizing virtual machines or containers, ensure the hardware is compatible and optimized for such workloads. 16. **Adaptability**: - Modular hardware allows individual component replacements, offering more flexibility. 17. **Avoiding Single Vendor Dependency**: - Try to prevent reliance on a single vendor unless there are clear advantages. 18. **Eco-Friendly Choices**: - Prioritize sustainably produced hardware that's energy-efficient and environmentally responsible. In essence, DevOps teams should choose hardware that is compatible with their tasks, versatile, gives good performance, and stays within their budget. Furthermore, long-term considerations such as maintenance, potential upgrades, and compatibility with impending technological shifts must be prioritized.
What is the role of hardware in disaster recovery planning and implementation?
Hardware is critical in disaster recovery (DR) solutions. While the broader scope of DR includes things like standard procedures, norms, and human roles, it's the hardware that keeps business processes running smoothly. Here's an outline of how hardware works with DR: 1. **Storing Data and Ensuring Its Duplication**: - **Backup Equipment**: Devices like tape storage, backup servers, and external HDDs keep essential data stored safely at a different location. - **Disk Arrays**: Systems such as RAID offer a safety net. If one disk crashes, the others compensate. 2. **Alternate Systems for Recovery**: - **Backup Servers**: These step in when the main servers falter, maintaining service flow. - **Traffic Distributors**: Devices like load balancers share traffic across servers. If a server crashes, they reroute users to operational ones. 3. **Alternate Operation Hubs**: - **Ready-to-use Centers**: Locations equipped and primed to take charge immediately when the main center fails. - **Basic Facilities**: Locations with necessary equipment but lacking recent data, taking longer to activate. - **Semi-prepped Facilities**: Locations somewhat prepared with select systems and data, taking a moderate duration to activate. 4. **Power Backup Mechanisms**: - **Instant Power Backup**: Devices like UPS offer power during brief outages, ensuring no abrupt shutdowns. - **Long-term Power Solutions**: Generators keep vital systems operational during extended power losses. 5. **Networking Equipment**: - **Backup Internet Connections**: Having alternatives ensures connectivity even if one provider faces issues. - **Secure Connection Tools**: Devices ensuring safe remote access, especially crucial during DR situations. 6. **On-site Physical Setup**: - **Organized Housing**: Structures like racks to neatly store and manage hardware. - **Emergency Temperature Control**: Backup cooling mechanisms to counter server overheating in HVAC malfunctions. 7. 
**Alternate Communication Channels**: - **Orbit-based Phones**: Handy when regular communication methods falter. - **Direct Communication Devices**: Devices like radios useful when primary systems are down. 8. **Protection Mechanisms**: - **Electronic Barriers & Alert Systems**: Devices like firewalls and intrusion detection keep DR systems safeguarded. - **Physical Entry Control**: Systems controlling entry and monitoring, ensuring only cleared personnel have access. 9. **Uniformity and Compatibility in Hardware**: - It's simpler to manage and replace equipment in emergencies if hardware configurations are consistent and compatible. 10. **Equipment for Trials and Upkeep**: - DR drills might use specific equipment to ensure the primary systems remain unaffected. This verifies the equipment's readiness and capacity to manage real crises. In summary, while software and human interventions are important in disaster recovery operations, it is the hardware that provides the underlying support. It is critical for efficient disaster recovery plans to keep this hardware resilient, duplicated, and routinely assessed.
What is a RAID?
RAID is an acronym that stands for "Redundant Array of Independent Disks." It is a technique that combines numerous hard drives into a single device known as an array in order to improve performance, expand storage capacity, and/or offer redundancy to prevent data loss. RAID levels (for example, RAID 0, RAID 1, and RAID 5) provide varied benefits in terms of performance, redundancy, and storage efficiency.
What is a microcontroller?
A microcontroller is a small integrated circuit that controls certain tasks in an embedded system. It typically includes a CPU, memory, and input/output peripherals.
What is a Network Interface Controller or NIC?
A Network Interface Controller (NIC) is a piece of hardware that connects a computer to a network and allows it to communicate with other devices.
What is a DMA?
Direct memory access (DMA) is a feature of computer systems that allows certain hardware subsystems to access main system memory independently of the central processing unit (CPU). DMA enables devices to transfer data to and from the main memory in a computer. It does this while still allowing the CPU to perform other tasks.
What is a Real-Time Operating System?
A real-time operating system (RTOS) is an operating system (OS) for real-time computing applications that processes data and events that have critically defined time constraints. An RTOS is distinct from a time-sharing operating system, such as Unix, which manages the sharing of system resources with a scheduler, data buffers, or fixed task prioritization in a multitasking or multiprogramming environment. Processing time requirements need to be fully understood and bound rather than just kept as a minimum. All processing must occur within the defined constraints. Real-time operating systems are event-driven and preemptive, meaning the OS can monitor the relevant priority of competing tasks, and make changes to the task priority. Event-driven systems switch between tasks based on their priorities, while time-sharing systems switch the task based on clock interrupts.
List of interrupt types
There are six classes of interrupts possible: * External * Machine check * I/O * Program * Restart * Supervisor call (SVC)
## Big Data
Explain what is exactly Big Data
As defined by Doug Laney: * Volume: Extremely large volumes of data * Velocity: Real time, batch, streams of data * Variety: Various forms of data, structured, semi-structured and unstructured * Veracity or Variability: Inconsistent, sometimes inaccurate, varying data
What is DataOps? How is it related to DevOps?
DataOps seeks to reduce the end-to-end cycle time of data analytics, from the origin of ideas to the literal creation of charts, graphs and models that create value. DataOps combines Agile development, DevOps and statistical process controls and applies them to data analytics.
What is Data Architecture?
An answer from [talend.com](https://www.talend.com/resources/what-is-data-architecture): "Data architecture is the process of standardizing how organizations collect, store, transform, distribute, and use data. The goal is to deliver relevant data to people who need it, when they need it, and help them make sense of it."
Explain the different formats of data
* Structured - data that has a defined format and length (e.g. numbers, words) * Semi-structured - doesn't conform to a specific format but is self-describing (e.g. XML, SWIFT) * Unstructured - does not follow a specific format (e.g. images, text messages)
What is a Data Warehouse?
[Wikipedia's explanation on Data Warehouse](https://en.wikipedia.org/wiki/Data_warehouse) [Amazon's explanation on Data Warehouse](https://aws.amazon.com/data-warehouse)
What is Data Lake?
[Data Lake - Wikipedia](https://en.wikipedia.org/wiki/Data_lake)
Can you explain the difference between a data lake and a data warehouse?
What is "Data Versioning"? What models of "Data Versioning" are there?
What is ETL?
#### Apache Hadoop
Explain what is Hadoop
[Apache Hadoop - Wikipedia](https://en.wikipedia.org/wiki/Apache_Hadoop)
Explain Hadoop YARN
Responsible for managing the compute resources in clusters and scheduling users' applications
Explain Hadoop MapReduce
A programming model for large-scale data processing
Explain Hadoop Distributed File Systems (HDFS)
* Distributed file system providing high aggregate bandwidth across the cluster. * For a user it looks like a regular file system structure but behind the scenes it's distributed across multiple machines in a cluster * Typical file size is TB and it can scale and supports millions of files * It's fault tolerant which means it provides automatic recovery from faults * It's best suited for running long batch operations rather than live analysis
What do you know about HDFS architecture?
[HDFS Architecture](http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html) * Master-slave architecture * Namenode - master, Datanodes - slaves * Files split into blocks * Blocks stored on datanodes * Namenode controls all metadata
## Ceph
Explain what is Ceph
Ceph is an Open-Source Distributed Storage System designed to provide excellent performance, reliability, and scalability. It's often used in cloud computing environments and Data Centers.
True or False? Ceph favors consistency and correctness over performance
True
Which services or types of storage Ceph supports?
* Object (RGW) * Block (RBD) * File (CephFS)
What is RADOS?
* Reliable Autonomic Distributed Object Storage * Provides low-level data object storage service * Strong Consistency * Simplifies design and implementation of higher layers (block, file, object)
Describe RADOS software components
* Monitor * Central authority for authentication, data placement, policy * Coordination point for all other cluster components * Protect critical cluster state with Paxos * Manager * Aggregates real-time metrics (throughput, disk usage, etc.) * Host for pluggable management functions * 1 active, 1+ standby per cluster * OSD (Object Storage Daemon) * Stores data on an HDD or SSD * Services client IO requests
What is the workflow of retrieving data from Ceph?
The workflow is as follows: 1. The client sends a request to the ceph cluster to retrieve data: > **Client could be any of the following** >> * Ceph Block Device >> * Ceph Object Gateway >> * Any third party ceph client 2. The client retrieves the latest cluster map from the Ceph Monitor 3. The client uses the CRUSH algorithm to map the object to a placement group. The placement group is then assigned to an OSD. 4. Once the placement group and the OSD Daemon are determined, the client can retrieve the data from the appropriate OSD
What is the workflow of writing data to Ceph?
The work flow is as follows: 1. The client sends a request to the ceph cluster to retrieve data 2. The client retrieves the latest cluster map from the Ceph Monitor 3. The client uses the CRUSH algorithm to map the object to a placement group. The placement group is then assigned to a Ceph OSD Daemon dynamically. 4. The client sends the data to the primary OSD of the determined placement group. If the data is stored in an erasure-coded pool, the primary OSD is responsible for encoding the object into data chunks and coding chunks, and distributing them to the other OSDs.
What are "Placement Groups"?
Describe in the detail the following: Objects -> Pool -> Placement Groups -> OSDs
What is OMAP?
What is a metadata server? How does it work?
## Packer
What is Packer? What is it used for?
In general, Packer automates the creation of machine images. It allows you to focus on configuration prior to deployment while building the images, which in most cases lets you start instances much faster.
Packer follows a "configuration->deployment" model or "deployment->configuration"?
A configuration->deployment model, which has some advantages like: 1. Deployment Speed - you configure once prior to deployment instead of configuring every time you deploy. This allows you to start instances/services much quicker. 2. More immutable infrastructure - with configuration->deployment it's not likely to have very different deployments since most of the configuration is done prior to the deployment. Issues like dependency errors are handled/discovered prior to deployment in this model.
## Release
Explain Semantic Versioning
[This](https://semver.org/) page explains it perfectly: ``` Given a version number MAJOR.MINOR.PATCH, increment the: MAJOR version when you make incompatible API changes MINOR version when you add functionality in a backwards compatible manner PATCH version when you make backwards compatible bug fixes Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format. ```
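Comparing two versions under this scheme can be sketched in Python (pre-release and build metadata are ignored for brevity):

```python
def parse(version):
    """Split 'MAJOR.MINOR.PATCH' into a comparable tuple of ints."""
    major, minor, patch = version.split(".")
    return (int(major), int(minor), int(patch))

# Tuples compare element by element, matching semver precedence
print(parse("2.0.0") > parse("1.9.9"))   # True
print(parse("1.10.0") > parse("1.9.0"))  # True (numeric comparison, not lexical)
```

The second comparison is why versions must be compared numerically: as plain strings, "1.10.0" would sort before "1.9.0".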
## Certificates If you are looking for a way to prepare for a certain exam this is the section for you. Here you'll find a list of certificates, each references to a separate file with focused questions that will help you to prepare to the exam. Good luck :) #### AWS * [Cloud Practitioner](certificates/aws-cloud-practitioner.md) (Latest update: 2020) * [Solutions Architect Associate](certificates/aws-solutions-architect-associate.md) (Latest update: 2021) * [Cloud SysOps Administration Associate](certificates/aws-cloud-sysops-associate.md) (Latest update: Oct 2022) #### Azure * [AZ-900](certificates/azure-fundamentals-az-900.md) (Latest update: 2021) #### Kubernetes * [Certified Kubernetes Administrator (CKA)](topics/kubernetes/CKA.md) (Latest update: 2022) ## Additional DevOps and SRE Projects

## Credits Thanks to all of our amazing [contributors](https://github.com/bregman-arie/devops-exercises/graphs/contributors) who make it easy for everyone to learn new things :) Logos credits can be found [here](credits.md) ## License [![License: CC BY-NC-ND 3.0](https://img.shields.io/badge/License-CC%20BY--NC--ND%203.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc-nd/3.0/) ================================================ FILE: certificates/aws-certification-paths.md ================================================ ## AWS Certification Paths [AWS Certification Paths based on Cloud Roles and Responsibilities](https://d1.awsstatic.com/training-and-certification/docs/AWS_certification_paths.pdf) ================================================ FILE: certificates/aws-cloud-practitioner.md ================================================ ## AWS - Cloud Practitioner A summary of what you need to know for the exam can be found [here](https://aws.amazon.com/certification/certified-cloud-practitioner/) #### Cloud 101
What is cloud computing?
[Wikipedia](https://en.wikipedia.org/wiki/Cloud_computing): "Cloud computing is the on-demand availability of computer system resources, especially data storage (cloud storage) and computing power, without direct active management by the user" Cloud computing also allows you to scale resources up or down as needed, paying only for what you use.
What types of Cloud Computing services are there?
IAAS PAAS SAAS
Explain each of the following and give an example: * IAAS * PAAS * SAAS
- IAAS - Infrastructure As A Service is a cloud computing service where a cloud provider rents out IT infrastructure such as compute, networking resources and storage over the internet (e.g., AWS EC2).
- PAAS - Platform As A Service is a cloud hosting platform with on-demand access to a ready-to-use set of deployment, application management and DevOps tools (e.g., AWS Elastic Beanstalk).
- SAAS - Software As A Service is a software distribution model in which services are hosted by a cloud service provider (e.g., AWS WorkSpaces or any web-based email service).
What types of clouds (or cloud deployments) are there?
* Public * Hybrid * Private
Explain each of the following Cloud Computing Deployments: * Public * Hybrid * Private
- Public - Public cloud is when you leverage cloud services over the open internet on hardware owned by the cloud provider, but its usage is shared by other companies. It offers cost-effectiveness and ease of scaling.
- Hybrid - A hybrid cloud is a cloud computing environment that combines a private cloud environment, like an on-premises data center, with a public cloud. It provides greater flexibility and more deployment options.
- Private - Private cloud means that the cloud infrastructure is provisioned for exclusive use by a single organization. Resources are not shared with others, so it offers more control over security and data. [Read more](https://aws.amazon.com/types-of-cloud-computing/)
#### AWS Global Infrastructure
Explain the following * Availability zone * Region * Edge location
AWS regions are data centers hosted across different geographical locations worldwide; each region is completely independent of the others.
Within each region, there are multiple isolated locations known as Availability Zones. Multiple availability zones ensure high availability in case one of them goes down. Each Availability Zone is physically separated from others, with its own power, networking, and connectivity.
Edge locations are basically content delivery network endpoints which cache data and ensure lower latency and faster delivery to the users in any location. They are located in major cities around the world.
#### AWS Networking
What is VPC?
"A logically isolated section of the AWS cloud where you can launch AWS resources in a virtual network that you define". Read more about it [here](https://aws.amazon.com/vpc). A VPC spans all the Availability Zones within a single region.
True or False? VPC spans multiple regions
False. A VPC is region-specific and cannot span multiple regions.
True or False? Subnets belong to the same VPC, can be in different availability zones
True. Just to clarify, a subnet must reside entirely in one AZ, but a single VPC can contain subnets across multiple AZs.
What is an Internet Gateway?
"component that allows communication between instances in your VPC and the internet" (AWS docs). Read more about it [here](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Internet_Gateway.html) It scales horizontally and is highly available, allowing inbound and outbound traffic to flow without imposing availability risks or bandwidth constraints.
True or False? NACLs allow or deny traffic on the subnet level
True
True or False? Multiple Internet Gateways can be attached to one VPC
False. Only one internet gateway can be attached to a single VPC.
True or False? Route Tables are used to allow or deny traffic from the internet to AWS instances
False. Route tables are used to direct traffic to the right destination (e.g., Internet Gateway, NAT Gateway, etc.), not to allow or deny traffic.
Explain Security Groups and Network ACLs
* NACL - security layer on the subnet level. They are stateless, meaning inbound and outbound rules are evaluated separately.
* Security Group - security layer on the instance level. They are stateful, meaning if you allow inbound traffic, outbound traffic is automatically allowed, and vice versa. Read more about it [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-security-groups.html) and [here](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_SecurityGroups.html)
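To make the stateful vs. stateless distinction concrete, here is a minimal sketch of a security group in CloudFormation syntax (the resource names, VPC reference, and CIDR range are illustrative, not from the original text): only the inbound SSH rule is declared, and the response traffic is allowed automatically because security groups are stateful. An equivalent NACL would need an explicit outbound rule for the ephemeral return ports.

```yaml
# Hypothetical CloudFormation fragment -- names and CIDR are placeholders
Resources:
  SshOnlySecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow inbound SSH; return traffic is allowed implicitly (stateful)
      VpcId: !Ref MyVpc            # assumed to be defined elsewhere in the template
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 203.0.113.0/24   # example range (TEST-NET-3)
```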
What is AWS Direct Connect?
Allows you to connect your corporate network to the AWS network over a dedicated network connection, which can offer more consistent performance than internet-based connections.
#### AWS Compute
What is EC2?
"a web service that provides secure, resizable compute capacity in the cloud". Read more [here](https://aws.amazon.com/ec2). EC2 allows you to quickly scale up or down to match resource needs, paying only for the compute time you consume.
What is AMI?
AMI stands for Amazon Machine Image. "An Amazon Machine Image (AMI) provides the information required to launch an instance". Read more [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html). An AMI typically includes an operating system, application server, and applications, so you can quickly spin up new instances with the same configuration.
What are the different sources for AMIs?
* Personal AMIs - AMIs you create
* AWS Marketplace AMIs - paid AMIs, usually bundled with licensed software
* Community AMIs - free

You can also share AMIs across accounts if needed.
What is instance type?
"the instance type that you specify determines the hardware of the host computer used for your instance" Read more about instance types [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html) Instance types vary by CPU, memory, storage, and networking capacity, e.g., t2.micro, c5.large, etc.
True or False? The following are instance types available for a user in AWS: * Compute optimized * Network optimized * Web optimized
False. From the above list only compute optimized is available. There's no "Web optimized" or "Network optimized" instance type. You do have memory optimized, storage optimized, etc.
What is EBS?
"provides block level storage volumes for use with EC2 instances. EBS volumes behave like raw, unformatted block devices." More on EBS [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AmazonEBS.html) EBS volumes are tied to an Availability Zone. They can be snapshotted to Amazon S3 for durability and can be detached/reattached between EC2 instances in the same AZ.
What EC2 pricing models are there?
On Demand - pay a fixed rate by the hour/second with no commitment. You can provision and terminate at any time.
Reserved - you get capacity reservation, basically purchase an instance for a fixed time period (1 or 3 years). The longer, the cheaper.
Spot - Enables you to bid whatever price you want for instances or pay the spot price. Ideal for workloads that can be interrupted.
Dedicated Hosts - physical EC2 server dedicated for your use. Helps you address compliance requirements and use your own software licenses.
What are Security Groups?
"A security group acts as a virtual firewall that controls the traffic for one or more instances" More on this subject [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-security-groups.html) They are stateful, so any rule applied for inbound automatically applies to outbound, and vice versa (if the inbound rule is allowed).
What can you attach to an EC2 instance in order to store data?
EBS. Additionally, some instance types support Instance Store (ephemeral storage), and you can also mount EFS (file storage) if you need a shared filesystem across multiple instances.
What EC2 RI types are there?
Standard RI - most significant discount + suited for steady-state usage
Convertible RI - discount + change attribute of RI + suited for steady-state usage
Scheduled RI - launch within time windows you reserve. Learn more about EC2 RI [here](https://aws.amazon.com/ec2/pricing/reserved-instances). Some RIs also offer different payment options (no upfront, partial upfront, or all upfront) affecting the discount level.
#### AWS Containers
What is Amazon ECS?
Amazon definition: "Amazon Elastic Container Service (Amazon ECS) is a fully managed container orchestration service. Customers such as Duolingo, Samsung, GE, and Cook Pad use ECS to run their most sensitive and mission critical applications because of its security, reliability, and scalability." Learn more [here](https://aws.amazon.com/ecs)
What is Amazon ECR?
Amazon definition: "Amazon Elastic Container Registry (ECR) is a fully-managed Docker container registry that makes it easy for developers to store, manage, and deploy Docker container images." Learn more [here](https://aws.amazon.com/ecr)
What is AWS Fargate?
Amazon definition: "AWS Fargate is a serverless compute engine for containers that works with both Amazon Elastic Container Service (ECS) and Amazon Elastic Kubernetes Service (EKS)." Learn more [here](https://aws.amazon.com/fargate)
#### AWS Storage
Explain what is AWS S3?
S3 stands for Simple Storage Service (the three S's). S3 is an object storage service which is fast, scalable and durable. S3 enables customers to upload, download or store any file or object up to 5 TB in size. More on S3 [here](https://aws.amazon.com/s3)
What is a bucket?
An S3 bucket is a resource which is similar to folders in a file system and allows storing objects, which consist of data.
True or False? A bucket name must be globally unique
True
Explain folders and objects in regards to buckets
* Folder - any subfolder in an S3 bucket
* Object - the files which are stored in a bucket
Explain the following: * Object Lifecycles * Object Sharing * Object Versioning
* Object Lifecycles - transition objects between storage classes based on defined time-period rules
* Object Sharing - share objects via a URL link
* Object Versioning - manage multiple versions of an object
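As a sketch of how lifecycle rules are expressed, here is a minimal S3 lifecycle configuration in the JSON shape used by the S3 API (the rule ID, prefix, and day counts are illustrative placeholders): objects under a prefix transition to a cheaper class after 30 days, to Glacier after a year, and are eventually expired.

```json
{
  "Rules": [
    {
      "ID": "archive-old-logs",
      "Filter": { "Prefix": "logs/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 365, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 2555 }
    }
  ]
}
```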
Explain Object Durability and Object Availability
Object Durability: the percent over a one-year time period that a file will not be lost
Object Availability: the percent over a one-year time period that a file will be accessible
What is a storage class? What storage classes are there?
Each object has a storage class assigned to it, affecting its availability and durability. This also has an effect on costs. Storage classes offered today:
* Standard:
  * Used for general, all-purpose storage (mostly storage that needs to be accessed frequently)
  * The most expensive storage class
  * 11x9% durability
  * 99.99% availability
  * Default storage class
* Standard-IA (Infrequent Access):
  * Long-lived, infrequently accessed data that must be available the moment it's accessed
  * 11x9% durability
  * 99.90% availability
* One Zone-IA (Infrequent Access):
  * Long-lived, infrequently accessed, non-critical data, stored in a single Availability Zone
  * Less expensive than the Standard and Standard-IA storage classes
  * 11x9% durability
  * 99.50% availability
* Intelligent-Tiering:
  * Long-lived data with changing or unknown access patterns. In this class the data automatically moves to the most suitable tier based on usage patterns
  * Price depends on the tier used
  * 11x9% durability
  * 99.90% availability
* Glacier: archive data with retrieval times ranging from minutes to hours
* Glacier Deep Archive: archive data that rarely, if ever, needs to be accessed, with retrieval times in hours
* Both Glacier and Glacier Deep Archive are:
  * The cheapest storage classes
  * 11x9% durability

More on storage classes [here](https://aws.amazon.com/s3/storage-classes)
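To get a feel for what "11 nines" of durability means in practice, a quick back-of-the-envelope calculation (illustrative arithmetic only, assuming independent object losses; not a statement about AWS's actual failure model):

```python
# Expected annual object loss for a given durability level (illustrative math only)
def expected_losses(num_objects: int, durability: float) -> float:
    """Expected number of objects lost per year, assuming independent losses."""
    return num_objects * (1 - durability)

# With 11 nines (99.999999999%) durability, storing 10 million objects
# you would expect to lose roughly 0.0001 objects per year:
eleven_nines = 1 - 1e-11
print(expected_losses(10_000_000, eleven_nines))
```

In other words, at 11 nines you could store ten million objects and statistically expect to wait thousands of years before losing one.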
A customer would like to move data which is rarely accessed from the standard storage class to the cheapest class there is. Which storage class should be used? * One Zone-IA * Glacier Deep Archive * Intelligent-Tiering
Glacier Deep Archive
What Glacier retrieval options are available for the user?
Expedited, Standard and Bulk
True or False? Each AWS account can store up to 500 PetaByte of data. Any additional storage will cost double
False. Unlimited capacity.
Explain what is Storage Gateway
"AWS Storage Gateway is a hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage". More on Storage Gateway [here](https://aws.amazon.com/storagegateway)
Explain the following Storage Gateway deployments types * File Gateway * Volume Gateway * Tape Gateway
Explained in detail [here](https://aws.amazon.com/storagegateway/faqs)
What is the difference between stored volumes and cached volumes?
Stored Volumes - Data is located at the customer's data center and periodically backed up to AWS
Cached Volumes - Data is stored in the AWS cloud and cached at the customer's data center for quick access
What is "Amazon S3 Transfer Acceleration"?
AWS definition: "Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket" Learn more [here](https://docs.aws.amazon.com/AmazonS3/latest/dev/transfer-acceleration.html)
What is Amazon EFS?
Amazon definition: "Amazon Elastic File System (Amazon EFS) provides a simple, scalable, fully managed elastic NFS file system for use with AWS Cloud services and on-premises resources." Learn more [here](https://aws.amazon.com/efs)
What is AWS Snowmobile?
"AWS Snowmobile is an Exabyte-scale data transfer service used to move extremely large amounts of data to AWS." Learn more [here](https://aws.amazon.com/snowmobile)
#### AWS IAM
What is IAM? What are some of its features?
IAM stands for Identity and Access Management, and is used for managing users, groups, access policies and roles. Full explanation is [here](https://aws.amazon.com/iam)
True or False? IAM configuration is defined globally and not per region
True
True or False? When creating an AWS account, root account is created by default. This is the recommended account to use and share in your organization
False. Instead of using the root account, you should create individual users and use those.
True or False? Groups in AWS IAM, can contain only users and not other groups
True
True or False? Users in AWS IAM, can belong only to a single group
False. Users can belong to multiple groups.
What are Roles?
A way to grant one AWS service permission to use another AWS service. You assign roles to AWS resources. For example, you can use a role that allows the EC2 service to access S3 buckets (read and write).
What are Policies?
Policies are documents that define what a user, group or role is able to do. Their format is JSON.
A user is unable to access an s3 bucket. What might be the problem?
There can be several reasons for that. One of them is a missing policy. To solve that, the admin has to attach a policy to the user that allows access to the S3 bucket.
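As a sketch, a minimal identity-based policy granting read access to one bucket might look like the following (the bucket name is a placeholder; the action list is deliberately narrow for illustration):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::example-bucket",
        "arn:aws:s3:::example-bucket/*"
      ]
    }
  ]
}
```

Note that `s3:ListBucket` applies to the bucket ARN while `s3:GetObject` applies to the object ARNs, which is why both resource forms appear.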
What should you use to: * Grant access between two services/resources? * Grant user access to resources/services?
* Role * Policy
What permissions does a new user have?
Only login access. By default, a new user has no permissions to any AWS service until policies are attached.
##### AWS ELB
What is ELB (Elastic Load Balancing)?
AWS definition: "Elastic Load Balancing automatically distributes incoming application traffic across multiple targets, such as Amazon EC2 instances, containers, IP addresses, and Lambda functions." More on ELB [here](https://aws.amazon.com/elasticloadbalancing)
What is auto scaling?
AWS definition: "AWS Auto Scaling monitors your applications and automatically adjusts capacity to maintain steady, predictable performance at the lowest possible cost" Read more about auto scaling [here](https://aws.amazon.com/autoscaling)
True or False? Auto Scaling is about adding resources (such as instances) and not about removing resource
False. Auto scaling adjusts capacity, and this can mean removing some resources based on usage and performance.
What types of load balancers are supported in EC2 and what are they used for?
* Application LB - layer 7 traffic
* Network LB - ultra-high performance or static IP address
* Classic LB - low costs, good for test or dev environments
#### AWS DNS
What is Route 53?
"Amazon Route 53 is a highly available and scalable cloud Domain Name System (DNS) web service"
Some of Route 53 features:
* Register domain
* DNS service - domain name translations
* Health checks - verify your app is available

More on Route 53 [here](https://aws.amazon.com/route53)
#### AWS CloudFront
Explain what is CloudFront
AWS definition: "Amazon CloudFront is a fast content delivery network (CDN) service that securely delivers data, videos, applications, and APIs to customers globally with low latency, high transfer speeds, all within a developer-friendly environment." More on CloudFront [here](https://aws.amazon.com/cloudfront)
Explain the following * Origin * Edge location * Distribution
#### AWS Monitoring & Logging
What is AWS CloudWatch?
AWS definition: "Amazon CloudWatch is a monitoring and observability service..." More on CloudWatch [here](https://aws.amazon.com/cloudwatch)
What is AWS CloudTrail?
AWS definition: "AWS CloudTrail is a service that enables governance, compliance, operational auditing, and risk auditing of your AWS account." Read more on CloudTrail [here](https://aws.amazon.com/cloudtrail)
What is Simple Notification Service?
AWS definition: "a highly available, durable, secure, fully managed pub/sub messaging service that enables you to decouple microservices, distributed systems, and serverless applications." Read more about it [here](https://aws.amazon.com/sns)
Explain the following in regards to SNS: * Topics * Subscribers * Publishers
* Topics - used for grouping multiple endpoints
* Subscribers - the endpoints where topics send messages to
* Publishers - the provider of the message (event, person, ...)
#### AWS Security
What is the shared responsibility model? What is AWS responsible for and what is the user responsible for, based on the shared responsibility model?
The shared responsibility model defines what the customer is responsible for and what AWS is responsible for. For example, AWS is responsible for security "of" the cloud, while the customer is responsible for security "in" the cloud. More on the shared responsibility model [here](https://aws.amazon.com/compliance/shared-responsibility-model)
True or False? Based on the shared responsibility model, Amazon is responsible for physical CPUs and security groups on instances
False. AWS is responsible for the hardware in its sites, but not for security groups, which are created and managed by the users.
Explain "Shared Controls" in regards to the shared responsibility model
AWS definition: "apply to both the infrastructure layer and customer layers, but in completely separate contexts or perspectives. In a shared control, AWS provides the requirements for the infrastructure and the customer must provide their own control implementation within their use of AWS services" Learn more about it [here](https://aws.amazon.com/compliance/shared-responsibility-model)
What is the AWS compliance program?
What is AWS Artifact?
AWS definition: "AWS Artifact is your go-to, central resource for compliance-related information that matters to you. It provides on-demand access to AWS’ security and compliance reports and select online agreements." Read more about it [here](https://aws.amazon.com/artifact)
What is AWS Inspector?
AWS definition: "Amazon Inspector is an automated security assessment service that helps improve the security and compliance of applications deployed on AWS. Amazon Inspector automatically assesses applications for exposure, vulnerabilities, and deviations from best practices." Learn more [here](https://aws.amazon.com/inspector)
What is AWS GuardDuty?
GuardDuty is a threat detection service that monitors your AWS accounts to help detect and mitigate malicious activity
What is AWS Shield?
AWS definition: "AWS Shield is a managed Distributed Denial of Service (DDoS) protection service that safeguards applications running on AWS."
What is AWS WAF? Give an example of how it can be used and describe what resources or services you can use it with
An AWS Web Application Firewall (WAF) can filter out unwanted web traffic (bots), and protect against attacks like SQL injection and cross-site scripting. One service you could use it with would be Amazon CloudFront, a CDN service, to block attacks before they reach your origin servers
What AWS VPN is used for?
What is the difference between Site-to-Site VPN and Client VPN?
What is AWS CloudHSM?
Amazon definition: "AWS CloudHSM is a cloud-based hardware security module (HSM) that enables you to easily generate and use your own encryption keys on the AWS Cloud." Learn more [here](https://aws.amazon.com/cloudhsm)
True or False? AWS Inspector can perform both network and host assessments
True
What is AWS Acceptable Use Policy?
It describes prohibited uses of the web services offered by AWS. More on AWS Acceptable Use Policy [here](https://aws.amazon.com/aup)
What is AWS Key Management Service (KMS)?
AWS definition: "KMS makes it easy for you to create and manage cryptographic keys and control their use across a wide range of AWS services and in your applications." More on KMS [here](https://aws.amazon.com/kms)
True or False? A user is not allowed to perform penetration testing on any of the AWS services
False. On some services, like EC2, CloudFront and RDS, penetration testing is allowed.
True or False? DDoS attack is an example of allowed penetration testing activity
False.
True or False? AWS Access Key is a type of MFA device used for AWS resources protection
False. Security key is an example of an MFA device.
What is Amazon Cognito?
Amazon definition: "Amazon Cognito handles user authentication and authorization for your web and mobile apps." Learn more [here](https://docs.aws.amazon.com/cognito/index.html)
What is AWS ACM?
Amazon definition: "AWS Certificate Manager is a service that lets you easily provision, manage, and deploy public and private Secure Sockets Layer/Transport Layer Security (SSL/TLS) certificates for use with AWS services and your internal connected resources." Learn more [here](https://aws.amazon.com/certificate-manager)
#### AWS Databases
What is AWS RDS?
Amazon Relational Database Service (RDS) is a service for setting up and managing resizable, cost-efficient relational databases. Learn more [here](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Welcome.html)
What is AWS DynamoDB?
Explain "Point-in-Time Recovery" feature in DynamoDB
Amazon definition: "You can create on-demand backups of your Amazon DynamoDB tables, or you can enable continuous backups using point-in-time recovery. For more information about on-demand backups, see On-Demand Backup and Restore for DynamoDB." Learn more [here](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/PointInTimeRecovery.html)
Explain "Global Tables" in DynamoDB
Amazon definition: "A global table is a collection of one or more replica tables, all owned by a single AWS account." Learn more [here](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/V2globaltables_HowItWorks.html)
What is DynamoDB Accelerator?
Amazon definition: "Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from milliseconds to microseconds..." Learn more [here](https://aws.amazon.com/dynamodb/dax)
What is AWS Redshift and how is it different than RDS?
AWS Redshift is a cloud data warehousing service that is geared towards handling massive amounts of data (think petabytes) and being able to execute complex queries. In contrast, Amazon RDS is best suited for things like web applications requiring simple queries with more frequent transactions, and on a smaller scale.
What is AWS ElastiCache? For what cases is it used?
Amazon ElastiCache is a fully managed Redis or Memcached in-memory data store. It's great for use cases like two-tier web applications where the most frequently accessed data is stored in ElastiCache so response time is optimal.
What is Amazon Aurora
A MySQL- and PostgreSQL-compatible relational database. It is also the default database proposed to the user when creating a database in RDS. Great for use cases like two-tier web applications that have a MySQL or PostgreSQL database layer and need automated backups.
What is Amazon DocumentDB?
Amazon definition: "Amazon DocumentDB (with MongoDB compatibility) is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads. As a document database, Amazon DocumentDB makes it easy to store, query, and index JSON data." Learn more [here](https://aws.amazon.com/documentdb)
What "AWS Database Migration Service" is used for?
What type of storage is used by Amazon RDS?
EBS
Explain Amazon RDS Read Replicas
AWS definition: "Amazon RDS Read Replicas provide enhanced performance and durability for RDS database (DB) instances. They make it easy to elastically scale out beyond the capacity constraints of a single DB instance for read-heavy database workloads." Read more about [here](https://aws.amazon.com/rds/features/read-replicas)
#### AWS Serverless Compute
Explain what is AWS Lambda
AWS definition: "AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume." Read more on it [here](https://aws.amazon.com/lambda)
True or False? In AWS Lambda, you are charged as long as a function exists, regardless of whether it's running or not
False. You are charged only when your code is executed, based on the number of invocations and the execution duration.
Which of the following set of languages Lambda supports? * R, Swift, Rust, Kotlin * Python, Ruby, Go * Python, Ruby, PHP
* Python, Ruby, Go
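Since Python is among the supported runtimes, here is a minimal sketch of what a Python Lambda handler looks like (the `(event, context)` signature is the standard Lambda contract; the event shape and the `name` key are invented for illustration):

```python
import json

def lambda_handler(event, context):
    """Minimal Lambda handler: return a greeting for the given name.

    'event' is whatever payload the invoker sends; the 'name' key here is
    purely illustrative. 'context' carries runtime metadata and is unused
    in this sketch.
    """
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Local invocation for a quick check (outside AWS, context can be None)
print(lambda_handler({"name": "devops"}, None))
```

In AWS you would point the function's handler setting at `<module>.lambda_handler`; locally the function is just a plain Python callable, which makes it easy to unit test.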
#### Identify the service or tool
What would you use for automating code/software deployments?
AWS CodeDeploy
What would you use for easily creating similar AWS environments/resources for different customers?
CloudFormation
Which service would you use for building a website or web application?
Lightsail or Elastic Beanstalk
Which tool would you use for choosing between Reserved instances or On-Demand instances?
Cost Explorer
What would you use to check how many unassociated Elastic IP addresses you have?
Trusted Advisor
What service allows you to transfer large amounts (Petabytes) of data in and out of the AWS cloud?
AWS Snowball
What provides a virtual network dedicated to your AWS account?
VPC
What would you use for automated backups of an application that has a MySQL database layer?
Amazon Aurora
What would you use to migrate on-premise database to AWS?
AWS Database Migration Service (DMS)
What would you use to check why certain EC2 instances were terminated?
AWS CloudTrail
What would you use for SQL database?
AWS RDS
What would you use for NoSQL database?
AWS DynamoDB
What would you use for running SQL queries interactively on S3?
AWS Athena
What would you use for adding image and video analysis to your application?
AWS Rekognition
Which service would you use for debugging and analyzing performance issues with your applications?
AWS X-Ray
Which service is used for sending notifications?
SNS
Which service would you use for monitoring malicious activity and unauthorized behavior in regards to AWS accounts and workloads?
Amazon GuardDuty
Which service would you use to centrally manage billing, control access, compliance, and security across multiple AWS accounts?
AWS Organizations
Which service would you use for web application protection?
AWS WAF
You would like to monitor some of your resources in the different services. Which service would you use for that?
CloudWatch
Which service would you use for performing security assessment?
AWS Inspector
Which service would you use for creating DNS record?
Route 53
What would you use if you need a fully managed document database?
Amazon DocumentDB
Which service would you use to add access control (or sign-up, sign-in forms) to your web/mobile apps?
AWS Cognito
Which service would you use if you need messaging queue?
Simple Queue Service (SQS)
Which service would you use if you need managed DDOS protection?
AWS Shield
Which service would you use if you need to store frequently used data for low latency access?
ElastiCache
What would you use to transfer files over long distances between a client and an S3 bucket?
Amazon S3 Transfer Acceleration
#### AWS Billing & Support
What is AWS Organizations?
AWS definition: "AWS Organizations helps you centrally govern your environment as you grow and scale your workloads on AWS." More on Organizations [here](https://aws.amazon.com/organizations)
Explain AWS pricing model
It mainly works on a "pay-as-you-go" basis, meaning you pay only for what you are using and when you are using it.
In S3 you pay for:
1. How much data you are storing
2. Making requests (PUT, POST, ...)

In EC2 it's based on the purchasing option (on-demand, spot, ...), instance type, AMI type and the region used.
More on the AWS pricing model [here](https://aws.amazon.com/pricing)
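The two S3 billing dimensions above can be sketched as simple arithmetic. The rates below are assumed placeholders for illustration, not actual AWS prices (real prices vary by region and change over time):

```python
# Back-of-the-envelope S3 cost estimate under "pay-as-you-go".
# Rates are ASSUMED placeholders, not real AWS prices.
STORAGE_RATE_PER_GB_MONTH = 0.023   # assumed $/GB-month
PUT_RATE_PER_1000 = 0.005           # assumed $ per 1,000 PUT requests

def monthly_s3_cost(stored_gb: float, put_requests: int) -> float:
    """Storage cost + request cost: the two billing dimensions above."""
    storage = stored_gb * STORAGE_RATE_PER_GB_MONTH
    requests = (put_requests / 1000) * PUT_RATE_PER_1000
    return round(storage + requests, 2)

# 100 GB stored plus 50,000 PUT requests in a month, under these rates:
print(monthly_s3_cost(100, 50_000))  # 2.55
```

The same pattern applies to EC2, except the per-unit rate depends on purchasing option, instance type, AMI type and region.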
How one should estimate AWS costs when for example comparing to on-premise solutions?
* TCO calculator
* AWS simple calculator
* Cost Explorer
What basic support in AWS includes?
* 24x7 customer service
* Trusted Advisor
* AWS Personal Health Dashboard
How are EC2 instances billed?
What AWS Pricing Calculator is used for?
What is Amazon Connect?
Amazon definition: "Amazon Connect is an easy to use omnichannel cloud contact center that helps companies provide superior customer service at a lower cost." Learn more [here](https://aws.amazon.com/connect)
What are "APN Consulting Partners"?
Amazon definition: "APN Consulting Partners are professional services firms that help customers of all types and sizes design, architect, build, migrate, and manage their workloads and applications on AWS, accelerating their journey to the cloud." Learn more [here](https://aws.amazon.com/partners/consulting)
Which of the following are AWS support plans (sorted by order)? * Basic, Developer, Business, Enterprise * Newbie, Intermediate, Pro, Enterprise * Developer, Basic, Business, Enterprise * Beginner, Pro, Intermediate, Enterprise
* Basic, Developer, Business, Enterprise
True or False? Region is a factor when it comes to EC2 costs/pricing
True. You pay differently based on the chosen region.
What is "AWS Infrastructure Event Management"?
AWS Definition: "AWS Infrastructure Event Management is a structured program available to Enterprise Support customers (and Business Support customers for an additional fee) that helps you plan for large-scale events such as product or application launches, infrastructure migrations, and marketing events."
#### AWS Automation
What is AWS CodeDeploy?
Amazon definition: "AWS CodeDeploy is a fully managed deployment service that automates software deployments to a variety of compute services such as Amazon EC2, AWS Fargate, AWS Lambda, and your on-premises servers." Learn more [here](https://aws.amazon.com/codedeploy)
Explain what is CloudFormation
#### AWS Misc
What is AWS Lightsail?
AWS definition: "Lightsail is an easy-to-use cloud platform that offers you everything needed to build an application or website, plus a cost-effective, monthly plan."
What is AWS Rekognition?
AWS definition: "Amazon Rekognition makes it easy to add image and video analysis to your applications using proven, highly scalable, deep learning technology that requires no machine learning expertise to use." Learn more [here](https://aws.amazon.com/rekognition)
What are AWS Resource Groups used for?
Amazon definition: "You can use resource groups to organize your AWS resources. Resource groups make it easier to manage and automate tasks on large numbers of resources at one time. " Learn more [here](https://docs.aws.amazon.com/ARG/latest/userguide/welcome.html)
What is AWS Global Accelerator?
Amazon definition: "AWS Global Accelerator is a service that improves the availability and performance of your applications with local or global users..." Learn more [here](https://aws.amazon.com/global-accelerator)
What is AWS Config?
Amazon definition: "AWS Config is a service that enables you to assess, audit, and evaluate the configurations of your AWS resources." Learn more [here](https://aws.amazon.com/config)
What is AWS X-Ray?
AWS definition: "AWS X-Ray helps developers analyze and debug production, distributed applications, such as those built using a microservices architecture." Learn more [here](https://aws.amazon.com/xray)
What is AWS OpsWorks?
Amazon definition: "AWS OpsWorks is a configuration management service that provides managed instances of Chef and Puppet." Learn more about it [here](https://aws.amazon.com/opsworks)
What is AWS Service Catalog?
Amazon definition: "AWS Service Catalog allows organizations to create and manage catalogs of IT services that are approved for use on AWS." Learn more [here](https://aws.amazon.com/servicecatalog)
What is AWS CAF?
Amazon definition: "AWS Professional Services created the AWS Cloud Adoption Framework (AWS CAF) to help organizations design and travel an accelerated path to successful cloud adoption. " Learn more [here](https://aws.amazon.com/professional-services/CAF)
What is AWS Cloud9?
AWS definition: "AWS Cloud9 is a cloud-based integrated development environment (IDE) that lets you write, run, and debug your code with just a browser"
What is AWS Application Discovery Service?
Amazon definition: "AWS Application Discovery Service helps enterprise customers plan migration projects by gathering information about their on-premises data centers." Learn more [here](https://aws.amazon.com/application-discovery)
What is the Trusted Advisor?
What is the AWS well-architected framework and what pillars it's based on?
AWS definition: "The Well-Architected Framework has been developed to help cloud architects build secure, high-performing, resilient, and efficient infrastructure for their applications. Based on five pillars — operational excellence, security, reliability, performance efficiency, and cost optimization" Learn more [here](https://aws.amazon.com/architecture/well-architected)
What AWS services are serverless (or have the option to be serverless)?
* AWS Lambda
* AWS Athena
What is AWS EMR?
AWS definition: "big data platform for processing vast amounts of data using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto." Learn more [here](https://aws.amazon.com/emr)
What is AWS Athena?
"Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL." Learn more about AWS Athena [here](https://aws.amazon.com/athena)
What is Amazon Cloud Directory?
Amazon definition: "Amazon Cloud Directory is a highly available multi-tenant directory-based store in AWS. These directories scale automatically to hundreds of millions of objects as needed for applications." Learn more [here](https://docs.aws.amazon.com/clouddirectory/latest/developerguide/what_is_cloud_directory.html)
What is AWS Elastic Beanstalk?
AWS definition: "AWS Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications and services...You can simply upload your code and Elastic Beanstalk automatically handles the deployment" Learn more about it [here](https://aws.amazon.com/elasticbeanstalk)
What is AWS SWF?
Amazon definition: "Amazon SWF helps developers build, run, and scale background jobs that have parallel or sequential steps. You can think of Amazon SWF as a fully-managed state tracker and task coordinator in the Cloud." Learn more on Amazon Simple Workflow Service [here](https://aws.amazon.com/swf)
What is Simple Queue Service (SQS)?
AWS definition: "Amazon Simple Queue Service (SQS) is a fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications". Learn more about it [here](https://aws.amazon.com/sqs)
#### AWS Disaster Recovery
In regards to disaster recovery, what is RTO and RPO?
* RTO - the maximum acceptable length of time that your application can be offline
* RPO - the maximum acceptable length of time during which data might be lost from your application due to an incident
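To pin the two definitions down, a tiny sketch that checks an incident against RTO/RPO targets (the 4h/1h targets are hypothetical, chosen only for illustration):

```python
def meets_targets(downtime_hours, data_loss_hours,
                  rto_hours=4, rpo_hours=1):
    """An incident is within plan if downtime <= RTO and data loss <= RPO."""
    return downtime_hours <= rto_hours and data_loss_hours <= rpo_hours

# A 2h outage with 30 minutes of lost data, against a 4h RTO / 1h RPO plan
print(meets_targets(2, 0.5))
```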
What types of disaster recovery techniques does AWS support?
* The Cold Method - periodic backups that are sent off-site
* Pilot Light - data is mirrored to an environment which is always running
* Warm Standby - a scaled-down version of the production environment is always running
* Multi-site - a duplicated environment that is always running
Which disaster recovery option has the highest downtime and which has the lowest?
Lowest - Multi-site
Highest - The cold method
### Final Note

Good luck! You can do it :)

================================================
FILE: certificates/aws-cloud-sysops-associate.md
================================================

## AWS Cloud SysOps Administration Associate

A summary of what you need to know for the exam can be found [here](https://aws.amazon.com/certification/certified-sysops-admin-associate)

### Who should take this exam?
AWS Certified SysOps Administrator - Associate is intended for system administrators in cloud operations roles to validate technical skills. Before you take this exam, we recommend you have:
* A minimum of one year of hands-on experience with AWS technology
* Experience deploying, managing, and operating workloads on AWS, as well as implementing security controls and compliance requirements
* Familiarity with using both the AWS Management Console and the AWS Command Line Interface (CLI)
* Understanding of the AWS Well-Architected Framework, as well as AWS networking and security services

### Prepare for your exam
Get started with free resources or explore additional resources, including Official Practice Exams, with a subscription to AWS Skill Builder.
* AWS Cloud SysOps Guide (SOA-C02) [here](https://d1.awsstatic.com/training-and-certification/docs-sysops-associate/AWS-Certified-SysOps-Administrator-Associate_Exam-Guide.pdf)
* AWS Certified SysOps Administrator - Associate Official Practice Question Set (FREE) [here](https://explore.skillbuilder.aws/learn/course/external/view/elearning/12485/aws-certified-sysops-administrator-associate-practice-question-set-soa-c02-english?syops=sec&sec=prep)
* Exam Prep: AWS Certified SysOps Administrator - Associate (FREE) [here](https://explore.skillbuilder.aws/learn/course/external/view/elearning/9313/exam-prep-aws-certified-sysops-administrator-associate)

### Certification resource
This is a resource for studying and preparing for the AWS Cloud SysOps Associate exam.
* Architecting for the Cloud [here](https://d1.awsstatic.com/whitepapers/AWS_Cloud_Best_Practices.pdf)
* AWS Well-Architected Framework [here](https://d0.awsstatic.com/whitepapers/architecture/AWS_Well-Architected_Framework.pdf)
* Development and Test on Amazon Web Services [here](https://media.amazonwebservices.com/AWS_Development_Test_Environments.pdf)
* Backup, Archive, and Restore Approaches Using AWS [here](https://d0.awsstatic.com/whitepapers/Backup_Archive_and_Restore_Approaches_Using_AWS.pdf)
* How AWS Pricing Works - AWS Pricing Overview [here](https://d0.awsstatic.com/whitepapers/aws_pricing_overview.pdf)
* Sample Questions for AWS SOA-C02 [here](https://d1.awsstatic.com/training-and-certification/docs-sysops-associate/AWS-Certified-SysOps-Administrator-Associate_Sample-Questions.pdf)

================================================
FILE: certificates/aws-solutions-architect-associate.md
================================================

## AWS - Solutions Architect Associate

Last update: 2021

#### AWS Global Infrastructure
Explain the following:
* Availability zone
* Region
* Edge location
AWS regions are data centers hosted across different geographical locations worldwide; each region is completely independent of the others.
Within each region, there are multiple isolated locations known as Availability Zones. Multiple availability zones ensure high availability in case one of them goes down.
Edge locations are endpoints of a content delivery network which cache data, ensuring lower latency and faster delivery to users in any location. They are located in major cities around the world.
#### AWS - IAM
What is IAM? What are some of its features?
Full explanation can be found [here](https://aws.amazon.com/iam)
In short: it's used for managing users, groups, access policies and roles
True or False? IAM configuration is defined globally and not per region
True
True or False? When creating an AWS account, root account is created by default. This is the recommended account to use and share in your organization
False. Instead of using the root account, you should create individual users and use those.
True or False? Groups in AWS IAM, can contain only users and not other groups
True
True or False? Users in AWS IAM, can belong only to a single group
False. Users can belong to multiple groups.
What are Roles?
A way to allow one AWS service to use another AWS service. You assign roles to AWS resources. For example, you can use a role that allows the EC2 service to access S3 buckets (read and write).
What are Policies?
Policies are documents used to define what a user, group, or role is able to do. Their format is JSON.
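As an illustration, a minimal sketch of such a JSON policy document, built here in Python (the bucket name is hypothetical), granting read access to a single S3 bucket:

```python
import json

# A minimal IAM policy document allowing read access to one
# (hypothetical) S3 bucket. Policies are plain JSON documents.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-bucket",
                "arn:aws:s3:::example-bucket/*",
            ],
        }
    ],
}

print(json.dumps(policy, indent=2))
```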
A user is unable to access an s3 bucket. What might be the problem?
There can be several reasons for that. One of them is a missing policy. To solve that, the admin has to attach a policy to the user that allows them to access the S3 bucket.
What should you use to:
* Grant access between two services/resources?
* Grant a user access to resources/services?
* Role
* Policy
What permissions does a new user have?
Only login access.
#### AWS Networking
What is VPC?
"A logically isolated section of the AWS cloud where you can launch AWS resources in a virtual network that you define" Read more about it [here](https://aws.amazon.com/vpc).
True or False? VPC spans multiple regions
False
True or False? Subnets that belong to the same VPC can be in different availability zones
True. Just to clarify, a subnet must reside entirely in one AZ.
What is an Internet Gateway?
"component that allows communication between instances in your VPC and the internet" (AWS docs). Read more about it [here](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Internet_Gateway.html)
True or False? NACL allow or deny traffic on the subnet level
True
True or False? Multiple Internet Gateways can be attached to one VPC
False. Only one internet gateway can be attached to a single VPC.
True or False? Route Tables are used to allow or deny traffic from the internet to AWS instances
False. Route tables determine where network traffic is directed; allowing or denying traffic is done with security groups and NACLs.
Explain Security Groups and Network ACLs
* NACL - a stateless security layer at the subnet level
* Security Group - a stateful security layer at the instance level
Read more about it [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-security-groups.html) and [here](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_SecurityGroups.html)
What is AWS Direct Connect?
It allows you to establish a dedicated network connection between your corporate network and AWS, instead of going over the public internet.
#### AWS Compute
What is EC2?
"a web service that provides secure, resizable compute capacity in the cloud". Read more [here](https://aws.amazon.com/ec2)
True or False? EC2 is a regional service
True. As opposed to IAM for example, which is a global service, EC2 is a regional service.
What is AMI?
Amazon Machine Images is "An Amazon Machine Image (AMI) provides the information required to launch an instance". Read more [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html)
What are the different sources for AMIs?
* Personal AMIs - AMIs you create
* AWS Marketplace AMIs - paid AMIs, usually bundled with licensed software
* Community AMIs - free
What is instance type?
"the instance type that you specify determines the hardware of the host computer used for your instance" Read more about instance types [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html)
True or False? The following are instance types available for a user in AWS:
* Compute optimized
* Network optimized
* Web optimized
False. From the above list only compute optimized is available.
What is EBS?
"provides block level storage volumes for use with EC2 instances. EBS volumes behave like raw, unformatted block devices." More on EBS [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AmazonEBS.html)
What EC2 pricing models are there?
* On Demand - pay a fixed rate by the hour/second with no commitment. You can provision and terminate it at any time.
* Reserved - you get a capacity reservation; basically you purchase an instance for a fixed period of time. The longer the period, the cheaper it is.
* Spot - enables you to bid whatever price you want for instances, or pay the spot price.
* Dedicated Hosts - a physical EC2 server dedicated for your use.
What are Security Groups?
"A security group acts as a virtual firewall that controls the traffic for one or more instances" More on this subject [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-security-groups.html)
What can you attach to an EC2 instance in order to store data?
EBS
What EC2 RI types are there?
* Standard RI - most significant discount + suited for steady-state usage
* Convertible RI - discount + change attributes of the RI + suited for steady-state usage
* Scheduled RI - launch within time windows you reserve
Learn more about EC2 RI [here](https://aws.amazon.com/ec2/pricing/reserved-instances)
#### AWS Containers
What is Amazon ECS?
Amazon definition: "Amazon Elastic Container Service (Amazon ECS) is a fully managed container orchestration service. Customers such as Duolingo, Samsung, GE, and Cook Pad use ECS to run their most sensitive and mission critical applications because of its security, reliability, and scalability." Learn more [here](https://aws.amazon.com/ecs)
What is Amazon ECR?
Amazon definition: "Amazon Elastic Container Registry (ECR) is a fully-managed Docker container registry that makes it easy for developers to store, manage, and deploy Docker container images." Learn more [here](https://aws.amazon.com/ecr)
What is AWS Fargate?
Amazon definition: "AWS Fargate is a serverless compute engine for containers that works with both Amazon Elastic Container Service (ECS) and Amazon Elastic Kubernetes Service (EKS)." Learn more [here](https://aws.amazon.com/fargate)
#### AWS Storage
Explain what is AWS S3?
S3 stands for Simple Storage Service. It is an object storage service which is fast, scalable and durable. S3 enables customers to upload, download or store any file or object that is up to 5 TB in size. More on S3 [here](https://aws.amazon.com/s3)
What is a bucket?
An S3 bucket is a resource which is similar to folders in a file system and allows storing objects, which consist of data.
True or False? A bucket name must be globally unique
True
Explain folders and objects in regards to buckets
* Folder - any sub-folder in an S3 bucket
* Object - the files which are stored in a bucket
Explain the following: * Object Lifecycles * Object Sharing * Object Versioning
* Object Lifecycles - transfer objects between storage classes based on defined rules and time periods
* Object Sharing - share objects via a URL link
* Object Versioning - manage multiple versions of an object
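As an illustration of a lifecycle rule, here is a sketch in the JSON shape used by the S3 lifecycle configuration API (the rule ID, prefix, and day counts are hypothetical): objects under `logs/` move to Standard-IA after 30 days and to Glacier after 90.

```json
{
  "Rules": [
    {
      "ID": "archive-old-logs",
      "Filter": { "Prefix": "logs/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
```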
Explain Object Durability and Object Availability
Object Durability: the percent over a one-year time period that a file will not be lost
Object Availability: the percent over a one-year time period that a file will be accessible
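To make the availability number concrete, a quick sketch of how an availability percentage translates into allowed downtime per year:

```python
HOURS_PER_YEAR = 365 * 24  # 8760

def max_yearly_downtime_hours(availability_percent):
    """Hours per year a service at this availability may be unreachable."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR

# 99.9% availability allows roughly 8.76 hours of downtime per year
print(round(max_yearly_downtime_hours(99.9), 2))
```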
What is a storage class? What storage classes are there?
Each object has a storage class assigned to it, affecting its availability and durability. This also affects costs.
Storage classes offered today:
* Standard:
  * Used for general, all-purpose storage (mostly storage that needs to be accessed frequently)
  * The most expensive storage class
  * 11x9% durability
  * 99.99% availability
  * The default storage class
* Standard-IA (Infrequent Access):
  * Long-lived, infrequently accessed data that must be available the moment it's accessed
  * 11x9% durability
  * 99.90% availability
* One Zone-IA (Infrequent Access):
  * Long-lived, infrequently accessed, non-critical data
  * Less expensive than the Standard and Standard-IA storage classes
  * 11x9% durability (stored in a single availability zone)
  * 99.50% availability
* Intelligent-Tiering:
  * Long-lived data with changing or unknown access patterns. In this class the data automatically moves to the class most suitable for it, based on usage patterns
  * Price depends on the class used
  * 11x9% durability
  * 99.90% availability
* Glacier: archive data with retrieval times ranging from minutes to hours
* Glacier Deep Archive: archive data that rarely, if ever, needs to be accessed, with retrieval times in hours
* Both Glacier and Glacier Deep Archive:
  * Are the cheapest storage classes
  * Have 11x9% durability
More on storage classes [here](https://aws.amazon.com/s3/storage-classes)
A customer would like to move data which is rarely accessed from the Standard storage class to the cheapest class there is. Which storage class should be used?
* One Zone-IA
* Glacier Deep Archive
* Intelligent-Tiering
Glacier Deep Archive
What Glacier retrieval options are available for the user?
Expedited, Standard and Bulk
True or False? Each AWS account can store up to 500 PetaByte of data. Any additional storage will cost double
False. Unlimited capacity.
Explain what is Storage Gateway
"AWS Storage Gateway is a hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage". More on Storage Gateway [here](https://aws.amazon.com/storagegateway)
Explain the following Storage Gateway deployment types:
* File Gateway
* Volume Gateway
* Tape Gateway
Explained in detail [here](https://aws.amazon.com/storagegateway/faqs)
What is the difference between stored volumes and cached volumes?
* Stored Volumes - data is located at the customer's data center and periodically backed up to AWS
* Cached Volumes - data is stored in the AWS cloud and cached at the customer's data center for quick access
What is "Amazon S3 Transfer Acceleration"?
AWS definition: "Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket" Learn more [here](https://docs.aws.amazon.com/AmazonS3/latest/dev/transfer-acceleration.html)
What is Amazon EFS?
Amazon definition: "Amazon Elastic File System (Amazon EFS) provides a simple, scalable, fully managed elastic NFS file system for use with AWS Cloud services and on-premises resources." Learn more [here](https://aws.amazon.com/efs)
What is AWS Snowmobile?
"AWS Snowmobile is an Exabyte-scale data transfer service used to move extremely large amounts of data to AWS." Learn more [here](https://aws.amazon.com/snowmobile)
##### AWS ELB
What is ELB (Elastic Load Balancing)?
AWS definition: "Elastic Load Balancing automatically distributes incoming application traffic across multiple targets, such as Amazon EC2 instances, containers, IP addresses, and Lambda functions." More on ELB [here](https://aws.amazon.com/elasticloadbalancing)
What is auto scaling?
AWS definition: "AWS Auto Scaling monitors your applications and automatically adjusts capacity to maintain steady, predictable performance at the lowest possible cost" Read more about auto scaling [here](https://aws.amazon.com/autoscaling)
True or False? Auto Scaling is about adding resources (such as instances) and not about removing resources
False. Auto scaling adjusts capacity, and this can mean removing resources based on usage and performance.
What types of load balancers are supported in EC2 and what are they used for?
* Application LB - layer 7 traffic
* Network LB - ultra-high performance or static IP address
* Classic LB - low cost, good for test or dev environments
#### AWS DNS
What is Route 53?
"Amazon Route 53 is a highly available and scalable cloud Domain Name System (DNS) web service"
Some of Route 53's features:
* Domain registration
* DNS service - domain name translation
* Health checks - verify your app is available
More on Route 53 [here](https://aws.amazon.com/route53)
#### AWS CloudFront
Explain what is CloudFront
AWS definition: "Amazon CloudFront is a fast content delivery network (CDN) service that securely delivers data, videos, applications, and APIs to customers globally with low latency, high transfer speeds, all within a developer-friendly environment." More on CloudFront [here](https://aws.amazon.com/cloudfront)
Explain the following:
* Origin
* Edge location
* Distribution
#### AWS Monitoring & Logging
What is AWS CloudWatch?
AWS definition: "Amazon CloudWatch is a monitoring and observability service..." More on CloudWatch [here](https://aws.amazon.com/cloudwatch)
What is AWS CloudTrail?
AWS definition: "AWS CloudTrail is a service that enables governance, compliance, operational auditing, and risk auditing of your AWS account." Read more on CloudTrail [here](https://aws.amazon.com/cloudtrail)
What is Simple Notification Service (SNS)?
AWS definition: "a highly available, durable, secure, fully managed pub/sub messaging service that enables you to decouple microservices, distributed systems, and serverless applications." Read more about it [here](https://aws.amazon.com/sns)
Explain the following in regards to SNS:
* Topics
* Subscribers
* Publishers
* Topics - used for grouping multiple endpoints
* Subscribers - the endpoints to which topics send messages
* Publishers - the providers of the message (an event, a person, ...)
#### AWS Security
What is the shared responsibility model? What is AWS responsible for, and what is the user responsible for, based on the shared responsibility model?
The shared responsibility model defines what the customer is responsible for and what AWS is responsible for. More on the shared responsibility model [here](https://aws.amazon.com/compliance/shared-responsibility-model)
True or False? Based on the shared responsibility model, Amazon is responsible for physical CPUs and security groups on instances
False. AWS is responsible for the hardware in its sites, but not for security groups, which are created and managed by the users.
Explain "Shared Controls" in regards to the shared responsibility model
AWS definition: "apply to both the infrastructure layer and customer layers, but in completely separate contexts or perspectives. In a shared control, AWS provides the requirements for the infrastructure and the customer must provide their own control implementation within their use of AWS services" Learn more about it [here](https://aws.amazon.com/compliance/shared-responsibility-model)
What is the AWS compliance program?
What is AWS Artifact?
AWS definition: "AWS Artifact is your go-to, central resource for compliance-related information that matters to you. It provides on-demand access to AWS’ security and compliance reports and select online agreements." Read more about it [here](https://aws.amazon.com/artifact)
What is AWS Inspector?
AWS definition: "Amazon Inspector is an automated security assessment service that helps improve the security and compliance of applications deployed on AWS. Amazon Inspector automatically assesses applications for exposure, vulnerabilities, and deviations from best practices." Learn more [here](https://aws.amazon.com/inspector)
What is AWS GuardDuty?
What is AWS Shield?
AWS definition: "AWS Shield is a managed Distributed Denial of Service (DDoS) protection service that safeguards applications running on AWS."
What is AWS WAF? Give an example of how it can be used and describe what resources or services you can use it with
What is AWS VPN used for?
What is the difference between Site-to-Site VPN and Client VPN?
What is AWS CloudHSM?
Amazon definition: "AWS CloudHSM is a cloud-based hardware security module (HSM) that enables you to easily generate and use your own encryption keys on the AWS Cloud." Learn more [here](https://aws.amazon.com/cloudhsm)
True or False? AWS Inspector can perform both network and host assessments
True
What is AWS Acceptable Use Policy?
It describes prohibited uses of the web services offered by AWS. More on AWS Acceptable Use Policy [here](https://aws.amazon.com/aup)
What is AWS Key Management Service (KMS)?
AWS definition: "KMS makes it easy for you to create and manage cryptographic keys and control their use across a wide range of AWS services and in your applications." More on KMS [here](https://aws.amazon.com/kms)
True or False? A user is not allowed to perform penetration testing on any of the AWS services
False. On some services, like EC2, CloudFront and RDS, penetration testing is allowed.
True or False? DDoS attack is an example of allowed penetration testing activity
False.
True or False? AWS Access Key is a type of MFA device used for AWS resources protection
False. Security key is an example of an MFA device.
What is Amazon Cognito?
Amazon definition: "Amazon Cognito handles user authentication and authorization for your web and mobile apps." Learn more [here](https://docs.aws.amazon.com/cognito/index.html)
What is AWS ACM?
Amazon definition: "AWS Certificate Manager is a service that lets you easily provision, manage, and deploy public and private Secure Sockets Layer/Transport Layer Security (SSL/TLS) certificates for use with AWS services and your internal connected resources." Learn more [here](https://aws.amazon.com/certificate-manager)
#### AWS Databases
What is AWS RDS?
What is AWS DynamoDB?
Explain "Point-in-Time Recovery" feature in DynamoDB
Amazon definition: "You can create on-demand backups of your Amazon DynamoDB tables, or you can enable continuous backups using point-in-time recovery. For more information about on-demand backups, see On-Demand Backup and Restore for DynamoDB." Learn more [here](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/PointInTimeRecovery.html)
Explain "Global Tables" in DynamoDB
Amazon definition: "A global table is a collection of one or more replica tables, all owned by a single AWS account." Learn more [here](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/V2globaltables_HowItWorks.html)
What is DynamoDB Accelerator?
Amazon definition: "Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from milliseconds to microseconds..." Learn more [here](https://aws.amazon.com/dynamodb/dax)
What is AWS Redshift and how is it different than RDS?
Redshift is a cloud data warehouse, optimized for analytics (OLAP) over large datasets, while RDS is a managed relational database service aimed at transactional (OLTP) workloads.
What is AWS ElastiCache? For what cases is it used?
Amazon ElastiCache is a fully managed Redis or Memcached in-memory data store. It's great for use cases like two-tier web applications where the most frequently accessed data is stored in ElastiCache so response time is optimal.
What is Amazon Aurora?
A MySQL- and PostgreSQL-compatible relational database. It is also the default database proposed when creating a database with RDS. Great for use cases like two-tier web applications that have a MySQL or PostgreSQL database layer and need automated backups.
What is Amazon DocumentDB?
Amazon definition: "Amazon DocumentDB (with MongoDB compatibility) is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads. As a document database, Amazon DocumentDB makes it easy to store, query, and index JSON data." Learn more [here](https://aws.amazon.com/documentdb)
What is "AWS Database Migration Service" used for?
What type of storage is used by Amazon RDS?
EBS
Explain Amazon RDS Read Replicas
AWS definition: "Amazon RDS Read Replicas provide enhanced performance and durability for RDS database (DB) instances. They make it easy to elastically scale out beyond the capacity constraints of a single DB instance for read-heavy database workloads." Read more about [here](https://aws.amazon.com/rds/features/read-replicas)
#### AWS Serverless Compute
Explain what is AWS Lambda
AWS definition: "AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume." Read more on it [here](https://aws.amazon.com/lambda)
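A Lambda function is essentially a handler that receives an event and a context. A minimal Python sketch (the event shape here is hypothetical), which can also be invoked locally for testing:

```python
import json

def lambda_handler(event, context):
    """Minimal AWS Lambda handler: greet the name passed in the event."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

# Local invocation (the context object is unused here, so None is fine)
print(lambda_handler({"name": "DevOps"}, None))
```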
True or False? In AWS Lambda, you are charged as long as a function exists, regardless of whether it's running or not
False. You are charged only when the code is executed.
Which of the following sets of languages does Lambda support?
* R, Swift, Rust, Kotlin
* Python, Ruby, Go
* Python, Ruby, PHP
* Python, Ruby, Go
#### Identify the service or tool
What would you use for automating code/software deployments?
AWS CodeDeploy
What would you use for easily creating similar AWS environments/resources for different customers?
CloudFormation
Which service would you use for building a website or web application?
Lightsail
Which tool would you use for choosing between Reserved instances or On-Demand instances?
Cost Explorer
What would you use to check how many unassociated Elastic IP addresses you have?
Trusted Advisor
What service allows you to transfer large amounts (Petabytes) of data in and out of the AWS cloud?
AWS Snowball
What provides a virtual network dedicated to your AWS account?
VPC
What would you use to have automated backups for an application that has a MySQL database layer?
Amazon Aurora
What would you use to migrate on-premise database to AWS?
AWS Database Migration Service (DMS)
What would you use to check why certain EC2 instances were terminated?
AWS CloudTrail
What would you use for SQL database?
AWS RDS
What would you use for NoSQL database?
AWS DynamoDB
What would you use for running SQL queries interactively on S3?
AWS Athena
What would you use for adding image and video analysis to your application?
AWS Rekognition
Which service would you use for debugging and improving performance issues with your applications?
AWS X-Ray
Which service is used for sending notifications?
SNS
Which service would you use for monitoring malicious activity and unauthorized behavior in regards to AWS accounts and workloads?
Amazon GuardDuty
Which service would you use to centrally manage billing, control access, compliance, and security across multiple AWS accounts?
AWS Organizations
Which service would you use for web application protection?
AWS WAF
You would like to monitor some of your resources in the different services. Which service would you use for that?
CloudWatch
Which service would you use for performing security assessment?
AWS Inspector
Which service would you use for creating DNS record?
Route 53
What would you use if you need a fully managed document database?
Amazon DocumentDB
Which service would you use to add access control (or sign-up, sign-in forms) to your web/mobile apps?
AWS Cognito
Which service would you use if you need messaging queue?
Simple Queue Service (SQS)
Which service would you use if you need managed DDOS protection?
AWS Shield
Which service would you use if you need to store frequently used data for low latency access?
ElastiCache
What would you use to transfer files over long distances between a client and an S3 bucket?
Amazon S3 Transfer Acceleration
#### AWS Billing & Support
What is AWS Organizations?
AWS definition: "AWS Organizations helps you centrally govern your environment as you grow and scale your workloads on AWS." More on Organizations [here](https://aws.amazon.com/organizations)
Explain AWS pricing model
It mainly works on a "pay-as-you-go" model, meaning you pay only for what you use, when you use it.
In S3 you pay for:
1. How much data you are storing
2. Making requests (PUT, POST, ...)
In EC2 it's based on the purchasing option (on-demand, spot, ...), instance type, AMI type and the region used.
More on the AWS pricing model [here](https://aws.amazon.com/pricing)
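As a back-of-the-envelope sketch of the pay-as-you-go idea (the per-GB and per-request rates below are illustrative assumptions, not current AWS prices):

```python
def s3_monthly_cost(stored_gb, put_requests,
                    gb_month_rate=0.023, put_rate_per_1000=0.005):
    """Estimate a monthly S3 bill: storage used plus requests made.
    The default rates are hypothetical, for illustration only."""
    storage_cost = stored_gb * gb_month_rate
    request_cost = (put_requests / 1000) * put_rate_per_1000
    return storage_cost + request_cost

# 100 GB stored and 10,000 PUT requests in a month
print(round(s3_monthly_cost(100, 10_000), 2))
```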
How should one estimate AWS costs when, for example, comparing to on-premises solutions?
* TCO calculator
* AWS simple calculator
* Cost Explorer
What does basic support in AWS include?
* 24x7 customer service
* Trusted Advisor
* AWS Personal Health Dashboard
How are EC2 instances billed?
What AWS Pricing Calculator is used for?
What is Amazon Connect?
Amazon definition: "Amazon Connect is an easy to use omnichannel cloud contact center that helps companies provide superior customer service at a lower cost." Learn more [here](https://aws.amazon.com/connect)
What are "APN Consulting Partners"?
Amazon definition: "APN Consulting Partners are professional services firms that help customers of all types and sizes design, architect, build, migrate, and manage their workloads and applications on AWS, accelerating their journey to the cloud." Learn more [here](https://aws.amazon.com/partners/consulting)
Which of the following are AWS account types (sorted by order)?
* Basic, Developer, Business, Enterprise
* Newbie, Intermediate, Pro, Enterprise
* Developer, Basic, Business, Enterprise
* Beginner, Pro, Intermediate, Enterprise
* Basic, Developer, Business, Enterprise
True or False? Region is a factor when it comes to EC2 costs/pricing
True. You pay differently based on the chosen region.
What is "AWS Infrastructure Event Management"?
AWS Definition: "AWS Infrastructure Event Management is a structured program available to Enterprise Support customers (and Business Support customers for an additional fee) that helps you plan for large-scale events such as product or application launches, infrastructure migrations, and marketing events."
#### AWS Automation
What is AWS CodeDeploy?
Amazon definition: "AWS CodeDeploy is a fully managed deployment service that automates software deployments to a variety of compute services such as Amazon EC2, AWS Fargate, AWS Lambda, and your on-premises servers." Learn more [here](https://aws.amazon.com/codedeploy)
Explain what is CloudFormation
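CloudFormation provisions AWS resources from declarative templates, written in JSON or YAML. A minimal sketch of a template that creates a single S3 bucket (the logical name and bucket name are hypothetical):

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Minimal example - a single S3 bucket
Resources:
  ExampleBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: my-example-bucket-1234
```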
#### AWS Misc
What is AWS Lightsail?
AWS definition: "Lightsail is an easy-to-use cloud platform that offers you everything needed to build an application or website, plus a cost-effective, monthly plan."
What is AWS Rekognition?
AWS definition: "Amazon Rekognition makes it easy to add image and video analysis to your applications using proven, highly scalable, deep learning technology that requires no machine learning expertise to use." Learn more [here](https://aws.amazon.com/rekognition)
What are AWS Resource Groups used for?
Amazon definition: "You can use resource groups to organize your AWS resources. Resource groups make it easier to manage and automate tasks on large numbers of resources at one time. " Learn more [here](https://docs.aws.amazon.com/ARG/latest/userguide/welcome.html)
What is AWS Global Accelerator?
Amazon definition: "AWS Global Accelerator is a service that improves the availability and performance of your applications with local or global users..." Learn more [here](https://aws.amazon.com/global-accelerator)
What is AWS Config?
Amazon definition: "AWS Config is a service that enables you to assess, audit, and evaluate the configurations of your AWS resources." Learn more [here](https://aws.amazon.com/config)
What is AWS X-Ray?
AWS definition: "AWS X-Ray helps developers analyze and debug production, distributed applications, such as those built using a microservices architecture." Learn more [here](https://aws.amazon.com/xray)
What is AWS OpsWorks?
Amazon definition: "AWS OpsWorks is a configuration management service that provides managed instances of Chef and Puppet." Learn more about it [here](https://aws.amazon.com/opsworks)
What is AWS Service Catalog?
Amazon definition: "AWS Service Catalog allows organizations to create and manage catalogs of IT services that are approved for use on AWS." Learn more [here](https://aws.amazon.com/servicecatalog)
What is AWS CAF?
Amazon definition: "AWS Professional Services created the AWS Cloud Adoption Framework (AWS CAF) to help organizations design and travel an accelerated path to successful cloud adoption. " Learn more [here](https://aws.amazon.com/professional-services/CAF)
What is AWS Cloud9?
AWS definition: "AWS Cloud9 is a cloud-based integrated development environment (IDE) that lets you write, run, and debug your code with just a browser"
What is AWS Application Discovery Service?
Amazon definition: "AWS Application Discovery Service helps enterprise customers plan migration projects by gathering information about their on-premises data centers." Learn more [here](https://aws.amazon.com/application-discovery)
What is the Trusted Advisor?
What is the AWS Well-Architected Framework and which pillars is it based on?
AWS definition: "The Well-Architected Framework has been developed to help cloud architects build secure, high-performing, resilient, and efficient infrastructure for their applications. Based on five pillars — operational excellence, security, reliability, performance efficiency, and cost optimization" Learn more [here](https://aws.amazon.com/architecture/well-architected)
Which AWS services are serverless (or have the option to be serverless)?
* AWS Lambda
* AWS Athena
What is AWS EMR?
AWS definition: "big data platform for processing vast amounts of data using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto." Learn more [here](https://aws.amazon.com/emr)
What is AWS Athena?
"Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL." Learn more about AWS Athena [here](https://aws.amazon.com/athena)
What is Amazon Cloud Directory?
Amazon definition: "Amazon Cloud Directory is a highly available multi-tenant directory-based store in AWS. These directories scale automatically to hundreds of millions of objects as needed for applications." Learn more [here](https://docs.aws.amazon.com/clouddirectory/latest/developerguide/what_is_cloud_directory.html)
What is AWS Elastic Beanstalk?
AWS definition: "AWS Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications and services...You can simply upload your code and Elastic Beanstalk automatically handles the deployment" Learn more about it [here](https://aws.amazon.com/elasticbeanstalk)
What is AWS SWF?
Amazon definition: "Amazon SWF helps developers build, run, and scale background jobs that have parallel or sequential steps. You can think of Amazon SWF as a fully-managed state tracker and task coordinator in the Cloud." Learn more on Amazon Simple Workflow Service [here](https://aws.amazon.com/swf)
What is Simple Queue Service (SQS)?
AWS definition: "Amazon Simple Queue Service (SQS) is a fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications". Learn more about it [here](https://aws.amazon.com/sqs)
#### AWS Disaster Recovery
In regards to disaster recovery, what is RTO and RPO?
RTO - The maximum acceptable length of time that your application can be offline. RPO - The maximum acceptable length of time during which data might be lost from your application due to an incident.
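The difference between the two can be made concrete with a little arithmetic. A minimal shell sketch, using made-up numbers purely for illustration (a 4-hour backup interval and a 30-minute restore):

```shell
#!/usr/bin/env bash
# Hypothetical numbers, for illustration only: suppose backups run every
# 4 hours and restoring from the latest backup takes about 30 minutes.
backup_interval_minutes=240
restore_duration_minutes=30

# Worst case, an incident hits right before the next backup runs, so up to
# one full backup interval of data is lost -> the interval bounds the RPO.
rpo_minutes=$backup_interval_minutes

# The application is offline at least for as long as the restore runs ->
# the restore duration is the floor of the RTO.
rto_minutes=$restore_duration_minutes

echo "RPO: up to ${rpo_minutes} minutes of data may be lost"
echo "RTO: roughly ${rto_minutes} minutes of downtime"
```

In other words, how often you back up bounds your RPO, while how fast you can recover bounds your RTO.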
What types of disaster recovery techniques does AWS support?
* The Cold Method - Periodic backups that are sent off-site
* Pilot Light - Data is mirrored to an environment which is always running
* Warm Standby - A scaled-down version of the production environment that is always running
* Multi-site - A duplicated environment that is always running
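The Cold Method above can be sketched as a periodic backup job plus a restore drill. A minimal local illustration - the paths are temporary stand-ins, and a real setup would ship the archive off-site (e.g. to object storage) on a schedule:

```shell
#!/usr/bin/env bash
set -e
# Minimal sketch of the "cold method": take a backup and verify it restores.
data_dir=$(mktemp -d)     # stands in for the production data directory
backup_dir=$(mktemp -d)   # stands in for the off-site backup location

echo "customer records" > "$data_dir/db.txt"

# Create a timestamped archive (in production this would run from cron).
stamp=$(date +%Y%m%d%H%M%S)
tar -czf "$backup_dir/backup-$stamp.tar.gz" -C "$data_dir" .

# Disaster-recovery drill: restore into an empty directory and compare.
restore_dir=$(mktemp -d)
tar -xzf "$backup_dir/backup-$stamp.tar.gz" -C "$restore_dir"
diff -r "$data_dir" "$restore_dir" && echo "backup verified"
```

The downtime cost of this method comes from the gap between backups and the time it takes to restore, which is why it sits at the cheap-but-slow end of the spectrum.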
Which disaster recovery option has the highest downtime and which has the lowest?
Highest - The Cold Method
Lowest - Multi-site
### Final Note

Good luck! You can do it :)

================================================
FILE: certificates/azure-fundamentals-az-900.md
================================================
## AZ-900
What is cloud computing?
[Wikipedia](https://en.wikipedia.org/wiki/Cloud_computing): "Cloud computing is the on-demand availability of computer system resources, especially data storage (cloud storage) and computing power, without direct active management by the user"
What types of clouds (or cloud deployments) are there?
* Public - Cloud services sharing computing resources among multiple customers
* Private - Cloud services with computing resources limited to a specific customer or organization, managed by a third party or the organization itself
* Hybrid - A combination of public and private clouds
What is Azure Firewall?
Azure Firewall is a cloud-native, intelligent network firewall security service that provides best-of-breed threat protection for your cloud workloads running in Azure.
What is Network Security Group?
A network security group contains security rules that allow or deny inbound network traffic to, or outbound network traffic from, several types of Azure resources. For each rule, you can specify source and destination, port, and protocol.
================================================
FILE: certificates/cka.md
================================================
## Certified Kubernetes Administrator (CKA)

### Pods
Deploy a pod called web-1985 using the nginx:alpine image
`kubectl run web-1985 --image=nginx:alpine --restart=Never`
How to find out on which node a certain pod is running?
`kubectl get po -o wide`
================================================
FILE: certificates/ckad.md
================================================
## Certified Kubernetes Application Developer (CKAD)

### Core Concepts

### Pods
Deploy a pod called web-1985 using the nginx:alpine image
`kubectl run web-1985 --image=nginx:alpine --restart=Never`
How to find out on which node a certain pod is running?
`kubectl get po -o wide`
### Namespaces
List all namespaces
`kubectl get ns`
List all the pods in the namespace 'neverland'
`kubectl get po -n neverland`
List all the pods in all the namespaces
`kubectl get po --all-namespaces`
================================================
FILE: coding/python/binary_search.py
================================================
#!/usr/bin/env python
import random
from typing import List


def binary_search(arr: List[int], lb: int, ub: int, target: int) -> int:
    """
    A binary search example which has O(log n) time complexity.

    Returns the index of target in arr, or -1 if it is not present.
    """
    while lb <= ub:
        mid = lb + (ub - lb) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lb = mid + 1
        else:
            ub = mid - 1
    return -1


def generate_random_list(size: int = 10, lower: int = 1, upper: int = 50) -> List[int]:
    return sorted(random.randint(lower, upper) for _ in range(size))


def find_target_in_list(target: int, lst: List[int]) -> int:
    return binary_search(lst, 0, len(lst) - 1, target)


def main():
    """
    Executes the binary search algorithm with a randomly generated list.

    Time Complexity: O(log n)
    """
    rand_num_li = generate_random_list()
    target = random.randint(1, 50)
    index = find_target_in_list(target, rand_num_li)
    print(f"List: {rand_num_li}\nTarget: {target}\nIndex: {index}")


if __name__ == '__main__':
    main()


================================================
FILE: coding/python/merge_sort.py
================================================
#!/usr/bin/env python
import random
from typing import List


def merge_sort(arr: List[int]) -> List[int]:
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    return merge(left, right)


def merge(left: List[int], right: List[int]) -> List[int]:
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def generate_random_list(size: int = 10, lower: int = 1, upper: int = 100) -> List[int]:
    return [random.randint(lower, upper) for _ in range(size)]


def main():
    """
    Executes the merge sort algorithm with a randomly generated list.

    Time Complexity: O(n log n)
    """
    rand_num_li = generate_random_list()
    print(f"Unsorted List: {rand_num_li}")
    sorted_list = merge_sort(rand_num_li)
    print(f"Sorted List: {sorted_list}")


if __name__ == '__main__':
    main()


================================================
FILE: credits-pt-BR.md
================================================
## Créditos

Logo do Jenkins criado por Ksenia Nenasheva e publicado através de jenkins.io está licenciado sob cc by-sa 3.0
Logo do Git por Jason Long está licenciado sob a Creative Commons Attribution 3.0 Unported License
Logo do Terraform criado por Hashicorp®
Logo do Docker criado por Docker®
O logo do Python é uma marca registrada da Python Software Foundation®
Logo do Puppet criado por Puppet®
Logo do Bash criado por Prospect One
Logo do OpenStack criado por e uma marca registrada da The OpenStack Foundation®
Logos do Linux, Kubernetes e Prometheus são marcas registradas da The Linux Foundation®
Logo do Mongo é uma marca registrada da Mongo®
Logo distribuído por Flatart
Ícone de desafio por Elizabeth Arostegui em Technology Mix
Ícones "Pergunta que você faz" (homem levantando a mão) e "Banco de dados" por [Webalys](https://www.iconfinder.com/webalys) Logo de teste por [Flatart](https://www.iconfinder.com/Flatart)
Logo da Google Cloud Platform criado por Google®
Logo do VirtualBox criado por dAKirby309, sob a Creative Commons Attribution-Noncommercial 4.0 License.
Logo de certificados por Flatart
Ícone de armazenamento por Dinosoftlab
Ícone de CI/CD feito por Freepik de www.flaticon.com
Logo de Engenharia do Caos feito por Arie Bregman

================================================
FILE: credits.md
================================================
## Credits

Jenkins logo created by Ksenia Nenasheva and published through jenkins.io is licensed under cc by-sa 3.0
Git Logo by Jason Long is licensed under the Creative Commons Attribution 3.0 Unported License
Terraform logo created by Hashicorp®
Docker logo created by Docker®
The Python logo is a trademark of the Python Software Foundation®
Puppet logo created by Puppet®
Bash logo created by Prospect One
OpenStack logo created by and a trademark of The OpenStack Foundation®
Linux, Kubernetes and Prometheus logos are trademarks of The Linux Foundation®
Mongo logo is a trademark of Mongo®
Distributed logo by Flatart
Challenge icon by Elizabeth Arostegui in Technology Mix
"Question you ask" (man raising hand) and "Database" icons by [Webalys](https://www.iconfinder.com/webalys) Testing logo by [Flatart](https://www.iconfinder.com/Flatart)
Google Cloud Platform logo created by Google®
VirtualBox Logo created by dAKirby309, under the Creative Commons Attribution-Noncommercial 4.0 License.
Certificates logo by Flatart
Storage icon by Dinosoftlab
CI/CD icon made by Freepik from www.flaticon.com
Chaos Engineering logo made by Arie Bregman

================================================
FILE: exercises/docker/docker-debugging.md
================================================
# Docker Scenario-Based Exercises

This file contains scenario-based Docker questions to help DevOps engineers practice real-world troubleshooting and configuration tasks. Each question simulates a practical scenario with a step-by-step answer.

## Question 1: Debugging a Docker Container Failure

### Question

You’re a DevOps engineer deploying a Node.js application using Docker. You run `docker run -d -p 3000:3000 my-node-app`, but the container exits immediately. Using `docker ps -a`, you see the container status as `Exited`. How would you troubleshoot and resolve this issue?

### Answer

To troubleshoot a container exiting immediately:

1. **Check Logs**: Run `docker logs my-node-app` to view error messages. Common issues include missing dependencies (e.g., `npm install` failed) or an incorrect command.
2. **Inspect the Container**: Use `docker inspect my-node-app` to check `Config.Cmd` or `Config.Entrypoint`. Ensure the command (e.g., `node app.js`) is valid.
3. **Verify the Dockerfile**: Check if `CMD` or `ENTRYPOINT` is correct, e.g., `CMD ["node", "app.js"]`. Update and rebuild if needed: `docker build -t my-node-app .`.
4. **Test Interactively**: Run `docker run -it my-node-app sh` to debug manually (e.g., `node app.js`).
5. **Check Resources**: Ensure the host has enough memory/CPU using `docker stats`.

**Example Fix**: If logs show `node: command not found`, update the Dockerfile to use `FROM node:18`, rebuild, and rerun.

### Additional Notes

- Always start with `docker logs` for error clues.
- Use `docker ps -a` to check container status and ID.
- Common issues include missing dependencies or crashing apps.

---

## Question 2: Configuring a Multi-Container Application

### Question

As a DevOps engineer, you need to deploy a web application with a Node.js backend and a MySQL database using Docker. The Node.js app connects to MySQL on `localhost:3306`, but running `docker run` for each container separately fails because they can’t communicate. How would you set up these containers to work together?

### Answer

To make the Node.js and MySQL containers communicate:

1. **Use Docker Compose**: Create a `docker-compose.yml` file to define and link both services:

   ```yaml
   version: '3.8'
   services:
     node-app:
       image: my-node-app
       build: .
       ports:
         - "3000:3000"
       depends_on:
         - mysql-db
       environment:
         - DB_HOST=mysql-db
         - DB_PORT=3306
     mysql-db:
       image: mysql:8.0
       environment:
         - MYSQL_ROOT_PASSWORD=secret
       ports:
         - "3306:3306"
   ```

2. **Run the Application**: Execute `docker-compose up -d` to start both containers. The `node-app` service connects to `mysql-db` using the service name (`mysql-db`) as the hostname, not `localhost`.
3. **Verify Connectivity**: Check logs with `docker-compose logs node-app` to ensure the Node.js app connects to MySQL. If it fails, verify the environment variables and MySQL’s readiness.
4. **Alternative Without Compose**: Use a custom network:
   - Create a network: `docker network create my-app-network`
   - Run MySQL: `docker run -d --name mysql-db --network my-app-network -e MYSQL_ROOT_PASSWORD=secret mysql:8.0`
   - Run Node.js: `docker run -d --name node-app --network my-app-network -p 3000:3000 -e DB_HOST=mysql-db my-node-app`

### Additional Notes

- Docker Compose simplifies multi-container setups by managing networks and dependencies.
- Always set environment variables for database credentials to avoid hardcoding.

---

## Question 3: Optimizing a Dockerfile for CI/CD

### Question

You’re a DevOps engineer integrating a Dockerized Python application into a Jenkins CI/CD pipeline. The Dockerfile builds slowly, causing pipeline delays. How would you optimize the Dockerfile to speed up builds while maintaining functionality?

### Answer

To optimize a Dockerfile for faster CI/CD builds:

1. **Use a Smaller Base Image**: Replace heavy images like `python:3.9` with `python:3.9-slim` to reduce size and download time.

   ```dockerfile
   FROM python:3.9-slim
   ```

2. **Leverage Layer Caching**: Order instructions from least to most likely to change. Copy `requirements.txt` and install dependencies before copying the app code:

   ```dockerfile
   COPY requirements.txt .
   RUN pip install -r requirements.txt
   COPY . .
   ```

3. **Minimize Layers**: Combine related commands with `&&` to reduce layers:

   ```dockerfile
   RUN pip install -r requirements.txt && rm -rf /root/.cache/pip
   ```

4. **Use Multi-Stage Builds**: If the app needs build tools, use a multi-stage build to keep the final image small:

   ```dockerfile
   FROM python:3.9 AS builder
   COPY requirements.txt .
   RUN pip install -r requirements.txt

   FROM python:3.9-slim
   COPY --from=builder /usr/local/lib/python3.9 /usr/local/lib/python3.9
   COPY . .
   CMD ["python", "app.py"]
   ```

5. **Test in Jenkins**: Update the Jenkins pipeline to rebuild the image only when the `Dockerfile` or code changes, using a cached image otherwise:

   ```groovy
   pipeline {
       agent any
       stages {
           stage('Build Docker Image') {
               when { changeset "Dockerfile,**.py" }
               steps {
                   sh 'docker build -t my-python-app .'
               }
           }
       }
   }
   ```

### Additional Notes

- Use `.dockerignore` to exclude unnecessary files (e.g., `.git`, `tests/`) from the build context.
- Monitor build times in Jenkins to confirm improvements.

*Contributed by Lahiru Galhena*

================================================
FILE: exercises/shell/solutions/directories_comparision.md
================================================
## How to compare two directories in Linux?

You can use the 'diff' command with the '-r' flag to compare two directories recursively.
### Example:

```bash
diff -r folder1/ folder2/
```

This command compares all the files and subdirectories inside 'folder1' and 'folder2'. If both directories have identical contents, it returns nothing. If there are differences, it shows which files differ or are missing.

================================================
FILE: exercises/shell/solutions/directories_comparison.md
================================================
## Directories Comparison

### Objectives

1. You are given two directories as arguments and the output should be any difference between the two directories

### Solution

```
#!/usr/bin/env bash

help () {
    echo "Usage: compare <dir1> <dir2>"
    echo
}

validate_args() {
    # Ensure that 2 arguments are passed
    if [ $# != 2 ]
    then
        help
        exit 1
    fi

    i=1
    for dir in "$@"
    do
        # Validate existence of directories
        if [ ! -d "$dir" ]
        then
            echo "Directory $dir does not exist"
            exit 1
        fi
        echo "Directory $i: $dir"
        i=$((i + 1))
    done
    echo
}

compare() {
    echo "Comparing directories..."
    echo
    diff -r "$1" "$2"
    if [ $? -eq 0 ]
    then
        echo "No difference"
    fi
    exit 0
}

while getopts ":h" option; do
    case $option in
        h) # display Help
            help
            exit 0;;
        \?) # invalid option
            echo "Error: Invalid option"
            exit 1;;
    esac
done

validate_args "$@"
compare "$1" "$2"
```

================================================
FILE: faq-pt-BR.md
================================================
## FAQ

Perguntas mais frequentes.

### Qual é o propósito do repositório?

Aprender, é claro.

### Meu objetivo é me preparar para entrevistas de DevOps. Devo usar este repositório?

No geral, este repositório deve ajudá-lo a aprender alguns conceitos, mas não presuma em nenhum momento que sua entrevista incluirá perguntas semelhantes às incluídas neste repositório. Em relação às entrevistas, adicionei algumas sugestões [aqui](prepare_for_interview.md)
### Você vai parar em algum momento de adicionar perguntas e exercícios? Tudo o que é bom chega ao fim... ### Como me torno um Engenheiro de DevOps melhor? Essa é uma ótima pergunta.
Não tenho uma resposta definitiva para esta pergunta, eu mesmo a exploro de tempos em tempos. O que acredito que ajuda é: * Praticar - Praticar DevOps na prática deve ser a principal maneira de se tornar um engenheiro de DevOps, na minha opinião * Ler - blogs, livros, ... qualquer coisa que possa enriquecer seu conhecimento sobre DevOps ou tópicos relacionados a DevOps * Participar - existem ótimas comunidades de DevOps. Eu pessoalmente gosto da [comunidade DevOps do Reddit](https://www.reddit.com/r/devops). Visitando lá, aprendo muito sobre diferentes tópicos. * Compartilhar - Esta é uma das razões pelas quais criei este projeto. O objetivo principal era ajudar os outros, mas um objetivo secundário rapidamente se tornou aprender mais. Ao fazer perguntas, você realmente aprende melhor um determinado tópico. Experimente, pegue um determinado assunto e tente criar perguntas que você faria a alguém para testar suas habilidades sobre esse tópico. ### Por que a maioria das perguntas não tem respostas? 1. Porque precisamos de mais contribuidores 2. Porque muitas vezes fazer perguntas é mais fácil do que respondê-las ### Onde posso encontrar respostas para algumas das perguntas neste repositório? 1. Procure por elas usando motores de busca, páginas de documentação, ... isso faz parte de ser um engenheiro de DevOps 2. Use as comunidades: muitas pessoas ficarão felizes em ajudar e responder às suas perguntas 3. Pergunte-nos. Se quiser, pode entrar em contato comigo ou iniciar uma discussão sobre este projeto. ### De onde vêm as perguntas e respostas? Bem, de todos os lugares! - experiência passada, colegas, contribuidores, ... mas por favor, note que não permitimos copiar perguntas de entrevista de sites de perguntas de entrevista para cá. Há pessoas que trabalharam duro para adicioná-las aos seus sites e nós respeitamos isso.
Como evidência, nós negamos pull requests com conteúdo copiado de outros sites. ### Quais são as principais habilidades de DevOps necessárias para ser um Engenheiro de DevOps? É uma pergunta difícil e a razão é que se você perguntar a 20 pessoas diferentes, provavelmente obterá pelo menos 10 respostas diferentes, mas aqui está o que acredito ser comum hoje: * SO - DevOps exige que você tenha um bom entendimento dos conceitos do sistema operacional. O nível exigido depende principalmente da empresa, embora na minha opinião deva ser o mesmo nível. Você deve entender como o sistema operacional funciona, como solucionar problemas e depurar, etc. * Programação faz parte do DevOps. O nível novamente depende da empresa. Alguns exigirão que você saiba um nível básico de scripting, enquanto outros exigirão um profundo entendimento de algoritmos comuns, estrutura de dados, padrões de design, etc. * Nuvem e Contêineres - embora não seja 100% obrigatório em todas as empresas/posições, essa habilidade está em ascensão a cada ano e muitas (se não a maioria) das posições/empresas exigem essa habilidade. Isso significa especificamente: AWS/Azure/GCP, Docker/Podman, Kubernetes, ... * CI/CD - Ser capaz de responder a perguntas como "Por que precisamos de CI/CD?" e "Quais maneiras e modelos existem para realizar CI/CD?". Eventualmente, pratique a montagem de tais processos e fluxos de trabalho, usando quaisquer ferramentas com as quais você se sinta confortável. ### Sinto que há algumas perguntas que não deveriam ser incluídas neste projeto Isso é uma pergunta? :)
Se você não gosta de algumas das perguntas ou acha que algumas perguntas devem ser removidas, você pode abrir um issue ou enviar um PR e podemos discutir lá. Não temos regras contra a exclusão de perguntas (por enquanto :P) ### Posso copiar as perguntas daqui para o meu site? Você pode (embora eu não tenha ideia do porquê você iria querer), mas: * Não sem atribuição. Muitas pessoas trabalharam duro para adicionar essas perguntas e elas merecem o devido crédito por seu trabalho * Não se você planeja ganhar dinheiro com isso. Direta ou indiretamente (por exemplo, ADS), pois este é um conteúdo gratuito e gostaríamos que permanecesse assim :) O mesmo vale para copiar perguntas de diferentes fontes para este repositório. Vimos isso acontecer já com alguns pull requests e os rejeitamos. Não mesclaremos pull requests com perguntas e respostas copiadas de outras fontes. ### Posso adicionar perguntas e/ou respostas a este projeto? Vou simplesmente imaginar que você não perguntou isso em um projeto de código aberto... :) ### Por que não posso adicionar perguntas de instalação? Em geral, prefiro que as perguntas adicionadas a este repositório tenham certo valor educacional para o usuário. Seja em relação a um determinado conceito ou mesmo uma pergunta muito geral, mas que fará o usuário pesquisar sobre um determinado tópico e o tornará eventualmente mais familiarizado com alguns de seus conceitos centrais.
Sei que este não é o caso para todas as perguntas neste repositório até hoje (por exemplo, perguntas sobre comandos específicos), mas isso é definitivamente algo a se aspirar. Vejo pouco ou nenhum valor no que é conhecido como "Perguntas de Instalação". Digamos que eu lhe pergunte "como instalar o Jenkins?". Devo concluir da sua resposta que você está familiarizado com o que é o Jenkins e/ou como ele funciona? Em outras palavras, há valor em saber como instalar o Jenkins? Na minha opinião, não. ### Onde posso praticar codificação? Pessoalmente, gosto muito dos seguintes sites * [HackerRank](https://www.hackerrank.com) * [LeetCode](https://leetcode.com) * [Exercism](https://exercism.io) ### Como aprender mais sobre DevOps? Listei alguns roteiros em [devops-resources](https://github.com/bregman-arie/devops-resources) ### Por que algumas perguntas se repetem? Se você vir duas perguntas idênticas, isso é um bug.
Se você vir duas perguntas semelhantes, isso é um recurso :D (= é intencional) Por exemplo: 1. O que é escalonamento horizontal? 2. O ato de adicionar instâncias adicionais ao pool para lidar com o escalonamento é chamado de escalonamento ________ Você está certo, ambos perguntam sobre escalonamento horizontal, mas é feito de um ângulo diferente em cada pergunta e, além disso, acredito que a repetição ajuda você a aprender algo de uma forma que você não fica fixo na maneira como é perguntado, mas sim entende o conceito em si. ### Vocês estão abertos a fazer grandes mudanças no repositório? Absolutamente. Não tenha medo de levantar ideias e iniciar discussões.
Ficarei mais do que feliz em discutir qualquer mudança que você ache que devemos fazer para melhorar a experiência de aprendizado

================================================
FILE: faq.md
================================================
## FAQ

Most frequently asked questions.

### What is the purpose of this repository?

Learning, of course.

### My goal is to prepare for DevOps interviews. Should I use this repository?

Overall, this repository should help you learn some concepts, but don't assume at any point that your interview will include questions similar to those included in this repository. Regarding interviews, I've added a couple of suggestions [here](prepare_for_interview.md)

### Will you at some point stop adding questions and exercises?

All good things come to an end...

### How do I become a better DevOps Engineer?

That's a great question.

I don't have a definitive answer for this question, I'm exploring it myself from time to time. What I believe helps is to:

* Practice - Practicing DevOps hands-on should be the primary way to become a DevOps engineer, in my opinion
* Read - blogs, books, ... anything that can enrich your knowledge about DevOps or related topics
* Participate - there are great DevOps communities. I personally like the [Reddit DevOps community](https://www.reddit.com/r/devops). Visiting there, I learn quite a lot about different topics.
* Share - This is one of the reasons I created this project. The primary goal was to help others, but a secondary goal quickly became to learn more. By asking questions, you actually learn a certain topic better. Try it out: take a certain subject and try to come up with questions you would ask someone to test his/her skills on that topic.

### Why don't most of the questions have answers?

1. Because we need more contributors
2. Because often asking questions is easier than answering them

### Where can I find answers to some of the questions in this repository?

1. Search for them using search engines, documentation pages, ... this is part of being a DevOps engineer
2. Use the communities: many people will be happy to help and answer your questions
3. Ask us. If you want, you can contact me or start a discussion on this project.

### Where do the questions and answers come from?

Well, everywhere! - past experience, colleagues, contributors, ... but please note we do not allow copying interview questions from interview question sites to here. There are people who worked hard on adding those to their sites and we respect that.

As evidence, we have denied pull requests with content copied from other sites.

### What are the top DevOps skills required for being a DevOps Engineer?

It's a hard question, and the reason is that if you ask 20 different people, you'll probably get at least 10 different answers, but here is what I believe is common today:

* OS - DevOps requires a good understanding of operating system concepts. The level required mainly depends on the company, although in my opinion it should be the same. You should understand how the operating system works, how to troubleshoot and debug issues, etc.
* Programming is part of DevOps. The level again depends on the company. Some will require a basic level of scripting, while others a deep understanding of common algorithms, data structures, design patterns, etc.
* Cloud and Containers - while not a 100% must in all companies/positions, this skill is on the rise every year and many (if not most) of the positions/companies require it. This specifically means: AWS/Azure/GCP, Docker/Podman, Kubernetes, ...
* CI/CD - Be able to answer questions like "Why do we need CI/CD?" and "What ways and models are there to perform CI/CD?". Eventually, practice assembling such processes and workflows, using whatever tools you feel comfortable with.

### I feel like there are some questions that shouldn't be included in this project

Is that a question? :)

If you don't like some of the questions or think that some questions should be removed, you can open an issue or submit a PR and we can discuss it there. We don't have rules against deleting questions (for now :P)

### Can I copy the questions from here to my site?

You can (although I have no idea why you would want to), but:

* Not without attribution. Many people worked hard on adding these questions and they deserve proper credit for their work
* Not if you plan to make money out of it. Directly or indirectly (e.g. ads), as this is free content and we would like it to stay this way :)

Same goes for copying questions from different sources to this repository. We saw it happen already with a couple of pull requests and we rejected them. We will not merge pull requests with questions and answers copied from other sources.

### Can I add questions and/or answers to this project?

I'll simply imagine you didn't ask that on an open source project... :)

### Why can't I add installation questions?

In general, I prefer that questions added to this repository have a certain educational value for the user. Either regarding a certain concept or even a very general question, but one that will make the user research a certain topic and eventually become more familiar with some of its core concepts.

I know that this is not the case for every question in this repo as of today (e.g. questions about specific commands) but this is definitely something to aspire to. I see little to no value in what is known as "Installation Questions". Let's say I ask you "how to install Jenkins?". Should I conclude from your answer that you are familiar with what Jenkins is and/or how it works? In other words, is there value in knowing how to install Jenkins? In my opinion, no.

### Where can I practice coding?

Personally, I really like the following sites:

* [HackerRank](https://www.hackerrank.com)
* [LeetCode](https://leetcode.com)
* [Exercism](https://exercism.io)

### How to learn more about DevOps?

I listed some roadmaps in [devops-resources](https://github.com/bregman-arie/devops-resources)

### Why do some questions repeat themselves?

If you see two identical questions, that's a bug.

If you see two similar questions, that's a feature :D (= it's intentional)

For example:

1. What is horizontal scaling?
2. The act of adding additional instances to the pool to handle scaling is called ________ scaling

You are right, both ask about horizontal scaling, but it's done from a different angle in every question and, in addition, I do believe repetition helps you learn something in a way where you are not fixed on how it's asked, but rather understand the concept itself.

### Are you open to making big changes in the repository?

Absolutely. Don't be afraid to raise ideas and start discussions.
I'll be more than happy to discuss any change you think we should make to improve the learning experience ================================================ FILE: prepare_for_interview-pt-BR.md ================================================ ## Como se preparar para entrevistas de DevOps/SRE/Engenheiro de Produção? Nota: o seguinte é opinativo. ### Habilidades que você deve ter #### Linux Todo Engenheiro de DevOps deve ter um profundo entendimento de pelo menos um sistema operacional e, se você tiver a opção de escolher, eu diria que definitivamente deveria ser o Linux, pois acredito que é um requisito de pelo menos 90% das vagas de DevOps por aí. Além disso, o Linux é uma parte quase integral de qualquer subárea ou domínio em DevOps, como Nuvem, Contêineres, etc. Normalmente, a pergunta seguinte é "Quão extenso deve ser meu conhecimento?" De todas as habilidades de DevOps, eu diria que esta, juntamente com a codificação, devem ser suas habilidades mais fortes. Esteja familiarizado com processos do SO, ferramentas de depuração, sistema de arquivos, rede, ... conheça seu sistema operacional, entenda como ele funciona, como solucionar problemas, etc. Não muito tempo atrás, criei uma lista de recursos do Linux bem [aqui](https://dev.to/abregman/collection-of-linux-resources-3nhk). Existem alguns bons sites lá que você pode usar para aprender mais sobre o Linux. #### Programação Minha crença pessoal é que todo engenheiro de DevOps deve saber programar, pelo menos até certo ponto. Tendo essa habilidade, você pode automatizar processos manuais, melhorar algumas das ferramentas de código aberto que está usando hoje ou construir novas ferramentas e projetos para fornecer uma solução para problemas existentes. Saber codificar = muito poder. Quando se trata de entrevistas, você notará que o nível de conhecimento depende muito da empresa ou da posição para a qual você está entrevistando. 
Alguns exigirão que você seja capaz de escrever scripts simples, enquanto outros mergulharão em algoritmos e estruturas de dados complexos. A melhor maneira de praticar essa habilidade é codificando de fato - scripts, desafios online, ferramentas CLI, aplicações web, ... apenas codifique :) Além disso, o seguinte provavelmente está claro para a maioria das pessoas, mas vamos esclarecer mesmo assim: quando tiver a chance de escolher qualquer linguagem para responder a tarefas/perguntas de codificação, escolha aquela com a qual você tem experiência! Alguns candidatos preferem escolher a linguagem que acham que a empresa está usando e isso é um grande erro, pois dar a resposta certa é sempre melhor do que uma resposta errada, não importa qual linguagem você tenha usado :) Eu recomendo os seguintes sites para praticar codificação: * [HackerRank](https://www.hackerrank.com) * [LeetCode](https://leetcode.com) * [Exercism](https://exercism.io) Começar seu próprio projeto também é uma boa ideia. Mais sobre isso mais tarde. #### Arquitetura e Design Este também é um aspecto importante do DevOps. Você deve ser capaz de descrever como projetar diferentes sistemas, fluxos de trabalho e arquiteturas. Além disso, a escala é um aspecto importante disso. Um design que pode funcionar para uma dúzia de hosts ou uma quantidade X de dados, não necessariamente funcionará bem com uma escala maior. Algumas ideias para você explorar: * Como projetar e implementar um pipeline de CI (ou pipelines) para verificar PRs, executar vários tipos diferentes de testes, empacotar o projeto e implantá-lo em algum lugar * Como projetar e implementar uma arquitetura ELK segura que receberá logs de 10.000 aplicativos e exibirá os dados eventualmente para o usuário * Designs de microsserviços também são bastante populares hoje em dia Em geral, você deve ser capaz de descrever alguns designs, projetos, arquiteturas, ... que você realizou. 
#### Ferramentas Algumas entrevistas se concentrarão em ferramentas ou tecnologias específicas. Quais ferramentas? isso é baseado principalmente em uma combinação do que você mencionou em seu C.V. e aquelas que são mencionadas na descrição da vaga e usadas na empresa. Aqui estão algumas perguntas que acredito que qualquer um deveria saber responder sobre as ferramentas com as quais ele/ela está familiarizado(a): * O que a ferramenta faz? O que ela nos permite alcançar que não poderíamos fazer sem ela? * Quais são suas vantagens sobre outras ferramentas na mesma área, com o mesmo propósito? Por que você a está usando especificamente? * Como ela funciona? * Como usá-la? * Melhores práticas que você aplica/usa ao usá-la Vamos nos aprofundar nos passos práticos de preparação ### Cenários || Desafios || Tarefas Esta é uma maneira muito comum de entrevistar hoje para cargos de DevOps. O candidato recebe uma tarefa que representa uma tarefa comum de Engenheiros de DevOps ou um conhecimento comum e o candidato tem várias horas ou dias para realizar a tarefa.
Esta é uma ótima maneira de se preparar para entrevistas e eu recomendo experimentá-la antes de realmente entrevistar. Como? Pegue os requisitos das descrições de vagas e converta-os em cenários. Vamos ver um exemplo: "Conhecimento em CI/CD" -> Cenário: crie um pipeline de CI/CD para um projeto. Neste ponto, algumas pessoas perguntam: "mas que projeto?" e a resposta é: que tal o GitHub? ele tem apenas 9125912851285192 projetos... e uma maneira gratuita de configurar CI para qualquer um deles (também uma ótima maneira de aprender a colaborar com os outros :) ) Vamos converter outro cenário: "Experiência com provisionamento de servidores" -> Cenário: provisione um servidor (para torná-lo mais interessante: crie um servidor web). E o último exemplo: "Experiência com scripting" -> Cenário: escreva um script. Não perca muito tempo pensando "que script devo escrever?". Simplesmente automatize algo que você está fazendo manualmente ou até mesmo implemente sua própria versão de pequenas utilidades comuns. ### Comece seu próprio projeto de DevOps Começar um projeto de DevOps é uma boa ideia porque: * Fará você praticar codificação * Será algo que você pode adicionar ao seu currículo e falar sobre com o entrevistador * Dependendo do tamanho e da complexidade, pode te ensinar algo sobre design em geral * Dependendo da adoção, pode te ensinar sobre o gerenciamento de projetos de Código Aberto O mesmo aqui, não pense demais sobre o que seu projeto deve ser. Apenas vá e construa algo :) ### Exemplos de perguntas de entrevista Faça uma lista de exemplos de perguntas de entrevista sobre vários tópicos/áreas como técnica, empresa, cargo, ... e tente respondê-las. Veja se você consegue respondê-las de forma fluente e detalhada. Melhor ainda, peça a um bom amigo/colega para desafiá-lo com algumas perguntas. 
Sua autoconsciência pode ser um obstáculo na autoavaliação objetiva do seu conhecimento :) ### Networking Para aqueles que frequentam meetups e conferências técnicas, pode ser uma ótima oportunidade para conversar com pessoas de outras empresas sobre seu processo de entrevista. Mas não comece com isso, pode ser bem estranho. Diga pelo menos olá primeiro... (: Fazer isso pode lhe dar muitas informações sobre o que esperar de uma entrevista em algumas empresas ou como se preparar melhor. ### Conheça seu currículo Pode parecer trivial, mas a ideia aqui é simples: esteja pronto para responder a qualquer pergunta sobre qualquer linha que você incluiu em seu currículo. Às vezes, os candidatos ficam surpresos quando são questionados sobre uma habilidade ou linha que parece não estar relacionada à posição, mas a verdade simples é: se você mencionou algo em seu currículo, é justo perguntar sobre isso. ### Conheça a empresa Esteja familiarizado com a empresa na qual você está entrevistando. Algumas ideias: * O que a empresa faz? * Quais produtos ela tem? * Por que seus produtos são únicos (ou melhores que outros produtos)? Esta também pode ser uma boa pergunta para você fazer ### Livros Pela minha experiência, isso não é feito por muitos candidatos, mas é uma das melhores maneiras de mergulhar em tópicos como sistema operacional, virtualização, escala, sistemas distribuídos, etc. Na maioria dos casos, você se sairá bem sem ler livros, mas para as entrevistas AAA (nível mais difícil) você vai querer ler alguns livros e, no geral, se você aspira a ser um Engenheiro de DevOps melhor, livros (também artigos, posts de blog) são uma ótima maneira de se desenvolver :) ### Considere começar em uma posição não-DevOps Embora não seja um passo de preparação, você deve saber que conseguir um emprego de DevOps como primeira posição pode ser desafiador. Não, não é impossível, mas ainda assim, como o DevOps abrange muitas práticas, ferramentas, ... 
diferentes, pode ser bastante desafiador e também avassalador para alguém tentar alcançá-lo como primeira posição.
Um caminho possível para se tornar um engenheiro de DevOps é começar com uma posição diferente (mas relacionada) e mudar de lá após 1-2 anos ou mais. Algumas ideias: * Administrador de Sistemas - Isso é perfeito porque todo Engenheiro de DevOps deve ter um sólido entendimento do SO e os sysadmins conhecem seu SO :) * Desenvolvedor/Engenheiro de Software - Um DevOps deve ter habilidades de codificação e esta posição fornecerá mais do que o conhecimento necessário na maioria dos casos * Engenheiro de QA - Este é mais complicado porque, na minha opinião, há menos áreas/habilidades sobrepostas com o Engenheiro de DevOps. Claro, os engenheiros de DevOps devem ter algum conhecimento sobre testes, mas geralmente, parece que suas habilidades/background sólidos são compostos principalmente por internos de sistema e habilidades de codificação. ### O que esperar de uma entrevista de DevOps? As entrevistas de DevOps podem ser muito diferentes. Algumas incluirão perguntas de design, algumas se concentrarão em codificação, outras incluirão perguntas técnicas curtas e você pode até ter uma entrevista onde o entrevistador apenas repassa seu currículo e discute sua experiência passada. Existem algumas coisas que você pode fazer sobre isso para que seja uma experiência menos avassaladora: 1. Você pode e provavelmente deve perguntar ao RH (em alguns casos, até mesmo ao líder da equipe) como é o processo de entrevista. Alguns serão gentis o suficiente para até mesmo lhe dizer como se preparar. 2. Geralmente, a descrição da vaga dá mais do que uma dica sobre onde o foco estará e no que você deve se concentrar em suas preparações, então leia-a com atenção. 3. Existem muitos sites que têm notas ou um resumo do processo de entrevista em diferentes empresas, especialmente grandes empresas. 
### Não se esqueça de ser um entrevistador também Algumas pessoas tendem a ver as entrevistas como um caminho de mão única de "Determinar se um candidato é qualificado", mas na realidade, um candidato também deve determinar se a empresa na qual ele/ela está entrevistando é o lugar certo para ele/ela. * Eu me importo com o tamanho da equipe? Mais especificamente, eu me importo em ser um show de um homem só ou fazer parte de uma equipe maior? * Eu me importo com o equilíbrio entre vida profissional e pessoal? * Eu me importo com o crescimento pessoal e como isso é feito na prática? * Eu me importo em saber quais são minhas responsabilidades como parte da função? Se você se importa, você também deve desempenhar o papel de entrevistador :) ### Uma Última Coisa [Boa sorte](https://youtu.be/AFUrG1-BAt4?t=59) :) ================================================ FILE: prepare_for_interview.md ================================================ ## How to prepare for DevOps/SRE/Production Engineer interviews? Note: the following is opinionated. ### Skills you should have #### Linux Every DevOps Engineer should have a deep understanding of at least one operating system and if you have the option to choose, then I would say it should definitely be Linux, as I believe it's a requirement of at least 90% of the DevOps job postings out there. In addition, Linux is an almost integral part of any sub-area or domain in DevOps like Cloud, Containers, etc. Usually, the follow-up question is "How extensive should my knowledge be?" Out of all the DevOps skills, I would say this, along with coding, should be your strongest skills. Be familiar with OS processes, debugging tools, filesystem, networking, ... know your operating system, understand how it works, how to troubleshoot issues, etc. Not long ago, I created a list of Linux resources right [here](https://dev.to/abregman/collection-of-linux-resources-3nhk). There are some good sites there that you can use for learning more about Linux.
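To make that kind of OS familiarity concrete, here is a minimal sketch of reading process information straight from Linux's procfs — the sort of thing interviewers probe when they ask how `ps` works under the hood. The `list_processes` helper and its `proc_root` parameter are illustrative names, not part of this repository:

```python
import os

def list_processes(proc_root="/proc"):
    """Return (pid, command) pairs by scanning a procfs-style directory.

    On Linux, every running process shows up as a numeric directory under
    /proc containing a `comm` file with the executable name.
    """
    processes = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # skip entries like /proc/cpuinfo or /proc/self
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                processes.append((int(entry), f.read().strip()))
        except OSError:
            continue  # the process exited between listdir() and open()
    return sorted(processes)

if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, command in list_processes():
        print(pid, command)
```

Being able to explain why the `except OSError` guard is needed (processes are racy — they can die mid-scan) is exactly the kind of depth this section is about.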
#### Programming My personal belief is that any DevOps engineer should know programming, at least to some degree. Having this skill, you can automate manual processes, improve some of the open source tools you are using today, or build new tools & projects to provide a solution to existing problems. Knowing how to code = a lot of power. When it comes to interviews, you'll notice that the level of knowledge very much depends on the company or position you are interviewing for. Some will only require you to be able to write simple scripts, while others will deep dive into complex algorithms and data structures. The best way to practice this skill is by doing some actual coding - scripts, online challenges, CLI tools, web applications, ... just code :) Also, the following is probably clear to most people but let's still clarify it: when given the chance to choose any language for answering coding tasks/questions, choose the one you have experience with! Some candidates prefer to choose the language they think the company is using and this is a huge mistake, since giving the right answer is always better than a wrong answer, no matter which language you have used :) I recommend the following sites for practicing coding: * [HackerRank](https://www.hackerrank.com) * [LeetCode](https://leetcode.com) * [Exercism](https://exercism.io) Starting your own project is also a good idea. More on that later on. #### Architecture and Design This is also an important aspect of DevOps. You should be able to describe how to design different systems, workflows, and architectures. Also, scale is an important aspect of that. A design that might work for a dozen hosts or X amount of data will not necessarily work well at a bigger scale.
Some ideas for you to explore: * How to design and implement a CI pipeline (or pipelines) for verifying PRs, running multiple different types of tests, packaging the project, and deploying it somewhere * How to design and implement a secure ELK architecture that will ingest logs from 10,000 apps and eventually display the data to the user * Microservices designs are also quite popular these days In general, you should be able to describe some designs, projects, architectures, ... you worked on. #### Tooling Some interviews will focus on specific tools or technologies. Which tools? That is mainly based on a combination of what you mentioned in your C.V. and those that are mentioned in the job posting and used in the company. Here are some questions I believe anyone should be able to answer regarding the tools he/she is familiar with: * What does the tool do? What does it allow us to achieve that we couldn't do without it? * What are its advantages over other tools in the same area, with the same purpose? Why are you using it specifically? * How does it work? * How do you use it? * Best practices you apply/use when using it Let's deep dive into practical preparation steps ### Scenarios || Challenges || Tasks This is a very common way to interview today for DevOps roles. The candidate is given a task that represents a common DevOps Engineer task or a piece of common knowledge, and the candidate has several hours or days to accomplish the task.
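As an illustration of the kind of small take-home task described above, consider reimplementing a common util — say, a `wc -l`-style line counter. The `count_lines` helper below is illustrative, not from this repository:

```python
import sys

def count_lines(path):
    """Count the lines in a file, the way `wc -l` does."""
    with open(path, "rb") as f:  # binary mode avoids decoding errors on arbitrary files
        return sum(1 for _ in f)

if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(count_lines(name), name)
```

A task this small still gives an interviewer plenty to discuss: error handling for missing files, behavior on a file without a trailing newline, and memory usage on huge inputs (the generator keeps it constant).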
This is a great way to prepare for interviews and I recommend trying it out before actually interviewing. How? Take requirements from job posts and convert them into scenarios. Let's see an example: "Knowledge in CI/CD" -> Scenario: create a CI/CD pipeline for a project. At this point, some people ask: "but what project?" and the answer is: what about GitHub? It has only 9125912851285192 projects... and a free way to set up CI for any of them (also a great way to learn how to collaborate with others :) ) Let's convert another scenario: "Experience with provisioning servers" -> Scenario: provision a server (to make it more interesting: create a web server). And the last example: "Experience with scripting" -> Scenario: write a script. Don't waste too much time thinking "what script should I write?". Simply automate something you are doing manually or even implement your own version of common small utils. ### Start your own DevOps project Starting a DevOps project is a good idea because: * It will make you practice coding * It will be something you can add to your resume and talk about with the interviewer * Depending on size and complexity, it can teach you something about design in general * Depending on adoption, it can teach you about managing Open Source projects Same here, don't overthink what your project should be about. Just go and build something :) ### Sample interview questions Make a sample list of interview questions on various topics/areas like technical, company, role, ... and try to answer them. See if you can manage to answer them in a fluent, detailed way. Better yet, ask a good friend/colleague to challenge you with some questions. Your self-awareness might be an obstacle to an objective self-review of your knowledge :) ### Networking For those who attend technical meetups and conferences, it can be a great opportunity to chat with people from other companies about their interviewing process. But don't open with it; it can be quite awkward.
Say at least hello first... (: Doing so can give you a lot of information on what to expect from an interview at some companies or how to better prepare. ### Know your resume It may sound trivial but the idea here is simple: be ready to answer any question regarding any line you included in your resume. Sometimes candidates are surprised when they are asked about a skill or line that seems unrelated to the position, but the simple truth is: if you mentioned something on your resume, it's only fair to ask you about it. ### Know the company Be familiar with the company you are interviewing at. Some ideas: * What does the company do? * What products does it have? * Why are its products unique (or better than other products)? This can also be a good question for you to ask ### Books From my experience, this is not done by many candidates but it's one of the best ways to deep dive into topics like operating systems, virtualization, scale, distributed systems, etc. In most cases, you will do fine without reading books, but for the AAA interviews (hardest level) you'll want to read some books, and overall if you aspire to be a better DevOps Engineer, books (also articles and blog posts) are a great way to develop yourself :) ### Consider starting in a non-DevOps position While not a preparation step, you should know that landing a DevOps job as a first position can be challenging. No, it's not impossible but still, since DevOps covers many different practices, tools, ... it can be quite challenging and also overwhelming for someone to try and achieve it as a first position.
A possible path to becoming a DevOps engineer is to start with a different (but related) position and switch from there after 1-2 years or more. Some ideas: * System Administrator - This is perfect because every DevOps Engineer should have a solid understanding of the OS and sysadmins know their OS :) * Software Developer/Engineer - A DevOps engineer should have coding skills and this position will provide more than the required knowledge in most cases * QA Engineer - This is a trickier one because IMHO there are fewer overlapping areas/skills with DevOps Engineer. Sure, DevOps engineers should have some knowledge about testing, but usually it seems their solid skills/background are mainly composed of system internals and coding skills. ### What to expect from a DevOps interview? DevOps interviews can be very different. Some will include design questions, some will focus on coding, others will include short technical questions, and you might even have an interview where the interviewer only goes over your resume and discusses your past experience. There are a couple of things you can do about it so it will be a less overwhelming experience: 1. You can and probably should ask HR (in some cases even the team lead) what the interview process looks like. Some will be kind enough to even tell you how to prepare. 2. Usually, the job posting gives more than a hint on where the focus will be and what you should focus on in your preparations, so read it carefully. 3. There are plenty of sites that have notes or a summary of the interview process in different companies, especially big enterprises. ### Don't forget to be an interviewer as well Some people tend to look at interviews as a one-way road of "Determining whether a candidate is qualified" but in reality, a candidate should also determine whether the company he/she is interviewing at is the right place for him/her. * Do I care about team size?
More specifically, do I care about being a one-man show or being part of a bigger team? * Do I care about work-life balance? * Do I care about personal growth and how it's practically done? * Do I care about knowing what my responsibilities are as part of the role? If you do, you should also play the interviewer role :) ### One Last Thing [Good luck](https://youtu.be/AFUrG1-BAt4?t=59) :) ================================================ FILE: scripts/aws s3 event triggering/README.md ================================================ ![sample](./sample.png) ================================================ FILE: scripts/aws s3 event triggering/aws_s3_event_trigger.sh ================================================ #!/bin/bash # Always document script details: version, author, what it does, what triggers it, etc. ### # Author: Adarsh Rawat # Version: 1.0.0 # Objective: Automate notification for an object uploaded or created in an S3 bucket. ### # debug what is happening set -x # all these cmds are aws cli commands | abhishek veermalla day 4-5 devops # store aws account id in a variable aws_account_id=$(aws sts get-caller-identity --query 'Account' --output text) # print the account id from the variable echo "aws account id: $aws_account_id" # set aws region, bucket name and other variables aws_region="us-east-1" aws_bucket="s3-lambda-event-trigger-bucket" aws_lambda="s3-lambda-function-1" aws_role="s3-lambda-sns" email_address="adarshrawat8304@gmail.com" # create iam role for the project role_response=$(aws iam create-role --role-name "$aws_role" --assume-role-policy-document '{ "Version": "2012-10-17", "Statement": [{ "Action": "sts:AssumeRole", "Effect": "Allow", "Principal": { "Service": [ "lambda.amazonaws.com", "s3.amazonaws.com", "sns.amazonaws.com" ] } }] }') # jq is a json parser; here it parses the role we created # extract the role arn from the json response and store it in a variable role_arn=$(echo "$role_response" | jq -r '.Role.Arn') # print the role arn echo "Role
ARN: $role_arn" # attach permissions to the role aws iam attach-role-policy --role-name "$aws_role" --policy-arn arn:aws:iam::aws:policy/AWSLambda_FullAccess aws iam attach-role-policy --role-name "$aws_role" --policy-arn arn:aws:iam::aws:policy/AmazonSNSFullAccess # create s3 bucket and get the output in a variable bucket_output=$(aws s3api create-bucket --bucket "$aws_bucket" --region "$aws_region") # print the output from the variable echo "bucket output: $bucket_output" # upload a file to the bucket aws s3 cp ./sample.png s3://"$aws_bucket"/sample.png # create a zip file to upload lambda function zip -r s3-lambda.zip ./s3-lambda sleep 5 # create a lambda function (note: the flag is --function-name, not --function) aws lambda create-function \ --region "$aws_region" \ --function-name "$aws_lambda" \ --runtime "python3.8" \ --handler "s3-lambda/s3-lambda.lambda_handler" \ --memory-size 128 \ --timeout 30 \ --role "arn:aws:iam::$aws_account_id:role/$aws_role" \ --zip-file "fileb://./s3-lambda.zip" # allow the s3 bucket to invoke the lambda function LambdaFunctionArn="arn:aws:lambda:$aws_region:$aws_account_id:function:$aws_lambda" aws lambda add-permission \ --function-name "$aws_lambda" \ --statement-id s3-invoke-lambda \ --action "lambda:InvokeFunction" \ --principal s3.amazonaws.com \ --source-arn "arn:aws:s3:::$aws_bucket" # configure the bucket notification (this only needs to run once) aws s3api put-bucket-notification-configuration \ --region "$aws_region" \ --bucket "$aws_bucket" \ --notification-configuration '{ "LambdaFunctionConfigurations": [{ "LambdaFunctionArn": "'"$LambdaFunctionArn"'", "Events": ["s3:ObjectCreated:*"] }] }' # create an sns topic and save the topic arn to a variable topic_arn=$(aws sns create-topic --name s3-lambda-sns --output json | jq -r '.TopicArn') # print the topic arn echo "SNS Topic ARN: $topic_arn" # subscribe an email address to the sns topic aws sns subscribe \ --topic-arn "$topic_arn" \ --protocol email \ --notification-endpoint "$email_address" # publish sns aws sns
publish \ --topic-arn "$topic_arn" \ --subject "A new object created in s3 bucket" \ --message "Hey, a new data object just got delivered into the s3 bucket $aws_bucket" ================================================ FILE: scripts/aws s3 event triggering/s3-lambda/requirements.txt ================================================ boto3==1.17.95 ================================================ FILE: scripts/aws s3 event triggering/s3-lambda/s3-lambda.py ================================================ import boto3 import json def lambda_handler(event, context): # log the incoming event for debugging print(event) # extract relevant information from the s3 event trigger bucket_name = event['Records'][0]['s3']['bucket']['name'] object_key = event['Records'][0]['s3']['object']['key'] # perform desired operations with the uploaded file print(f"File '{object_key}' was uploaded to bucket '{bucket_name}'") # example: send a notification via SNS sns_client = boto3.client('sns') topic_arn = 'arn:aws:sns:us-east-1::s3-lambda-sns' sns_client.publish( TopicArn=topic_arn, Subject='s3 object created !!', Message=f"File '{object_key}' was uploaded to bucket '{bucket_name}'" ) # Example: Trigger another Lambda function # lambda_client = boto3.client('lambda') # target_function_name = 'my-another-lambda-function' # lambda_client.invoke( # FunctionName=target_function_name, # InvocationType='Event', # Payload=json.dumps({'bucket_name': bucket_name, 'object_key': object_key}) # ) # in case of queuing and other objectives similar to the Netflix flow of triggering return { 'statusCode': 200, 'body': json.dumps("Lambda function executed successfully !!") } ================================================ FILE: scripts/count_questions.sh ================================================ #!/usr/bin/env bash set -eu count=$(echo $(( $(grep -E "\[Exercise\]|<details>" -c README.md topics/*/README.md | awk -F: '{ s+=$2 } END { print s }' )))) echo "There are $count questions and exercises" sed -i
"s/currently \*\*[0-9]*\*\*/currently \*\*$count\*\*/" README.md ================================================ FILE: scripts/question_utils.py ================================================ """ Question utils functions """ import pathlib from random import choice from typing import List import re README_PATH = pathlib.Path(__file__).parent.parent / "README.md" EXERCISES_PATH = pathlib.Path(__file__).parent.parent / "exercises" DETAILS_PATTERN = re.compile(r"<details>(.*?)</details>", re.DOTALL) SUMMARY_PATTERN = re.compile(r"<summary>(.*?)</summary>", re.DOTALL) B_PATTERN = re.compile(r"<b>(.*?)</b>", re.DOTALL) def get_file_content() -> str: with README_PATH.open("r", encoding="utf-8") as f: return f.read() def get_question_list(file_content: str) -> List[str]: details = DETAILS_PATTERN.findall(file_content) return [ SUMMARY_PATTERN.search(detail).group(1) for detail in details if SUMMARY_PATTERN.search(detail) ] def get_answered_questions(file_content: str) -> List[str]: details = DETAILS_PATTERN.findall(file_content) answered = [] for detail in details: summary_match = SUMMARY_PATTERN.search(detail) b_match = B_PATTERN.search(detail) if ( summary_match and b_match and summary_match.group(1).strip() and b_match.group(1).strip() ): answered.append(summary_match.group(1)) return answered def get_answers_count() -> List[int]: file_content = get_file_content() answered = get_answered_questions(file_content) all_questions = get_question_list(file_content) return [len(answered), len(all_questions)] def get_challenges_count() -> int: return len(list(EXERCISES_PATH.glob("*.md"))) def get_random_question(question_list: List[str], with_answer: bool = False) -> str: if with_answer: return choice(get_answered_questions(get_file_content())) return choice(get_question_list(get_file_content())) """Use this question_list. Unless you have already opened/worked/need the file, then don't or you will end up doing the same thing twice. eg: # my_dir/main.py from scripts import question_utils print( question_utils.get_answered_questions( question_utils.get_question_list( question_utils.get_file_content() ) ) ) >> 123 """ if __name__ == "__main__": import doctest doctest.testmod() ================================================ FILE: scripts/random_question.py ================================================ import random import optparse import os def main(): """Reads through README.md for question/answer pairs and adds them to a list to randomly select from and quiz yourself.
Supports skipping questions with no documented answer with the -s flag """ parser = optparse.OptionParser() parser.add_option("-s", "--skip", action="store_true", help="skips questions without an answer.", default=False) options, args = parser.parse_args() with open('README.md', 'r') as f: text = f.read() questions = [] while True: question_start = text.find('<summary>') + 9 question_end = text.find('</summary>') answer_end = text.find('</b>') if answer_end == -1: break question = text[question_start: question_end].replace('<br>', '').replace('</br>', '') answer = text[question_end + 17: answer_end] questions.append((question, answer)) text = text[answer_end + 1:] num_questions = len(questions) while True: try: question, answer = questions[random.randint(0, num_questions - 1)] if options.skip and not answer.strip(): continue os.system("clear") print(question) print("...Press Enter to show answer...") input() print('A: ', answer) print("... Press Enter to continue, Ctrl-C to exit") input() except KeyboardInterrupt: break print("\nGoodbye! See you next time.") if __name__ == '__main__': main() ================================================ FILE: scripts/run_ci.sh ================================================ #!/usr/bin/env bash set -euo pipefail PROJECT_DIR="$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")/.." # Use the `-print0` option to handle spaces safely, and while-read loop: find "${PROJECT_DIR}" \ -name "*.md" \ -not -path "${PROJECT_DIR}/tests/*" \ -print0 | while IFS= read -r -d '' file do python "${PROJECT_DIR}/tests/syntax_lint.py" "${file}" > /dev/null done echo "- Syntax lint tests on MD files passed successfully" flake8 --max-line-length=100 .
&& echo "- PEP8 Passed" ================================================ FILE: scripts/update_question_number.py ================================================ """ Meant to be used like this: python scripts/update_question_number.py """ import pathlib from scripts.question_utils import get_question_list, get_challenges_count LINE_FLAG = b":bar_chart:" p = pathlib.Path(__file__).parent.parent.joinpath('README.md') with open(p, 'rb') as f: file = f.readlines() file_list = [line.rstrip() for line in file] question_list = get_question_list(file_list) question_count = len(question_list) total_count = question_count + get_challenges_count() print(question_count) print(get_challenges_count()) print(total_count) for line in file: if LINE_FLAG in line: file[file.index(line)] = b':bar_chart:  There are currently **%s** questions\r\n' %\ str(total_count).encode() break with open(p, 'wb') as f: f.writelines(file) ================================================ FILE: tests/scripts_question_utils_unittest.py ================================================ import unittest from pathlib import Path from typing import List from scripts.question_utils import get_answered_questions, get_question_list def open_test_case_file(n: int) -> List[bytes]: tests_path = Path(__file__).parent.joinpath() with open(f'{tests_path}/testcases/testcase{n}.md', 'rb') as f: file_list = [line.rstrip() for line in f.readlines()] return file_list class QuestionCount(unittest.TestCase): def test_case_1(self): raw_list = open_test_case_file(1) question_list = get_question_list(raw_list) answers = get_answered_questions(question_list) self.assertEqual(len(question_list), 11) self.assertEqual(len(answers), 3) def test_case_2(self): raw_list = open_test_case_file(2) question_list = get_question_list(raw_list) answers = get_answered_questions(question_list) self.assertEqual(len(question_list), 16) self.assertEqual(len(answers), 11) ================================================ FILE: 
tests/syntax_checker_unittest.py
================================================

"""
WIP

Yes, we do write tests for our tests.
"""
from pathlib import Path
from typing import List
from unittest import TestCase

from tests import syntax_lint


def open_test_case_file(n: int) -> List[bytes]:
    tests_path = Path(__file__).parent
    with open(f'{tests_path}/testcases/testcase{n}.md', 'rb') as f:
        file_list = [line.rstrip() for line in f.readlines()]
    return file_list


test_case_1 = open_test_case_file(1)
test_case_2 = open_test_case_file(2)
test_case_3 = open_test_case_file(3)


class TestSyntax(TestCase):

    def test_details_count_case1(self):
        self.assertTrue(syntax_lint.count_details(test_case_1))

    def test_details_count_case2(self):
        self.assertTrue(syntax_lint.count_details(test_case_2))

    def test_details_errors_1(self):
        syntax_lint.check_details_tag(test_case_1)
        self.assertFalse(syntax_lint.errors)

    def test_details_errors_2(self):
        syntax_lint.check_details_tag(test_case_2)
        self.assertFalse(syntax_lint.errors)

    # def test_details_error_exist_1(self):
    #     syntax_lint.check_details_tag(test_case_3)
    #     print(syntax_lint.errors)
    #     self.assertEqual(len(syntax_lint.errors), 3)


================================================
FILE: tests/syntax_lint.py
================================================

"""
Testing suite for https://github.com/bregman-arie/devops-interview-questions
written by surister

Even though check_details_tag and check_summary_tag are practically the
same, it was decided to split them for readability and functionality.

Usage:
$ python tests/syntax_lint.py <file.md>
"""
import sys

errors = []


def count_details(file_list):
    """
    Counts the total amount of <details> and </details> tags.

    Used for debugging purposes, not meant to be used in actual tests.
    """
    details_final_count = 0
    details_count = 0
    for line in file_list:
        if b"<details>" in line:
            details_count += 1
        if b"</details>" in line:
            details_final_count += 1
    return details_count == details_final_count


def count_summary(file_list):
    """
    Counts the total amount of <summary> and </summary> tags.

    Used for debugging purposes, not meant to be used in actual tests.
    """
    summary_final_count = 0
    summary_count = 0
    for line in file_list:
        if b"<summary>" in line:
            summary_count += 1
        if b"</summary>" in line:
            summary_final_count += 1
    return summary_count == summary_final_count


def check_details_tag(file_list):
    """
    Check whether the structure:
    <details>
    ...
    </details>

    is correctly followed; if not, generate an error.
    """
    after_detail = False
    error = False
    err_message = ""
    for line_number, line in enumerate(file_list):
        if b"<details>" in line and b"</details>" in line:
            pass
        else:
            if b"<details>" in line and after_detail:
                err_message = f"Missing closing detail tag around line {line_number - 1}"
                error = True
            if b"</details>" in line and not after_detail:
                err_message = f"Missing opening detail tag around line {line_number - 1}"
                error = True
            if b"<details>" in line:
                after_detail = True
            if b"</details>" in line and after_detail:
                after_detail = False
        if error:
            errors.append(err_message)
            error = False


def check_summary_tag(file_list):
    """
    Check whether the structure:
    <summary>
    ...
    </summary>

    is correctly followed; if not, generate an error.
    """
    after_summary = False
    error = False
    err_message = ""
    for idx, line in enumerate(file_list):
        line_number = idx + 1
        if b"<summary>" in line and b"</summary>" in line:
            if after_summary:
                err_message = f"Missing closing summary tag around line {line_number}"
                error = True
        else:
            if b"<summary>" in line and after_summary:
                err_message = f"Missing closing summary tag around line {line_number}"
                error = True
            if b"</summary>" in line and not after_summary:
                err_message = f"Missing opening summary tag around line {line_number}"
                error = True
            if b"<summary>" in line:
                after_summary = True
            if b"</summary>" in line and after_summary:
                after_summary = False
        if error:
            errors.append(err_message)
            error = False


def check_md_file(file_name):
    with open(file_name, "rb") as f:
        file_list = [line.rstrip() for line in f.readlines()]
    check_details_tag(file_list)
    check_summary_tag(file_list)


if __name__ == "__main__":
    p = sys.argv[1]
    print(f"..........Checking {p}..........")
    check_md_file(p)
    if errors:
        print(f"{p} failed", file=sys.stderr)
        for error in errors:
            print(error, file=sys.stderr)
        sys.exit(1)
    print("Tests passed successfully.")


================================================
FILE: tests/testcases/testcase1.md
================================================
What is Docker? What are you using it for?
How are containers different from VMs?
The primary difference between containers and VMs is that containers allow you to virtualize multiple workloads on the operating system while in the case of VMs the hardware is being virtualized to run multiple machines each with its own OS.
In which scenarios would you use containers and in which would you prefer to use VMs?
You should choose VMs when:
* you need to run an application which requires all the resources and functionalities of an OS
* you need full isolation and security

You should choose containers when:
* you need a lightweight solution
* you need to run multiple versions or instances of a single application
Explain Docker architecture
Describe in detail what happens when you run `docker run hello-world`?
Docker CLI passes your request to the Docker daemon. The Docker daemon downloads the image from Docker Hub. The Docker daemon creates a new container using the image it downloaded. The Docker daemon redirects output from the container to the Docker CLI, which redirects it to the standard output.
How do you run a container?
What does `docker commit` do? When would you use it?
How would you transfer data from one container into another?
What happens to the data of a container when the container exits?
Explain what each of the following commands do: * docker run * docker rm * docker ps * docker pull * docker build * docker commit
How do you remove old, non-running containers?
================================================ FILE: tests/testcases/testcase2.md ================================================
Explain the following code: :(){ :|:& };:
Can you give an example to some Bash best practices?
What is the ternary operator? How do you use it in bash?
A short way of using if/else. An example: [[ $a = 1 ]] && b="yes, equal" || b="nope"
What does the following code do and when would you use it? diff <(ls /tmp) <(ls /var/tmp)
It is called 'process substitution'. It provides a way to pass the output of a command to another command when using a pipe | is not possible. It can be used when a command does not support STDIN or you need the output of multiple commands. https://superuser.com/a/1060002/167769
## SQL

#### :baby: Beginner
What does SQL stand for?
Structured Query Language
How is SQL different from NoSQL?
The main difference is that SQL databases are structured (data is stored in the form of tables with rows and columns - like an Excel spreadsheet table) while NoSQL is unstructured, and the data storage can vary depending on how the NoSQL DB is set up, such as key-value pair, document-oriented, etc.
What does it mean when a database is ACID compliant?
ACID stands for Atomicity, Consistency, Isolation, Durability. In order to be ACID compliant, the database must meet each of the four criteria:

**Atomicity** - When a change occurs to the database, it should either succeed or fail as a whole. For example, if you were to update a table, the update should completely execute. If it only partially executes, the update is considered failed as a whole, and will not go through - the DB will revert to its original state before the update occurred. It should also be mentioned that Atomicity ensures that each transaction is completed as its own standalone "unit" - if any part fails, the whole statement fails.

**Consistency** - Any change made to the database should bring it from one valid state into the next. For example, if you make a change to the DB, it shouldn't corrupt it. Consistency is upheld by checks and constraints that are pre-defined in the DB. For example, if you tried to change a value from a string to an int when the column should be of datatype string, a consistent DB would not allow this transaction to go through, and the action would not be executed.

**Isolation** - This ensures that a database will never be seen "mid-update" - as multiple transactions are running at the same time, it should still leave the DB in the same state as if the transactions were being run sequentially. For example, let's say that 20 other people were making changes to the database at the same time. At the time you executed your query, 15 of the 20 changes had gone through, but 5 were still in progress. You should only see the 15 changes that had completed - you wouldn't see the database mid-update as the change goes through.

**Durability** - Once a change is committed, it will remain committed regardless of what happens (power failure, system crash, etc.). This means that all completed transactions must be recorded in non-volatile memory.

Note that SQL is by nature ACID compliant.
Certain NoSQL DBs can be ACID compliant depending on how they operate, but as a general rule of thumb, NoSQL DBs are not considered ACID compliant.
When is it best to use SQL? NoSQL?
SQL - Best used when data integrity is crucial. SQL is typically implemented with many businesses and areas within the finance field due to its ACID compliance. NoSQL - Great if you need to scale things quickly. NoSQL was designed with web applications in mind, so it works great if you need to quickly spread the same information around to multiple servers. Additionally, since NoSQL does not adhere to the strict table-with-columns-and-rows structure that relational databases require, you can store different data types together.
What is a Cartesian Product?
A Cartesian product is when all rows from the first table are joined to all rows in the second table. This can be done implicitly by not defining a key to join, or explicitly by calling a CROSS JOIN on two tables, such as below: Select * from customers **CROSS JOIN** orders; Note that a Cartesian product can also be a bad thing - when performing a join on two tables in which both do not have unique keys, this could cause the returned information to be incorrect.
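The row-count arithmetic of a Cartesian product can be checked with a quick sketch, using an illustrative subset of the sample tables below (names are examples, not a real database):

```python
import itertools

# Sample keys from a Customers and an Orders table (illustrative subset).
customers = ["John Smith", "Jane Smith", "Bobby Frank"]
orders = ["A123", "Q987", "Z001", "X202"]

# A CROSS JOIN pairs every row of one table with every row of the other,
# so the result has len(customers) * len(orders) rows.
product = list(itertools.product(customers, orders))
print(len(product))  # -> 12
```

Every customer appears once per order key, which is exactly why an accidental Cartesian product inflates result sets so quickly.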
##### SQL Specific Questions

For these questions, we will be using the Customers and Orders tables shown below:

**Customers**

Customer_ID | Customer_Name | Items_in_cart | Cash_spent_to_Date
------------ | ------------- | ------------- | -------------
100204 | John Smith | 0 | 20.00
100205 | Jane Smith | 3 | 40.00
100206 | Bobby Frank | 1 | 100.20

**ORDERS**

Customer_ID | Order_ID | Item | Price | Date_sold
------------ | ------------- | ------------- | ------------- | -------------
100206 | A123 | Rubber Ducky | 2.20 | 2019-09-18
100206 | A123 | Bubble Bath | 8.00 | 2019-09-18
100206 | Q987 | 80-Pack TP | 90.00 | 2019-09-20
100205 | Z001 | Cat Food - Tuna Fish | 10.00 | 2019-08-05
100205 | Z001 | Cat Food - Chicken | 10.00 | 2019-08-05
100205 | Z001 | Cat Food - Beef | 10.00 | 2019-08-05
100205 | Z001 | Cat Food - Kitty quesadilla | 10.00 | 2019-08-05
100204 | X202 | Coffee | 20.00 | 2019-04-29
How would I select all fields from this table?
Select *
From Customers;
How many items are in John's cart?
Select Items_in_cart
From Customers
Where Customer_Name = 'John Smith';
What is the sum of all the cash spent across all customers?
Select SUM(Cash_spent_to_Date) as SUM_CASH
From Customers;
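The aggregate can be reproduced with an in-memory SQLite sketch of the Customers table above (20.00 + 40.00 + 100.20):

```python
import sqlite3

# In-memory copy of the Customers sample table from above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    "Customer_ID INTEGER, Customer_Name TEXT, "
    "Items_in_cart INTEGER, Cash_spent_to_Date REAL)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?, ?)",
    [
        (100204, "John Smith", 0, 20.00),
        (100205, "Jane Smith", 3, 40.00),
        (100206, "Bobby Frank", 1, 100.20),
    ],
)

# SUM() aggregates the column across all rows.
(total,) = conn.execute(
    "SELECT SUM(Cash_spent_to_Date) AS SUM_CASH FROM Customers"
).fetchone()
print(round(total, 2))  # -> 160.2
```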
Tell me about your last big project/task you worked on
What was most challenging part in the project you worked on?
Why do you want to work here?
How did you hear about us?
Tell them how did you hear about them :D Relax, there is no wrong or right answer here...I think.
================================================
FILE: tests/testcases/testcase3.md
================================================
You have a colleague you don't get along with. Tell us some strategies for creating a good work relationship with them anyway.
Bad answer: I don't. Better answer: Every person has strengths and weaknesses. This is true also for colleagues I don't have good work relationship with and this is what helps me to create good work relationship with them. If I am able to highlight or recognize their strengths I'm able to focus mainly on that when communicating with them.
What do you love about your work?
You know the best, but some ideas if you find it hard to express yourself: * Diversity * Complexity * Challenging * Communication with several different teams
What are your responsibilities in your current position?
You know the best :)
Why should we hire you for the role?
You can use and elaborate on one or all of the following: * Passion * Motivation * Autodidact * Creativity (be able to support it with some actual examples)

## Questions you CAN ask

A list of questions you as a candidate can ask the interviewer during or after the interview. These are only a suggestion, use them carefully. Not every interviewer will be able to answer these (or be happy to), which should perhaps be a red flag warning for you regarding working in such a place, but that's really up to you.
What do you like about working here?
How does the company promote personal growth?
What is the current level of technical debt you are dealing with?
Be careful when asking this question - all companies, regardless of size, have some level of tech debt. Phrase the question in the light that all companies have to deal with this, but you want to see the current pain points they are dealing with.
This is a great way to figure how managers deal with unplanned work, and how good they are at setting expectations with projects.
================================================
FILE: topics/ansible/README.md
================================================

## Ansible

### Ansible Exercises

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| My First Task | Tasks | [Exercise](my_first_task.md) | [Solution](solutions/my_first_task.md) | |
| Upgrade and Update Task | Tasks | [Exercise](update_upgrade_task.md) | [Solution](solutions/update_upgrade_task.md) | |
| My First Playbook | Playbooks | [Exercise](my_first_playbook.md) | [Solution](solutions/my_first_playbook.md) | |

### Ansible Self Assessment
Describe each of the following components in Ansible, including the relationship between them: * Task * Inventory * Module * Play * Playbook * Role
Task – a call to a specific Ansible module.
Module – the actual unit of code executed by Ansible on your own host or a remote host. Modules are indexed by category (database, file, network, …) and are also referred to as task plugins.
Inventory – An inventory file defines hosts and/or groups of hosts on which Ansible tasks are executed. The inventory file can be in one of many formats, depending on the inventory plugins you have. The most common formats are INI and YAML.
Play – One or more tasks executed on a given host(s).
Playbook – One or more plays. Each play can be executed on the same or different hosts.
Role – Ansible roles allow you to group resources based on certain functionality/service such that they can be easily reused. In a role, you have directories for variables, defaults, files, templates, handlers, tasks, and metadata. You can then use the role by simply specifying it in your playbook.
How is Ansible different from other automation tools? (e.g. Chef, Puppet, etc.)
Ansible is:
* Agentless
* Minimal run requirements (Python & SSH) and simple to use
* Default mode is "push" (it also supports pull)
* Focused on simplicity and ease of use
True or False? Ansible follows the mutable infrastructure paradigm
True. In immutable infrastructure approach, you'll replace infrastructure instead of modifying it.
Ansible rather follows the mutable infrastructure paradigm where it allows you to change the configuration of different components, but this approach is not perfect and has its own disadvantages like "configuration drift" where different components may reach different state for different reasons.
True or False? Ansible uses declarative style to describe the expected end state
False. It uses a procedural style.
What kind of automation wouldn't you do with Ansible and why?
While it's possible to provision resources with Ansible, some prefer to use tools that follow the immutable infrastructure paradigm. Ansible doesn't save state by default. So a task that creates 5 instances, for example, will create 5 additional instances when executed again (unless an additional check is implemented or explicit names are provided), while other tools might check if 5 instances exist. If only 4 exist (by checking the state file for example), one additional instance will be created to reach the end goal of 5 instances.
How do you list all modules and how can you see details on a specific module?

1. Ansible online docs 2. `ansible-doc -l` for list of modules and `ansible-doc [module_name]` for detailed information on a specific module
#### Ansible - Inventory
What is an inventory file and how do you define one?
An inventory file defines hosts and/or groups of hosts on which Ansible tasks are executed. An example of an inventory file:

```
192.168.1.2
192.168.1.3
192.168.1.4

[web_servers]
190.40.2.20
190.40.2.21
190.40.2.22
```
What is a dynamic inventory file? When would you use one?

A dynamic inventory file tracks hosts from one or more sources like cloud providers and CMDB systems. You should use one when using external sources and especially when the hosts in your environment are being automatically
spun up and shut down, without you tracking every change in these sources.
#### Ansible - Variables
Modify the following task to use a variable instead of the value "zlib" and have "zlib" as the default in case the variable is not defined

```
- name: Install a package
  package:
    name: "zlib"
    state: present
```
```
- name: Install a package
  package:
    name: "{{ package_name | default('zlib') }}"
    state: present
```
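Outside Ansible, the behavior of the `default` filter can be mimicked in plain Python (the function name is illustrative):

```python
def resolve_package_name(task_vars, fallback="zlib"):
    # Mirrors "{{ package_name | default('zlib') }}": use the variable
    # when it is defined, otherwise fall back to the default.
    return task_vars.get("package_name", fallback)

print(resolve_package_name({}))                       # -> zlib
print(resolve_package_name({"package_name": "vim"}))  # -> vim
```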
How to make the variable "use_var" optional?

```
- name: Install a package
  package:
    name: "zlib"
    state: present
    use: "{{ use_var }}"
```
With "default(omit)":

```
- name: Install a package
  package:
    name: "zlib"
    state: present
    use: "{{ use_var | default(omit) }}"
```
What would be the result of the following play?
```
---
- name: Print information about my host
  hosts: localhost
  gather_facts: 'no'
  tasks:
    - name: Print hostname
      debug:
        msg: "It's me, {{ ansible_hostname }}"
```

When given written code, always inspect it thoroughly. If your answer is "this will fail" then you are right. We are using a fact (ansible_hostname), which is a gathered piece of information from the host we are running on. But in this case, we disabled fact gathering (gather_facts: no), so the variable would be undefined, which will result in failure.
When will the value '2017' be used in this case: `{{ lookup('env', 'BEST_YEAR') | default('2017', true) }}`?
When the environment variable 'BEST_YEAR' is empty or undefined.
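The effect of the second argument to `default` can be illustrated with a small Python sketch (the function name is illustrative):

```python
def best_year(env):
    # Mirrors {{ lookup('env', 'BEST_YEAR') | default('2017', true) }}:
    # with the second argument set to true, the default applies when the
    # looked-up value is falsy (e.g. an empty string), not only when the
    # variable is undefined.
    value = env.get("BEST_YEAR", "")
    return value if value else "2017"

print(best_year({}))                     # -> 2017
print(best_year({"BEST_YEAR": ""}))      # -> 2017
print(best_year({"BEST_YEAR": "2024"}))  # -> 2024
```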
If the value of certain variable is 1, you would like to use the value "one", otherwise, use "two". How would you do it?
`{{ (certain_variable == 1) | ternary("one", "two") }}`
The value of a certain variable you use is the string "True". You would like the value to be a boolean. How would you cast it?
`{{ some_string_var | bool }}`
You want to run Ansible playbook only on specific minor version of your OS, how would you achieve that?
What is the "become" directive used for in Ansible?
What are facts? How to see all the facts of a certain host?
What would be the result of running the following task? How to fix it?

```
- hosts: localhost
  tasks:
    - name: Install zlib
      package:
        name: zlib
        state: present
```
Which Ansible best practices are you familiar with? Name at least three
Explain the directory layout of an Ansible role
What 'blocks' are used for in Ansible?
How do you handle errors in Ansible?
You would like to run a certain command if a task fails. How would you achieve that?
Write a playbook to install ‘zlib’ and ‘vim’ on all hosts if the file ‘/tmp/mario’ exists on the system.
```
---
- hosts: all
  vars:
    mario_file: /tmp/mario
    package_list:
      - 'zlib'
      - 'vim'
  tasks:
    - name: Check for mario file
      stat:
        path: "{{ mario_file }}"
      register: mario_f

    - name: Install zlib and vim if mario file exists
      become: "yes"
      package:
        name: "{{ item }}"
        state: present
      with_items: "{{ package_list }}"
      when: mario_f.stat.exists
```
Write a single task that verifies all the files in files_list variable exist on the host
```
- name: Ensure all files exist
  stat:
    path: "{{ item }}"
  register: result
  failed_when: not result.stat.exists
  loop: "{{ files_list }}"
```

The `stat` result is evaluated per item, so the task fails if any file in `files_list` is missing.
Write a playbook to deploy the file ‘/tmp/system_info’ on all hosts except for controllers group, with the following content
```
I'm <HOSTNAME> and my operating system is <OS>
```

Replace <HOSTNAME> and <OS> with the actual data for the specific host you are running on.

The playbook to deploy the system_info file:

```
---
- name: Deploy /tmp/system_info file
  hosts: all:!controllers
  tasks:
    - name: Deploy /tmp/system_info
      template:
        src: system_info.j2
        dest: /tmp/system_info
```

The content of the system_info.j2 template:

```
# {{ ansible_managed }}
I'm {{ ansible_hostname }} and my operating system is {{ ansible_distribution }}
```
The variable 'whoami' defined in the following places: * role defaults -> whoami: mario * extra vars (variables you pass to Ansible CLI with -e) -> whoami: toad * host facts -> whoami: luigi * inventory variables (doesn’t matter which type) -> whoami: browser According to variable precedence, which one will be used?
The right answer is ‘toad’. Variable precedence is about how variables override each other when they are set in different locations. If you didn't experience it so far, I'm sure at some point you will, which makes it a useful topic to be aware of. In the context of our question, the order will be extra vars (always override any other variable) -> host facts -> inventory variables -> role defaults (the weakest).

Here is the order of precedence from least to greatest (the last listed variables winning prioritization):

1. command line values (eg "-u user")
2. role defaults [[1]](https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#id15)
3. inventory file or script group vars [[2]](https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#id16)
4. inventory group_vars/all [[3]](https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#id17)
5. playbook group_vars/all [[3]](https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#id17)
6. inventory group_vars/* [[3]](https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#id17)
7. playbook group_vars/* [[3]](https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#id17)
8. inventory file or script host vars [[2]](https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#id16)
9. inventory host_vars/* [[3]](https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#id17)
10. playbook host_vars/* [[3]](https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#id17)
11. host facts / cached set_facts [[4]](https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#id18)
12. play vars
13. play vars_prompt
14. play vars_files
15. role vars (defined in role/vars/main.yml)
16. block vars (only for tasks in block)
17. task vars (only for the task)
18. include_vars
19. set_facts / registered vars
20. role (and include_role) params
21. include params
22. extra vars (always win precedence)

A full list can be found at [Playbook Variables](https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#ansible-variable-precedence). Also, note there is a significant difference between Ansible 1.x and 2.x.
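The "last listed wins" rule can be sketched in Python by merging the sources from the question above, lowest precedence first:

```python
# Variable sources ordered from lowest to highest precedence,
# using the values from the question above.
sources = [
    ("role defaults", {"whoami": "mario"}),
    ("inventory variables", {"whoami": "browser"}),
    ("host facts", {"whoami": "luigi"}),
    ("extra vars", {"whoami": "toad"}),
]

def resolve(sources):
    resolved = {}
    for _, variables in sources:    # lowest precedence first
        resolved.update(variables)  # later (higher-precedence) sources win
    return resolved

print(resolve(sources)["whoami"])  # -> toad
```

This is only a mental model; Ansible's real resolution also involves scoping (play, host, task), not just a flat merge.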
For each of the following statements determine if it's true or false: * A module is a collection of tasks * It’s better to use shell or command instead of a specific module * Host facts override play variables * A role might include the following: vars, meta, and handlers * Dynamic inventory is generated by extracting information from external sources * It’s a best practice to use indentation of 2 spaces instead of 4 * ‘notify’ used to trigger handlers * This “hosts: all:!controllers” means ‘run only on controllers group hosts’
Explain the difference between `forks`, `serial` and `throttle`.
`serial` is like running the playbook for each batch of hosts in turn, waiting for completion of the complete playbook before moving on to the next batch. `forks=1` means run the first task in a play on one host before running the same task on the next host, so the first task will be run for each host before the next task is touched. The default number of forks in Ansible is 5.

```
[defaults]
forks = 30
```

```
- hosts: webservers
  serial: 1
  tasks:
    - name: ...
```

Ansible also supports `throttle`. This keyword limits the number of workers up to the maximum set via the forks setting or serial. This can be useful in restricting tasks that may be CPU-intensive or interact with a rate-limiting API.

```
tasks:
  - command: /path/to/cpu_intensive_command
    throttle: 1
```
What is ansible-pull? How is it different from how ansible-playbook works?
What is Ansible Vault?
Demonstrate each of the following with Ansible: * Conditionals * Loops
What are filters? Do you have experience with writing filters?
Write a filter to capitalize a string
```
def cap(self, string):
    return string.capitalize()
```
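In practice, a custom filter is shipped as a filter plugin that Ansible discovers in a `filter_plugins/` directory; a minimal sketch (file and filter names are examples):

```python
# filter_plugins/string_filters.py (file name is an example)

def cap(string):
    """Capitalize the first character of a string."""
    return string.capitalize()


class FilterModule(object):
    """Ansible discovers custom filters via FilterModule.filters()."""

    def filters(self):
        return {"cap": cap}


# Standalone illustration of what "{{ 'ansible' | cap }}" would render:
print(FilterModule().filters()["cap"]("ansible"))  # -> Ansible
```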
You would like to run a task only if previous task changed anything. How would you achieve that?
What are callback plugins? What can you achieve by using callback plugins?
What is the difference between `include_tasks` and `import_tasks`?
File '/tmp/exercise' includes the following content

```
Goku = 9001
Vegeta = 5200
Trunks = 6000
Gotenks = 32
```

With one task, switch the content to:

```
Goku = 9001
Vegeta = 250
Trunks = 40
Gotenks = 32
```
```
- name: Change saiyans levels
  lineinfile:
    dest: /tmp/exercise
    regexp: "{{ item.regexp }}"
    line: "{{ item.line }}"
  with_items:
    - { regexp: '^Vegeta', line: 'Vegeta = 250' }
    - { regexp: '^Trunks', line: 'Trunks = 40' }
...
```
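What `lineinfile` does for each item can be sketched in plain Python: replace any line whose beginning matches the regexp with the given line:

```python
import re

# The regexp/line pairs from the task above.
rules = [
    (r"^Vegeta", "Vegeta = 250"),
    (r"^Trunks", "Trunks = 40"),
]

content = "Goku = 9001\nVegeta = 5200\nTrunks = 6000\nGotenks = 32"
lines = content.split("\n")
for pattern, replacement in rules:
    # lineinfile replaces the whole matching line with `line`.
    lines = [replacement if re.match(pattern, line) else line
             for line in lines]

print("\n".join(lines))
```

Lines that match no rule (Goku, Gotenks) pass through unchanged.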
#### Ansible - Execution and Strategy
True or False? By default, Ansible will execute all the tasks in play on a single host before proceeding to the next host
False. Ansible will execute a single task on all hosts before moving to the next task in a play. As of today, it uses 5 forks by default.
This behavior is described as "strategy" in Ansible and it's configurable.
What is a "strategy" in Ansible? What is the default strategy?
A strategy in Ansible describes how Ansible will execute the different tasks on the hosts. By default, Ansible uses the "linear" strategy, in which each task runs on all hosts before proceeding to the next task.
What strategies are you familiar with in Ansible?
- Linear: the default strategy in Ansible. Run each task on all hosts before proceeding. - Free: For each host, run all the tasks until the end of the play as soon as possible - Debug: Run tasks in an interactive way
What is the `serial` keyword used for?
It's used to specify the number (or percentage) of hosts to run the full play on, before moving to the next batch of hosts in the group. For example:

```
- name: Some play
  hosts: databases
  serial: 4
```

If your group has 8 hosts, it will run the whole play on 4 hosts and then the same play on another 4 hosts.
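The batching that `serial: 4` produces for an 8-host group can be sketched as:

```python
def batches(hosts, serial):
    # Split the host list into batches of `serial` hosts; the full play
    # runs on each batch before the next batch starts.
    return [hosts[i:i + serial] for i in range(0, len(hosts), serial)]

hosts = [f"db{n}" for n in range(1, 9)]  # 8 hosts, names are illustrative
print(batches(hosts, 4))
# -> [['db1', 'db2', 'db3', 'db4'], ['db5', 'db6', 'db7', 'db8']]
```

A trailing partial batch (e.g. 3 hosts with `serial: 2`) simply runs as a smaller final batch.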
#### Ansible Testing
How do you test your Ansible based projects?
What is Molecule? How does it work?
It's used to rapidly develop and test Ansible roles. Molecule can be used to test Ansible roles against a variety of Linux distros at the same time. This testing ability helps instill confidence in the automation today, and over time as the role is maintained.
You run Ansible tests and you get "idempotence test failed". What does it mean? Why is idempotence important?
#### Ansible - Debugging
How to find out the data type of a certain variable in one of the playbooks?
"{{ some_var | type_debug }}"
#### Ansible - Collections
What are collections in Ansible?
Ansible Collections are a way to package and distribute modules, roles, plugins, and documentation in a structured format. They help organize and distribute automation code efficiently, especially for complex environments.
Why Use Ansible Collections?
- Modular and reusable components - Simplifies management of custom and third-party modules - Provides a standardized way to distribute automation content - Helps in version control and dependency management
================================================
FILE: topics/ansible/my_first_playbook.md
================================================

## Ansible - My First Playbook

1. Write a playbook that will:
   a. Install the package zlib
   b. Create the file `/tmp/some_file`
2. Run the playbook on a remote host


================================================
FILE: topics/ansible/my_first_task.md
================================================

## Ansible - My First Task

1. Write a task to create the directory '/tmp/new_directory'


================================================
FILE: topics/ansible/solutions/my_first_playbook.md
================================================

## My first playbook - Solution

1. `vi first_playbook.yml`

```
- name: Install zlib and create a file
  hosts: some_remote_host
  tasks:
    - name: Install zlib
      package:
        name: zlib
        state: present
      become: yes
    - name: Create the file /tmp/some_file
      file:
        path: '/tmp/some_file'
        state: touch
```

2. First, edit the inventory file: `vi /etc/ansible/hosts`

```
[some_remote_host]
some.remoted.host.com
```

Run the playbook: `ansible-playbook first_playbook.yml`


================================================
FILE: topics/ansible/solutions/my_first_task.md
================================================

## My First Task - Solution

```
- name: Create a new directory
  file:
    path: "/tmp/new_directory"
    state: directory
```


================================================
FILE: topics/ansible/solutions/update_upgrade_task.md
================================================

## Update and Upgrade apt packages task - Solution

```
- name: "update and upgrade apt packages."
  become: yes
  apt:
    upgrade: yes
    update_cache: yes
```


================================================
FILE: topics/ansible/update_upgrade_task.md
================================================

## Ansible - Update and upgrade APT packages task

1. Write a task to update and upgrade apt packages


================================================
FILE: topics/argo/README.md
================================================

# Argo

- [Argo](#argo)
  - [ArgoCD Exercises](#argocd-exercises)
    - [ArgoCD 101](#argocd-101)
    - [ArgoCD Secrets](#argocd-secrets)
    - [ArgoCD Helm](#argocd-helm)
    - [Argo Rollouts](#argo-rollouts)
  - [ArgoCD Questions](#argocd-questions)
    - [ArgoCD 101](#argocd-101-1)
    - [Practical ArgoCD 101](#practical-argocd-101)
    - [CLI](#cli)
    - [ArgoCD Configuration](#argocd-configuration)
    - [Advanced ArgoCD](#advanced-argocd)
    - [ArgoCD Application Health](#argocd-application-health)
    - [ArgoCD Syncs](#argocd-syncs)
    - [ArgoCD and Helm](#argocd-and-helm)
  - [Argo Rollouts Questions](#argo-rollouts-questions)
    - [Argo Rollouts 101](#argo-rollouts-101)
    - [Argo Advanced Rollouts](#argo-advanced-rollouts)
    - [Argo Rollouts Commands](#argo-rollouts-commands)

## ArgoCD Exercises

### ArgoCD 101

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Creating an App | App | [Exercise](exercises/app_creation/exercise.md) | [Solution](exercises/app_creation/solution.md) | |
| Syncing App - Git | Sync | [Exercise](exercises/sync_app_git/exercise.md) | [Solution](exercises/sync_app_git/solution.md) | |
| Syncing App - Cluster | Sync | [Exercise](exercises/sync_app_cluster/exercise.md) | [Solution](exercises/sync_app_cluster/solution.md) | |

### ArgoCD Secrets

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Secrets 101 | Secrets | [Exercise](exercises/secrets_101/exercise.md) | [Solution](exercises/secrets_101/solution.md) | |

### ArgoCD Helm

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Helm ArgoCD App | Helm | [Exercise](exercises/argocd_helm_app/exercise.md) | [Solution](exercises/argocd_helm_app/solution.md) | |

### Argo Rollouts

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Blue/Green Rollout | Rollouts | [Exercise](exercises/blue_green_rollout/exercise.md) | [Solution](exercises/blue_green_rollout/solution.md) | |
| Canary Rollout | Rollouts | [Exercise](exercises/canary_rollout/exercise.md) | [Solution](exercises/canary_rollout/solution.md) | |

## ArgoCD Questions

### ArgoCD 101
What is Argo CD?
[ArgoCD](https://argo-cd.readthedocs.io/en/stable): "Argo CD is a declarative, GitOps continuous delivery tool for Kubernetes." As to why Argo CD, they provide the following explanation: "Application definitions, configurations, and environments should be declarative and version controlled. Application deployment and lifecycle management should be automated, auditable, and easy to understand."
There have been a lot of CI/CD systems before ArgoCD (Jenkins, TeamCity, CircleCI, etc.). What added value does ArgoCD bring?
Simply put, ArgoCD is CD, not CI. We still need CI systems. Secondly, ArgoCD runs on Kubernetes and is part of its ecosystem, as opposed to some other CI/CD systems. Finally, ArgoCD was built specifically for Kubernetes, not for other platforms and systems.

It's easier to explain the need for ArgoCD by direct comparison to another system that can do CD. Let's use Jenkins for this.

With Jenkins, you need to make sure to install k8s related tools and set access for commands like kubectl. With ArgoCD you simply need to install it in your namespace, with no need to install additional tools as it's part of k8s.

With Jenkins, managing access is usually done per pipeline, and even if set globally in Jenkins, you still need to configure each pipeline to use that access configuration. With ArgoCD, access management to k8s and other resources is a given, as it already runs on the cluster, in one or multiple namespaces.

With Jenkins, tracking the status of what got deployed to k8s can be done only as an extra step, by running the pipeline. This is because Jenkins isn't part of the k8s cluster. With ArgoCD you get much better tracking and visibility of what gets deployed, as it runs in the same cluster and the same namespace.

With ArgoCD it's really easy to roll back to a previous version, because all the changes are done to git, which is versioned source control. So it's enough to get to a previous commit for ArgoCD to detect a change and sync to the cluster. Worth mentioning, this last point is true for Jenkins as well :)
Describe an example of a workflow where ArgoCD is used
1. A developer submits a change to the application repository
2. A Jenkins pipeline is triggered to run CI on the change
3. If the Jenkins pipeline completes successfully, an image is built out of the new code
4. The image is pushed to a registry
5. The Kubernetes manifest file(s) are updated in a separate app config repository
6. ArgoCD tracks changes in the app config repository. Since there was a change in the repository, it will apply the changes from the repo
True or False? ArgoCD supports Kubernetes YAML files but not other manifests formats like Helm Charts and Kustomize
False. It supports Kubernetes YAML files as well as Helm Charts and Kustomize.
What "GitOps Repository" means in regards to ArgoCD?
It's the repository that holds the app configuration, the one updated most of the time by CI/CD processes or by DevOps/SRE engineers. In regards to ArgoCD, it's the repository ArgoCD tracks for changes, applying them when they are detected.
What are the advantages of using a GitOps approach/repository?
* Your whole configuration is in one place, defined as code, so it's completely transparent, adjustable, and easily reproducible
* Everyone goes through the same interface, hence more people experience and test the code, even if not intentionally
* Engineers can use it for testing, development, etc. There is no more running manual commands and hoping to reach the same state as in the cluster/cloud
* Single source of truth: you know the GitOps repo is the place from which changes are made to the cluster, so even if someone tries to manually override it, it won't work
Sorina, one of the engineers in your team, made manual changes to the cluster that override some of the configuration in a repo traced by ArgoCD. What will happen?
Once Sorina makes the modifications, ArgoCD will detect that the state has diverged and will sync the changes from the GitOps repository, overwriting the manual changes done by Sorina.
Nate, one of the engineers in your organization, asked whether it's possible to prevent ArgoCD from syncing when changes are made manually to the cluster. What would be your answer?
The answer is yes, it's possible. You can configure ArgoCD not to sync to the desired state when changes are made manually, and instead do something like sending alerts.
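As a sketch of how this looks in practice (the app name is hypothetical, and the rest of the spec is omitted), it's the self-heal flag under automated sync that controls whether manual cluster changes get reverted:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: some-app            # hypothetical application name
  namespace: argocd
spec:
  # source/destination omitted for brevity
  syncPolicy:
    automated:
      selfHeal: false       # don't revert manual changes made directly to the cluster
```

With `selfHeal: false`, ArgoCD still syncs changes arriving from Git, but leaves manual drift alone (the app will simply show as out-of-sync), so alerting can be layered on top.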
How does cluster disaster recovery become easier with ArgoCD?
Imagine you have a cluster in the cloud, in one of the regions. Something happens to that cluster and it either crashes or is simply no longer operational. If you have all your cluster configuration in a GitOps repository, ArgoCD can be pointed to that repository while being configured to use a new cluster you've set up, and apply that configuration so your cluster is up and running again with the same state as the original one.
Ella, an engineer in your team, claims the benefit of ArgoCD is that it's an extension of Kubernetes, part of the cluster. Sarah, also an engineer in your team, claims that's not a real benefit, as Jenkins can also be deployed in the cluster, hence being part of it. What's your take?
Ella is right. ArgoCD is an extension of the cluster, which is very different from simply being deployed in the cluster like other CI/CD systems such as Jenkins. ArgoCD uses existing Kubernetes resources, like controllers (for monitoring and detecting state differences) and etcd (for storing data).
What is the main resource in ArgoCD called?
"Application"
Explain what an "Application" is in regards to ArgoCD
It's a custom resource (defined through a CRD) which is responsible for the deployment and synchronization of application resources to a Kubernetes cluster.
How does ArgoCD make access management in the cluster easier?
Instead of creating Kubernetes resources, you can use Git to manage who is allowed to push code, review it, merge it, etc. - whether human users or 3rd party systems and services. There is no need to use ClusterRole or User resources in Kubernetes, hence access management is much simpler.
### Practical ArgoCD 101
Describe the purpose of the following section in an Application YAML file

```YAML
source:
  repoURL: https://github.com/bregman-arie/devops-exercises
  targetRevision: HEAD
  path: main
```
This section of an ArgoCD Application defines which Git repository (at which revision and path) should be synced.
Describe the purpose of the following section in an Application YAML file

```YAML
destination:
  server: http://some.kubernetes.cluster.svc
  namespace: devopsExercises
```
This section defines the Kubernetes cluster (and namespace) with which the app in the tracked Git repository should be synced.
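Putting the two sections together, a minimal complete Application manifest might look as follows (combining the values from the snippets above; the application name and project are assumptions for illustration):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: devops-exercises          # hypothetical application name
  namespace: argocd
spec:
  project: default                # assumed project
  source:
    repoURL: https://github.com/bregman-arie/devops-exercises
    targetRevision: HEAD
    path: main
  destination:
    server: http://some.kubernetes.cluster.svc
    namespace: devopsExercises
```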
What CRD would you use if you have multiple applications and you would like to group them together logically?
AppProject
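A minimal sketch of this grouping CRD (the kind is `AppProject`; all names here are hypothetical) that groups applications and restricts which repositories and destinations they may use:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-a                          # hypothetical project name
  namespace: argocd
spec:
  description: Apps owned by team A
  sourceRepos:
  - https://github.com/some-org/*       # repos the project's apps may sync from
  destinations:
  - server: https://kubernetes.default.svc
    namespace: team-a                   # namespace the project's apps may deploy to
```

Applications then reference the group through `spec.project: team-a`.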
True or False? ArgoCD sync period is 3 hours
False. ArgoCD's default sync period is 3 minutes as of today (not hours).
Describe briefly what ArgoCD does every sync period
1. Gathers the list of all the apps to sync (those marked with "auto-sync")
2. Gets the Git state for each repository
3. Compares the repository Git state with the Kubernetes cluster state
   1. If the states differ, the application is marked as "out-of-sync" and further action might be taken (based on the configuration)
   2. If the states are equal, the application is marked as "synced"
You deployed a new application in a namespace called "yay" but when running `kubectl get ns yay` you see there is no such namespace. What happened?
Deploying an application in a non-existing namespace doesn't create the namespace. For that, you have to explicitly mark "Auto-create namespace". To fix it, you can simply run `kubectl create namespace NAMESPACE_NAME`, but it's better of course to have it stored in Git rather than running kubectl commands.
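The Git-friendly alternative is to declare the namespace creation in the Application's sync policy; `CreateNamespace=true` is the sync option behind the "Auto-create namespace" checkbox (a sketch, with the rest of the spec omitted):

```yaml
spec:
  syncPolicy:
    syncOptions:
    - CreateNamespace=true   # create the destination namespace if it doesn't exist
```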
#### CLI
Create a new application with the following properties:

* app name: some-app
* repo: https://fake.repo.address
* app path: ./app_path
* namespace: default
* cluster: my.kubernetes.cluster
```
argocd app create some-app \
--repo https://fake.repo.address \
--path ./app_path \
--dest-namespace default \
--dest-server my.kubernetes.cluster
```
List all argocd apps
`argocd app list`
Print detailed information on the app called "some-app"
`argocd app get some-app`
How to add an additional (external) cluster for ArgoCD to manage?
`argocd cluster add CONTEXT_NAME` (the name of a kubectl context pointing at the external cluster)
How to list all the clusters ArgoCD manages?
`argocd cluster list`
### ArgoCD Configuration
Is it possible to change the default sync period of ArgoCD?
Yes, it is possible by adding the following to the argocd-cm ConfigMap:

```
data:
  timeout.reconciliation: 300s
```

The value can be any number of seconds you would like to set.
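For context, the setting lives in the `argocd-cm` ConfigMap in the namespace where ArgoCD is installed (the `argocd` namespace is an assumption here); a sketch of the full resource:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd                    # assumes ArgoCD was installed in "argocd"
  labels:
    app.kubernetes.io/part-of: argocd
data:
  timeout.reconciliation: 300s
```

Note that the ArgoCD application controller may need to be restarted for the new value to take effect.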
What will be the result of setting `timeout.reconciliation: 0s`?
Automatic sync functionality will be disabled.
### Advanced ArgoCD
What is the "App of Apps Patterns"?
A solution from the Argo community for managing multiple similar applications. Basically, a pattern where you have a root application that consists of other, child applications. So instead of creating multiple separate applications, you have the root application pointing to a repository with additional applications.
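A sketch of such a root application (all names and paths hypothetical): the directory it points to contains nothing but other Application manifests, which ArgoCD creates and then tracks like any other resource:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-app                  # hypothetical root application
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/some-org/gitops-repo   # hypothetical repo
    targetRevision: HEAD
    path: apps                    # directory holding the child Application manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd             # child Applications land in ArgoCD's own namespace
```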
Can you provide some use cases for the "App of Apps" pattern?
* Cluster preparation: you would like to deploy multiple applications at once to bootstrap a Kubernetes cluster
* Multiple environments: when deploying many versions of the same application, but with minor changes. For example, several test deployments to test different features
* Multiple clusters: when the same application needs to be deployed across multiple Kubernetes clusters connected to ArgoCD
True or False? If you have multiple Kubernetes clusters to which you want to sync applications with ArgoCD, then you must have ArgoCD installed on each one of them
False. It can be deployed on just one of them: ArgoCD is able to manage external clusters on which it doesn't run.
You have three clusters - dev, staging and prod. Whenever you update the application GitOps repo, all three clusters are updated. What's the problem with that and how would you deal with it?
You don't usually want to update all of your clusters at once, especially when some are for testing and development purposes and some serve actual production usage. There are multiple ways to deal with it:

1. Branch driven: have branches in your GitOps repo where you push first to development, do some testing, then merge to staging, and if everything works fine in staging, merge to production.
2. Use overlays and Kustomize to control where your changes are synced, based on the CI process/pipeline used.
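The overlay option can be sketched with a repository layout like this (paths hypothetical): each cluster's Application points at its own overlay, so promoting a change means updating only that overlay:

```
app-config-repo/
├── base/
│   ├── deployment.yaml
│   └── kustomization.yaml
└── overlays/
    ├── dev/
    │   └── kustomization.yaml       # synced by the dev cluster's Application
    ├── staging/
    │   └── kustomization.yaml       # synced by staging
    └── prod/
        └── kustomization.yaml       # synced by prod
```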
### ArgoCD Application Health
What are some possible health statuses for an ArgoCD application?
* Healthy
* Missing: the resource doesn't exist in the cluster
* Suspended: the resource is paused
* Progressing: the resource isn't healthy yet, but can still become healthy
* Degraded: the resource isn't healthy
* Unknown: the app's health isn't known
True or False? A Deployment is considered healthy if its Pods are running
Not exactly. A Deployment (as well as a StatefulSet, ReplicaSet or DaemonSet) is considered healthy if the desired state equals the actual/current state (this includes the number of replicas).
True or False? An Ingress is considered healthy if the `status.loadBalancer.ingress` list includes at least one value
True.
What can you tell about the health of custom Kubernetes resources?
The health of custom Kubernetes resources is defined by writing Lua scripts. You can find a list of such scripts here: https://github.com/argoproj/argo-cd/tree/master/resource_customizations
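A sketch of how such a script is wired in, via the `resource.customizations.health.<group>_<kind>` key of the `argocd-cm` ConfigMap (the resource kind and its status fields here are hypothetical):

```yaml
data:
  resource.customizations.health.example.com_MyResource: |
    hs = {}
    hs.status = "Progressing"
    hs.message = "Waiting for status to be reported"
    if obj.status ~= nil and obj.status.ready == true then
      hs.status = "Healthy"
      hs.message = "Resource is ready"
    end
    return hs
```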
### ArgoCD Syncs
Explain manual syncs vs. automatic syncs
Automatic sync means that once ArgoCD detects a change or a new version of your app in Git, it applies the changes so the current/actual state becomes equal to the desired state. With manual sync, ArgoCD will identify that there is a difference, but will do nothing to correct it.
Explain auto-pruning
If enabled, auto-pruning will remove resources when files or content are removed from the tracked Git repository. If disabled, ArgoCD will not remove anything, even when content or files are removed.
Explain self-heal in regards to ArgoCD
Self-heal is the process of correcting the cluster state back to the desired state when someone makes manual changes to the cluster.
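Both auto-pruning and self-heal are toggled under an Application's `syncPolicy` (a sketch, with the rest of the spec omitted):

```yaml
spec:
  syncPolicy:
    automated:
      prune: true      # delete cluster resources whose manifests were removed from Git
      selfHeal: true   # revert manual cluster changes back to the Git-defined state
```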
### ArgoCD and Helm
What support is provided in ArgoCD for Helm?
ArgoCD is able to track a packaged Helm chart, in the sense that it will monitor it for new versions.
True or False? When ArgoCD tracks a Helm chart, the chart is no longer a Helm application; it's an ArgoCD app
True. Trying to execute commands like `helm ls` will fail, because the Helm metadata no longer exists and the application is tracked as an ArgoCD app.
## Argo Rollouts Questions

### Argo Rollouts 101
What is Argo Rollouts?
A controller for Kubernetes to perform application deployments using different strategies like Blue/Green deployments, Canary deployments, etc. In addition, it supports A/B tests, automatic rollbacks and integrated metric analysis.
What happens when you roll out a new version of your app with Argo Rollouts?
- Argo Rollouts creates a new ReplicaSet (that is, the new app version)
- The old version is still alive
- ArgoCD marks the app as out-of-sync
True or False? You need to install ArgoCD in order to use Argo Rollouts
False. It's quite a common misconception, but both can be used independently, even though they work nicely together.
### Argo Advanced Rollouts
Scott, an engineer in your team, manually executes some smoke tests and monitors rollouts every time a new version is deployed. This way, if he detects an issue, he performs a rollback. What better approach might you suggest to him?
Shift towards fully automated rollbacks. Argo Rollouts supports multiple metric providers (Datadog, New Relic, etc.), so you can use data and metrics to automate rollbacks based on different conditions.
Explain the concept of "Analysis" in regards to Argo Rollouts
Analysis is a resource deployed alongside a Rollout resource; it defines the conditions and metric thresholds for performing a rollback.
Explain the following configuration

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: service-name
  metrics:
  - name: success-rate
    interval: 4m
    count: 3
    successCondition: result[0] >= 0.90
    provider:
      prometheus:
        address: http://some-prometheus-instance:80
        query: sum(response_status{app="{{args.service-name}}",role="canary",status=~"2.*"}) / sum(response_status{app="{{args.service-name}}",role="canary"})
```
It's an AnalysisTemplate resource that fetches the response-status success rate from Prometheus (a monitoring instance). If the rate is 0.90 or higher, the rollout continues; if it's less than 0.90, a rollback is performed, meaning the canary deployment failed.
### Argo Rollouts Commands
How to list rollouts?
`kubectl argo rollouts list rollouts`
How to list the rollouts of a given application?
`kubectl argo rollouts get rollout SOME-APP`
How to check the status of a rollout?
`kubectl argo rollouts status SOME-APP`
How to roll out a new version (with a new container tag)?
`kubectl argo rollouts set image SOME-APP web-app=some/registry/and/image:v2.0`
How to manually promote to new app version?
`kubectl argo rollouts promote SOME-APP`
How do you monitor a rollout?
`kubectl argo rollouts get rollout SOME-APP --watch`
================================================ FILE: topics/argo/exercises/app_creation/exercise.md ================================================ # App Creation ## Requirements 1. Make sure you have repository with some Kubernetes manifests 2. Make sure you have a Kubernetes cluster running with ArgoCD installed ## Objectives 1. Using the CLI or the UI, create a new application with the following properties: 1. app name: app-demo 2. project: app-project 3. repository URL: your repo with some k8s manifests 4. namespace: default 2. Verify the app was created 3. Sync the app 4. Verify Kubernetes resources were created 5. Delete the app ## Solution Click [here](solution.md) to view the solution ================================================ FILE: topics/argo/exercises/app_creation/solution.md ================================================ # App Creation ## Requirements 1. Make sure you have repository with some Kubernetes manifests 2. Make sure you have a Kubernetes cluster running with ArgoCD installed ## Objectives 1. Using the CLI or the UI, create a new application with the following properties: 1. app name: app-demo 2. project: app-project 3. repository URL: your repo with some k8s manifests 4. namespace: default 2. Verify the app was created 3. Sync the app 4. Verify Kubernetes resources were created 5. Delete the app ## Solution ### UI 1. Click on "New App" 1. Insert application name: `app-demo` 2. Insert project: `app-project` 3. Under source put the repository URL to your GitHub repo with Kubernetes manifests 1. Set the path for your application 4. Under destination put the address of your Kubernetes cluster and set namespace to `default` 5. Click on "Create" 2. Click on "Sync" button on the "app-demo" form 1. Click on "Synchronize" 3. Verify the Kubernetes resources were created 1. `kubectl get deployments` 4. 
Delete the app ### CLI ``` argocd app create app-demo \ --project app-project \ --repo https://fake.repo.address \ --path ./some_app_path \ --dest-namespace default \ --dest-server my.kubernetes.cluster # Check app state argocd app list argocd app get app-demo # Sync app state argocd app sync app-demo argocd app wait app-demo # Verify kubernetes resources were created kubectl get deployments # Delete the app argocd app delete app-demo ``` ================================================ FILE: topics/argo/exercises/argocd_helm_app/exercise.md ================================================ # ArgoCD Helm App ## Requirements 1. Running Kubernetes cluster 2. ArgoCD installed on the k8s cluster 3. Repository of an Helm chart ## Objectives 1. Create a new app in ArgoCD that points to the repo of your Helm chart ## Solution Click [here](solution.md) to view the solution ================================================ FILE: topics/argo/exercises/argocd_helm_app/solution.md ================================================ # ArgoCD Helm App ## Requirements 1. Running Kubernetes cluster 2. ArgoCD installed on the k8s cluster 3. Repository of an Helm chart ## Objectives 1. Create a new app in ArgoCD that points to the repo of your Helm chart ## Solution ``` argocd app create some-app \ --project default \ --repo https://repo-with-helm-chart --path "./helm" \ --sync-policy auto \ --dest-namespace default \ --dest-server https://kubernetes.cluster ``` ================================================ FILE: topics/argo/exercises/blue_green_rollout/exercise.md ================================================ # Argo Rollouts - Blue/Green ## Requirements 1. Running Kubernetes cluster 2. Argo Rollouts CLI 3. Deployed app in specific version ## Objectives 1. Install Argo Rollouts controller 2. Write a rollout manifest that use blue/green deployment and apply it 1. Set it to 3 replicas 2. Disable auto-promotions 3. Check the rollout list 4. 
Rollout a new version of your app in any way you prefer 1. Check the status of the rollout ## Solutions Click [here](solution.md) to view the solution. ================================================ FILE: topics/argo/exercises/blue_green_rollout/solution.md ================================================ # Argo Rollouts - Blue/Green ## Requirements 1. Running Kubernetes cluster 2. Argo Rollouts CLI 3. Deployed app in specific version ## Objectives 1. Install Argo Rollouts controller 2. Write a rollout manifest that use blue/green deployment and apply it 1. Set it to 3 replicas 2. Disable auto-promotions 3. Check the rollout list 4. Rollout a new version of your app in any way you prefer 1. Check the status of the rollout ## Solution Installation: 1. `kubectl create namespace argo-rollouts` 1. `kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml` 2. Rollout resource: ``` --- apiVersion: argoproj.io/v1alpha1 kind: Rollout metadata: name: some-app spec: replicas: 3 strategy: blueGreen: autoPromotionEnabled: false selector: matchLabels: app: some-web-app template: metadata: labels: app: some-web-app spec: containers: - name: web-app image: some/registry/and/image:v1.0 ports: - name: http containerPort: 8080 protocol: TCP ``` 3. `kubectl argo rollouts list rollouts` 4. `kubectl argo rollouts set image SOME-APP web-app=some/registry/and/image:v2.0` 1. `kubectl argo rollouts get rollout some-app --watch` ================================================ FILE: topics/argo/exercises/canary_rollout/exercise.md ================================================ # Argo Rollouts - Canary ## Requirements 1. Running Kubernetes cluster 2. Argo Rollouts CLI 3. Deployed app in a specific version ## Objectives 1. Install Argo Rollouts controller 2. Write a rollout manifest that use canary rollout strategy and apply it 1. Set it to 3 replicas 2. Disable auto-promotions 3. Check the rollout list 4. 
Rollout a new version of your app in any way you prefer 1. Check the status of the rollout ## Solutions Click [here](solution.md) to view the solution. ================================================ FILE: topics/argo/exercises/canary_rollout/solution.md ================================================ # Argo Rollouts - Canary ## Requirements 1. Running Kubernetes cluster 2. Argo Rollouts CLI 3. Deployed app in a specific version ## Objectives 1. Install Argo Rollouts controller 2. Write a rollout manifest that use canary rollout strategy and apply it 1. Set it to 6 replicas 2. Disable auto-promotions 3. Check the rollout list 4. Rollout a new version of your app in any way you prefer 1. Check the status of the rollout ## Solution Installation: 1. `kubectl create namespace argo-rollouts` 1. `kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml` 2. Rollout resource: ``` --- apiVersion: argoproj.io/v1alpha1 kind: Rollout metadata: name: some-app spec: replicas: 6 strategy: canary: stableService: k8s-service-stable canaryService: k8s-service-canary trafficRouting: ambassador: mappings: - k8s-mapping steps: - setWeight: 30 - pause: {} - setWeight: 60 - pause: {} - setWeight: 100 - pause: {} selector: matchLabels: app: some-web-app template: metadata: labels: app: some-web-app spec: containers: - name: web-app image: some/registry/and/image:v1.0 ports: - name: http containerPort: 8080 protocol: TCP ``` 3. `kubectl argo rollouts list rollouts` 4. `kubectl argo rollouts set image SOME-APP web-app=some/registry/and/image:v2.0` 1. `kubectl argo rollouts get rollout some-app --watch` ================================================ FILE: topics/argo/exercises/secrets_101/exercise.md ================================================ # ArgoCD Secrets 101 ## Requirements 1. Running Kubernetes cluster 2. Application k8s manifests with secrets 3. Kubeseal binary installed ## Objectives 1. 
Install bitnami sealed controller as ArgoCD app 2. Encrypt secrets and commit them to the repo with the k8s manifests 3. Create an app using the secrets you encrypted ## Solution Click [here](solution.md) to view the solution ================================================ FILE: topics/argo/exercises/secrets_101/solution.md ================================================ # ArgoCD Secrets 101 ## Requirements 1. Running Kubernetes cluster 2. Application k8s manifests with secrets 3. Kubeseal binary installed ## Objectives 1. Install bitnami sealed controller as ArgoCD app 2. Encrypt secrets and commit them to the repo with the k8s manifests 3. Create an app using the secrets you encrypted ## Solution 1. Click on "New App" 1. app name: controller 2. project: default 3. sync policy: automatic 4. repository URL: a URL to bitnami sealed controller manifests 5. namespace: kube-system 2. Run the following for every secret: `kubeseal < some/secret.yml > sealed_secrets/some/encrypted_secret.yaml -o yaml` 3. Click on "New App" 1. app name: some-app 2. project: default 3. sync policy: automatic 4. repository URL: a URL to k8s manifests (including encrypted secrets) 5. namespace: default ================================================ FILE: topics/argo/exercises/sync_app_cluster/exercise.md ================================================ # Sync App - Cluster ## Requirements 1. Make sure you have a Kubernetes cluster running with ArgoCD installed 1. Make sure you have an app deployed in the cluster and tracked by ArgoCD ## Objectives 1. Verify the app is tracked by ArgoCD and in sync 2. Make a change to your application by running a `kubectl` command. The change can be anything: 1. Changing the tag of the image 2. Changing the number of replicas 3. You can go extreme and delete the resource if you would like :) 3. Check the app state in ArgoCD 4.
Sync the app state ## Solution Click [here](solution.md) to view the solution ================================================ FILE: topics/argo/exercises/sync_app_cluster/solution.md ================================================ # Sync App - Cluster ## Requirements 1. Make sure you have a Kubernetes cluster running with ArgoCD installed 1. Make sure you have an app deployed in the cluster and tracked by ArgoCD ## Objectives 1. Verify the app is tracked by ArgoCD and in sync 2. Make a change to your application by running a `kubectl` command. The change can be anything: 1. Changing the tag of the image 2. Changing the number of replicas 3. You can go extreme and delete the resource if you would like :) 3. Check the app state in ArgoCD 4. Sync the app state ## Solution ### UI 1. Click on the app in the UI 1. Make sure it's in sync and in "healthy" state 2. Make a change in the cluster 1. `kubectl scale --replicas=0 ` 2. `kubectl get rs ` 3. Go back to the UI and check the state of the app 1. If it's still in sync, make sure to click on "Refresh" 2. The app should be in "out-of-sync" state 3. Click on "Sync" and then on "Synchronize" ### CLI ``` # Check app state and verify it's in sync argocd app get app-demo # Run the following k8s commands (or any other commands that will change the state of your app) kubectl scale --replicas=0 kubectl get rs # Check app state again argocd app get app-demo # Sync app state argocd app sync app-demo argocd app wait app-demo ```
Verify the application is running by executing `kubectl get deploy` in the `default` namespace 4. Now make a change in your repository to one of the Kubernetes manifests (e.g. update deployment tag) 5. Go back to ArgoCD and check the state of the app 6. Sync the state of the application ## Solution Click [here](solution.md) to view the solution ### UI 1. Click on "New App" 1. Insert application name: `app-demo` 2. Insert project: `project-demo` 3. Under source put the repository URL to your GitHub repo with Kubernetes manifests 1. Set path of your application 4. Under destination put the address of your Kubernetes cluster and set namespace to `default` 5. Click on "Create" 2. Click on the newly created application 1. Click on the "sync button" and click on "Synchronize" 3. Make a change in your Git repo where the Kubernetes manifests are 1. `git add .` 2. `git commit -a` 3. `git push origin ` 4. Go back to ArgoCD UI and check the status of the app 1. You should see it's "out-of-sync". If you don't, you may want to click on "Refresh" 2. You can also click on "App diff" to see the difference that led to "out-of-sync" 5. Click on "Sync" and "Synchronize" to sync the application ### CLI ``` argocd app create app-demo \ --project project-demo \ --repo https://fake.repo.address \ --path ./some_app_path \ --dest-namespace default \ --dest-server my.kubernetes.cluster # In the Git repo cd vi git add . git commit -a git push origin # Check app state argocd app get app-demo # Sync app state argocd app sync app-demo argocd app wait app-demo ``` ================================================ FILE: topics/argo/exercises/sync_app_git/solution.md ================================================ # Sync App - Git ## Requirements 1. Make sure you have repository with some Kubernetes manifests 2. Make sure you have a Kubernetes cluster running with ArgoCD installed ## Objectives 1. Create a new application using the UI or CLI 1. App Name: app-demo 2. Project: project-demo 3. 
Kubernetes namespace: default 2. Sync the application 3. Verify the application is running by executing `kubectl get deploy` in the `default` namespace 4. Now make a change in your repository to one of the Kubernetes manifests (e.g. update deployment tag) 5. Go back to ArgoCD and check the state of the app 6. Sync the state of the application ## Solution ### UI 1. Click on "New App" 1. Insert application name: `app-demo` 2. Insert project: `project-demo` 3. Under source put the repository URL to your GitHub repo with Kubernetes manifests 1. Set path of your application 4. Under destination put the address of your Kubernetes cluster and set namespace to `default` 5. Click on "Create" 2. Click on the newly created application 1. Click on the "sync button" and click on "Synchronize" 3. Make a change in your Git repo where the Kubernetes manifests are 1. `git add .` 2. `git commit -a` 3. `git push origin ` 4. Go back to ArgoCD UI and check the status of the app 1. You should see it's "out-of-sync". If you don't, you may want to click on "Refresh" 2. You can also click on "App diff" to see the difference that led to "out-of-sync" 5. Click on "Sync" and "Synchronize" to sync the application ### CLI ``` argocd app create app-demo \ --project project-demo \ --repo https://fake.repo.address \ --path ./some_app_path \ --dest-namespace default \ --dest-server my.kubernetes.cluster # In the Git repo cd vi git add . git commit -a git push origin # Check app state argocd app get app-demo # Sync app state argocd app sync app-demo argocd app wait app-demo ``` ================================================ FILE: topics/aws/README.md ================================================ # AWS **Note**: Some of the exercises cost $$$ and can't be performed using the free tier or resources **2nd Note**: The provided solutions are using the AWS console. It's recommended you use IaC technologies to solve the exercises (e.g., Terraform, Pulumi).
- [AWS](#aws)
  - [Exercises](#exercises)
    - [IAM](#iam)
    - [EC2](#ec2)
    - [S3](#s3)
    - [ELB](#elb)
    - [Auto Scaling Groups](#auto-scaling-groups)
    - [VPC](#vpc)
    - [Databases](#databases)
    - [DNS](#dns)
    - [Containers](#containers)
    - [Lambda](#lambda)
    - [Elastic Beanstalk](#elastic-beanstalk)
    - [CodePipeline](#codepipeline)
    - [CDK](#cdk)
    - [Misc](#misc)
  - [Questions](#questions)
    - [Global Infrastructure](#global-infrastructure)
    - [IAM](#iam-1)
    - [EC2](#ec2-1)
      - [AMI](#ami)
      - [EBS](#ebs)
      - [Instance Store](#instance-store)
      - [EFS](#efs)
      - [Pricing Models](#pricing-models)
      - [Launch Template](#launch-template)
      - [ENI](#eni)
      - [Placement Groups](#placement-groups)
    - [VPC](#vpc-1)
      - [Default VPC](#default-vpc)
    - [Lambda](#lambda-1)
    - [Containers](#containers-1)
      - [ECS](#ecs)
      - [Fargate](#fargate)
    - [S3](#s3-1)
      - [Basics](#basics)
      - [Buckets 101](#buckets-101)
      - [Objects](#objects)
      - [S3 Security](#s3-security)
      - [Misc](#misc-1)
    - [Disaster Recovery](#disaster-recovery)
    - [CloudFront](#cloudfront)
    - [ELB](#elb-1)
      - [NLB](#nlb)
      - [ALB](#alb)
    - [Auto Scaling Group](#auto-scaling-group)
    - [Security](#security)
    - [Databases](#databases-1)
      - [RDS](#rds)
      - [Aurora](#aurora)
      - [DynamoDB](#dynamodb)
      - [ElastiCache](#elasticache)
      - [RedShift](#redshift)
    - [Identify the Service](#identify-the-service)
    - [DNS (Route 53)](#dns-route-53)
    - [SQS](#sqs)
    - [SNS](#sns)
    - [Monitoring and Logging](#monitoring-and-logging)
    - [Billing and Support](#billing-and-support)
    - [AWS Organizations](#aws-organizations)
    - [Automation](#automation)
    - [Misc](#misc-2)
    - [High Availability](#high-availability)
    - [Production Operations and Migrations](#production-operations-and-migrations)
    - [Scenarios](#scenarios)
      - [Architecture Design](#architecture-design)
      - [Misc](#misc-3)

## Exercises

### IAM

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Create a User | IAM | [Exercise](exercises/create_user/exercise.md) | [Solution](exercises/create_user/solution.md) | |
| Password Policy | IAM | [Exercise](exercises/password_policy_and_mfa/exercise.md) | [Solution](exercises/password_policy_and_mfa/solution.md) | |
| Create a role | IAM | [Exercise](exercises/create_role/exercise.md) | [Solution](exercises/create_role/solution.md) | |
| Credential Report | IAM | [Exercise](exercises/credential_report/exercise.md) | [Solution](exercises/credential_report/solution.md) | |
| Access Advisor | IAM | [Exercise](exercises/access_advisor/exercise.md) | [Solution](exercises/access_advisor/solution.md) | |

### EC2

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Launch EC2 web instance | EC2 | [Exercise](exercises/launch_ec2_web_instance/exercise.md) | [Solution](exercises/launch_ec2_web_instance/solution.md) | |
| Security Groups | EC2 | [Exercise](exercises/security_groups/exercise.md) | [Solution](exercises/security_groups/solution.md) | |
| IAM Roles | EC2, IAM | [Exercise](exercises/ec2_iam_roles/exercise.md) | [Solution](exercises/ec2_iam_roles/solution.md) | |
| Spot Instances | EC2 | [Exercise](exercises/create_spot_instances/exercise.md) | [Solution](exercises/create_spot_instances/solution.md) | |
| Elastic IP | EC2, Networking | [Exercise](exercises/elastic_ip/exercise.md) | [Solution](exercises/elastic_ip/solution.md) | |
| Placement Groups Creation | EC2, Placement Groups | [Exercise](exercises/placement_groups/exercise.md) | [Solution](exercises/placement_groups/solution.md) | |
| Elastic Network Interfaces | EC2, ENI | [Exercise](exercises/elastic_network_interfaces/exercise.md) | [Solution](exercises/elastic_network_interfaces/solution.md) | |
| Hibernate an Instance | EC2 | [Exercise](exercises/hibernate_instance.md) | [Solution](exercises/hibernate_instance/solution.md) | |
| Volume Creation | EC2, EBS | [Exercise](exercises/ebs_volume_creation/exercise.md) | [Solution](exercises/ebs_volume_creation/solution.md) | |
| Snapshots | EC2, EBS | [Exercise](exercises/snapshots/exercise.md) | [Solution](exercises/snapshots/solution.md) | |
| Create an AMI | EC2, AMI | [Exercise](exercises/create_ami/exercise.md) | [Solution](exercises/create_ami/solution.md) | |
| Create EFS | EC2, EFS | [Exercise](exercises/create_efs/exercise.md) | [Solution](exercises/create_efs/solution.md) | |

### S3

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Create buckets | S3 | [Exercise](exercises/s3/new_bucket/exercise.md) | [Solution](exercises/s3/new_bucket/solution.md) | |
| Bucket Lifecycle Policy | S3, Lifecycle Policy | | | |

### ELB

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Application Load Balancer | ELB, ALB | [Exercise](exercises/app_load_balancer/exercise.md) | [Solution](exercises/app_load_balancer/solution.md) | |
| Multiple Target Groups | ELB, ALB | [Exercise](exercises/alb_multiple_target_groups/exercise.md) | [Solution](exercises/alb_multiple_target_groups/solution.md) | |
| Network Load Balancer | ELB, NLB | [Exercise](exercises/network_load_balancer/exercise.md) | [Solution](exercises/network_load_balancer/solution.md) | |

### Auto Scaling Groups

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Auto Scaling Groups Basics | ASG | [Exercise](exercises/auto_scaling_groups_basics/exercise.md) | [Solution](exercises/auto_scaling_groups_basics/solution.md) | |
| Dynamic Scaling Policy | ASG, Policies | [Exercise](exercises/asg_dynamic_scaling_policy/exercise.md) | [Solution](exercises/asg_dynamic_scaling_policy/solution.md) | |

### VPC

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| My First VPC | VPC | [Exercise](exercises/new_vpc/exercise.md) | [Solution](exercises/new_vpc/solution.md) | |
| Subnets | VPC | [Exercise](exercises/subnets/exercise.md) | [Solution](exercises/subnets/solution.md) | |

### Databases

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| MySQL DB | RDS | [Exercise](exercises/mysql_db/exercise.md) | [Solution](exercises/mysql_db/solution.md) | |
| Aurora DB | RDS | [Exercise](exercises/aurora_db/exercise.md) | [Solution](exercises/aurora_db/solution.md) | |
| ElastiCache | ElastiCache | [Exercise](exercises/elasticache/exercise.md) | [Solution](exercises/elasticache/solution.md) | |

### DNS

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Register Domain | Route 53 | [Exercise](exercises/register_domain/exercise.md) | [Solution](exercises/register_domain/solution.md) | |
| Creating Records | Route 53 | [Exercise](exercises/creating_records/exercise.md) | [Solution](exercises/creating_records/solution.md) | |
| Health Checks | Route 53 | [Exercise](exercises/health_checks/exercise.md) | [Solution](exercises/health_checks/solution.md) | |
| Failover | Route 53 | [Exercise](exercises/route_53_failover/exercise.md) | [Solution](exercises/route_53_failover/solution.md) | |

### Containers

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| ECS Task | ECS, Fargate | [Exercise](exercises/ecs_task/exercise.md) | [Solution](exercises/ecs_task/solution.md) | |

### Lambda

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Hello Function | Lambda | [Exercise](exercises/hello_function/exercise.md) | [Solution](exercises/hello_function/solution.md) | |
| URL Function | Lambda | [Exercise](exercises/url_function/exercise.md) | [Solution](exercises/url_function/solution.md) | |
| Web App with DB | Lambda, DynamoDB | [Exercise](exercises/web_app_dynamodb/exercise.md) | [Solution](exercises/web_app_dynamodb/solution.md) | |

### Elastic Beanstalk

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Simple Elastic Beanstalk Node.js app | Elastic Beanstalk | [Exercise](exercises/elastic_beanstalk_simple/exercise.md) | [Solution](exercises/elastic_beanstalk_simple/solution.md) | |

### CodePipeline

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Basic CI with S3 | CodePipeline & S3 | [Exercise](exercises/basic_s3_ci/exercise.md) | [Solution](exercises/basic_s3_ci/solution.md) | |

### CDK

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Sample CDK | CDK | [Exercise](exercises/sample_cdk/exercise.md) | [Solution](exercises/sample_cdk/solution.md) | |

### Misc

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Budget Setup | Budget | [Exercise](exercises/budget_setup/exercise.md) | [Solution](exercises/budget_setup/solution.md) | |
| No Application :'( | Troubleshooting | [Exercise](exercises/no_application/exercise.md) | [Solution](exercises/no_application/solution.md) | |

## Questions

### Global Infrastructure
Explain the following:

* Availability zone
* Region
* Edge location
AWS regions are separate geographical areas around the world in which AWS clusters its data centers.
Within each region, there are multiple isolated locations known as Availability Zones. Each availability zone consists of one or more data centers with redundant networking, connectivity and power. Multiple availability zones ensure high availability in case one of them goes down.
Edge locations are endpoints of a content delivery network which cache data and ensure lower latency and faster delivery to users in any location. They are located in major cities around the world.
True or False? Each AWS region is designed to be completely isolated from the other AWS regions
True.
True or False? Each region has a minimum number of 1 availability zones and the maximum is 4
False. Each region has at least 2 availability zones (newer regions launch with at least 3), and some regions have as many as 6.
What considerations to take when choosing an AWS region for running a new application?
* Service availability: not all services (and all their features) are available in every region
* Reduced latency: deploy the application in a region that is close to your customers
* Compliance: some countries have strict rules and requirements, such as making sure the data stays within the borders of the country or the region. In that case, only specific regions can be used for running the application
* Pricing: pricing is not consistent across regions, so the price for the same service may differ from region to region
### IAM
What is IAM? What are some of its features?
In short, it's used for managing users, groups, access policies and roles.
A full explanation can be found [here](https://aws.amazon.com/iam)
True or False? IAM configuration is defined globally and not per region
True
True or False? When creating an AWS account, root account is created by default. This is the recommended account to use and share in your organization
False. The root account should only be used for initial setup; instead of using it, create individual IAM users and use those.
True or False? Groups in AWS IAM, can contain only users and not other groups
True
True or False? Users in AWS IAM, can belong only to a single group
False. Users can belong to multiple groups.
What are some best practices regarding IAM in AWS?
* Delete root account access keys and don't use the root account regularly
* Create an IAM user for every physical user. Don't share users.
* Apply the "least privilege principle": give users only the permissions they need, nothing more than that
* Set up MFA and consider enforcing its use
* Make use of groups to assign permissions (user -> group -> permissions)
What permissions does a new user have?
None, apart from the ability to log in. Any other permission has to be granted explicitly.
True or False? If a user in AWS authenticates with a password, they don't need to enable MFA
False(!). MFA is a great additional security layer to use for authentication.
What ways are there to access AWS?
* AWS Management Console
* AWS CLI
* AWS SDK
What are Roles?
[AWS docs](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html): "An IAM role is an IAM identity that you can create in your account that has specific permissions...it is an AWS identity with permission policies that determine what the identity can and cannot do in AWS." For example, you can make use of a role which allows the EC2 service to access S3 buckets (read and write).
What are Policies?
Policies are JSON documents that define the permissions of a user, group or role: what actions they are allowed (or denied) to perform.
A user is unable to access an s3 bucket. What might be the problem?
There can be several reasons for that. One of them is a missing policy. To solve that, the admin has to attach a policy to the user that allows access to the S3 bucket.
What should you use to:

* Grant access between two services/resources?
* Grant a user access to resources/services?

* Role
* Policy
What statements do AWS IAM policies consist of?
* Sid: identifier of the statement (optional)
* Effect: allow or deny access
* Action: list of actions (to deny or allow)
* Resource: a list of resources to which the actions are applied
* Principal: the role, account or user to which the policy applies
* Condition: conditions to determine when the policy is applied (optional)
Explain the following policy:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "*",
            "Resource": "*"
        }
    ]
}
```
This policy permits performing any action on any resource. It is effectively the "AdministratorAccess" policy.
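Since policies are just JSON documents, they can be assembled programmatically. A minimal Python sketch (the `make_policy` helper, the `Sid` value and the bucket name are made up for illustration) of building both the admin-style policy above and a least-privilege alternative:

```python
import json


def make_policy(effect, actions, resources, sid=None):
    """Build a minimal IAM policy document from its core statement fields."""
    statement = {"Effect": effect, "Action": actions, "Resource": resources}
    if sid:
        statement["Sid"] = sid
    return json.dumps({"Version": "2012-10-17", "Statement": [statement]}, indent=4)


# The admin-style policy from above: any action on any resource.
admin = make_policy("Allow", "*", "*")

# A least-privilege alternative: read-only access to one (hypothetical) bucket.
s3_read = make_policy(
    "Allow",
    ["s3:GetObject", "s3:ListBucket"],
    ["arn:aws:s3:::example-bucket", "arn:aws:s3:::example-bucket/*"],
    sid="AllowS3Read",
)
print(s3_read)
```

The output of such a helper could then be passed to IAM (e.g. via the CLI or an SDK) when creating a customer-managed policy.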
What security tools AWS IAM provides?
* IAM Credentials Report: lists all the account's users and the status of their credentials
* IAM Access Advisor: shows the service permissions granted to a user and when they last accessed these services
Which tool would you use to optimize user permissions by identifying which services a user doesn't access regularly (or at all)?
IAM Access Advisor
What type of IAM object would you use to allow inter-service communication?
Role
### EC2
What is EC2?
"a web service that provides secure, resizable compute capacity in the cloud". Read more [here](https://aws.amazon.com/ec2)
True or False? EC2 is a regional service
True. As opposed to IAM for example, which is a global service, EC2 is a regional service.
What are some of the properties/configuration options of EC2 instances that can be set or modified?
* OS (Linux, Windows)
* RAM and CPU
* Networking - IP, card properties like speed
* Storage space (EBS, EFS, EC2 Instance Store)
* EC2 User Data
* Security groups
What would you use for customizing EC2 instances? As in software installation, OS configuration, etc.
AMI. With AMI (Amazon Machine Image) you can customize EC2 instances by specifying which software to install, what OS changes should be applied, etc.
#### AMI
What is AMI?
[AWS Docs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html): "An Amazon Machine Image (AMI) provides the information required to launch an instance."
What are the different sources for AMIs?
* Personal AMIs - AMIs you create
* AWS Marketplace AMIs - AMIs made by others, mostly sold for some price
* Public AMIs - provided by AWS
True or False? AMIs are built for a specific region
True (but they can be copied from one region to another).
Describe in high-level the process of creating AMIs
1. Start an EC2 instance
2. Customize the EC2 instance (install packages, change OS configuration, etc.)
3. Stop the instance (to avoid data integrity issues)
4. Create an EBS snapshot and build an AMI from it
5. To verify and test the AMI, launch an instance from it
What is an instance type?
"the instance type that you specify determines the hardware of the host computer used for your instance" Read more about instance types [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html)
Explain the instance type naming convention
Let's take for example the following instance type: m5.large

* `m` is the instance class
* `5` is the generation
* `large` is the size of the instance (affects spec properties like vCPUs and RAM)
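The naming convention above can be sketched as a small parser. This is a simplified illustration, not an official AWS utility: attribute suffixes after the generation digit (such as the `g` in `m6g`) are ignored here.

```python
import re


def parse_instance_type(instance_type):
    """Split an EC2 instance type like 'm5.large' into class, generation and size.

    Simplified sketch: suffix letters after the generation (e.g. 'g' in 'm6g')
    are dropped rather than interpreted.
    """
    family, size = instance_type.split(".")
    match = re.match(r"([a-z]+)(\d+)", family)
    instance_class, generation = match.group(1), match.group(2)
    return instance_class, generation, size


print(parse_instance_type("m5.large"))   # ('m', '5', 'large')
print(parse_instance_type("c5.xlarge"))  # ('c', '5', 'xlarge')
```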
True or False? The following are instance types available for a user in AWS:

* Compute optimized
* Network optimized
* Web optimized
False. From the above list only compute optimized is available.
Explain each of the following instance types:

* "Compute Optimized"
* "Memory Optimized"
* "Storage Optimized"
Compute Optimized:
* Used for compute-intensive tasks
* Has high performance processors
* Use cases vary: gaming servers, machine learning, batch processing, etc.

Memory Optimized:
* Used for processing large data sets in memory
* Other use cases: high-performance databases, distributed cache stores

Storage Optimized:
* Used for storage-intensive tasks - high read and write access to large data sets
* Use cases: databases, OLTP systems, distributed file systems
What can you attach to an EC2 instance in order to store data?
EBS
#### EBS
Explain Amazon EBS
[AWS Docs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AmazonEBS.html): "provides block level storage volumes for use with EC2 instances. EBS volumes behave like raw, unformatted block devices."
What happens to EBS volumes when the instance is terminated?
By default, the root volume is marked for deletion, while other volumes will still remain.
You can control what will happen to every volume upon termination.
What happens to the EC2 disk (EBS) when the instance is stopped?
Disk is intact and can be used when the instance starts.
True or False? EBS volumes are locked to a specific availability zone
True
Explain EBS Snapshots
EBS snapshots are point-in-time backups of EBS volumes.
What are the use cases for using EBS snapshots?
* Backups of the data
* Moving the data between AZs
Is it possible to attach the same EBS volume to multiple EC2 instances?
Yes, with multi-attach it's possible to attach a single EBS volume to multiple instances.
True or False? EBS is a network drive hence, it requires network connectivity
True
What EBS volume types are there?
* HDD (st1, sc1): low cost HDD volumes
* SSD
  * io1, io2: highest performance SSD
  * gp2, gp3: general purpose SSD
If you need an EBS volume for low latency workloads, which volume type would you use?
SSD - io1, io2
If you need an EBS volume for workloads that require good performance but the cost is also an important aspect for you, which volume type would you use?
SSD - gp2, gp3
If you need an EBS volume for high-throughput, which volume type would you use?
HDD - st1 (Throughput Optimized HDD), designed for throughput-intensive workloads such as big data and log processing.
If you need an EBS volume for infrequently accessed data, which volume type would you use?
HDD - sc1
Which EBS volume types can be used as boot volumes for EC2 instances?
SSD: gp2, gp3, io1, io2
True or False? In the EBS gp2 volume type, IOPS will increase if the disk size increases
True.
#### Instance Store
If you would like to have a hardware disk attached to your EC2 instance instead of a network drive (EBS), what would you use?
EC2 Instance Store.
Explain EC2 Instance Store. Why would someone choose to use it over other options?
EC2 instance store provides better I/O performance compared to EBS.
It is mostly used for cache and temporary data purposes.
Are there any disadvantages in using instance store over EBS?
Yes, the data on an instance store is lost when the instance is stopped or terminated.
#### EFS
What is Amazon EFS?
[AWS Docs](https://aws.amazon.com/efs): "Amazon Elastic File System (Amazon EFS) provides a simple, scalable, fully managed elastic NFS file system for use with AWS Cloud services and on-premises resources." In simpler words, it's a network file system you can mount on one or more EC2 instances.
True or False? EFS is locked into a single availability zone
False. EFS can be mounted across multiple availability zones.
What are some use cases for using EFS?
* Data sharing (e.g. developers working on the same source control)
* Web serving
* Content management
True or False? EFS is only compatible with Linux-based AMIs
True
True or False? EFS requires the user to perform capacity planning as it doesn't scale automatically
False. EFS scales automatically and you pay-per-use.
What EFS modes are there?
* Performance mode
  * General purpose: used mainly for CMS, web serving, ... as it's optimal for latency-sensitive applications
  * Max I/O: great for scaling to high levels of throughput and I/O operations per second
* Throughput mode
  * Bursting: throughput scales with file system size
  * Provisioned: fixed throughput
Which EFS mode would you use if you need to perform media processing?
Performance Mode (Max I/O): It provides high throughput and scales to operations per second. Mainly used for big data, media processing, etc.
What is the default EFS mode?
Performance Mode (General Purpose): Used for web serving, CMS, ... anything that is sensitive to latency.
What EFS storage tiers are there?
* Standard: frequently accessed files
* Infrequent access: lower price to store files, but it also costs to retrieve them
#### Pricing Models
What EC2 pricing models are there?
* On Demand - pay a fixed rate by the hour/second with no commitment. You can provision and terminate at any given time.
* Reserved - you get capacity reservation; you basically purchase an instance for a fixed period of time. The longer, the cheaper.
* Spot - enables you to bid whatever price you want for instances, paying the spot price.
* Dedicated Hosts - a physical EC2 server dedicated for your use.
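The cost trade-off between the models can be sketched with some arithmetic. The hourly rate and discount percentages below are purely hypothetical, not real AWS prices:

```python
# Hypothetical figures for illustration only -- NOT real AWS prices.
ON_DEMAND_RATE = 0.10      # $/hour, no commitment
RESERVED_DISCOUNT = 0.40   # e.g. a 1-year reservation at 40% off
SPOT_DISCOUNT = 0.70       # spot capacity, can be reclaimed at any time

HOURS_PER_YEAR = 24 * 365


def yearly_cost(hourly_rate, hours=HOURS_PER_YEAR):
    """Cost of running one instance continuously for the given hours."""
    return round(hourly_rate * hours, 2)


on_demand = yearly_cost(ON_DEMAND_RATE)
reserved = yearly_cost(ON_DEMAND_RATE * (1 - RESERVED_DISCOUNT))
spot = yearly_cost(ON_DEMAND_RATE * (1 - SPOT_DISCOUNT))

# Reserved and Spot are cheaper, but Reserved requires a commitment and
# Spot instances can be interrupted at any time.
print(f"On-Demand: ${on_demand}, Reserved: ${reserved}, Spot: ${spot}")
```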
True or False? Reserved instance has to be used for a minimum of 1 year
True.
Explain the following types of reserved instances:

* Convertible Reserved Instances
* Scheduled Reserved Instances
* Convertible Reserved Instances: used for long running workloads where the instance type might change during the reserved period
* Scheduled Reserved Instances: used when you need to reserve an instance for a long period but don't need it continuously (for example, only in the morning)
True or False? In EC2 On Demand, you pay per hour when using Linux or Windows and per second (after first minute) when using any other operating system
False. You pay per second (after the first minute) when using Windows or Linux and per hour for any other OS.
You need an instance for short-term and the workload running on instance must not be interrupted. Which pricing model would you use?
On Demand is good for short-term non-interrupted workloads (but it also has the highest cost).
You need an instance for running an application for a period of 2 years continuously, without changing instance type. Which pricing model would you use?
Reserved instances: they are cheaper than on-demand and the instance is yours for the chosen period of time.
Which pricing model has potentially the biggest discount and what its advantage
Spot instances provide the biggest discount, but carry the risk of losing the instance when the spot price exceeds your maximum bid.
You need an instance for two years, but only between 10:00-15:00 every day. Which pricing model would you use?
Reserved instances of the "Scheduled Reserved Instances" type, which allow you to reserve capacity for a specific time window (like 10:00-15:00 every day).
You need an instance for running workloads. You don't care if they fail for a given moment as long as they run eventually. Which pricing model would you use?
Spot instances. The discount potential is the highest compared to all other pricing models. The disadvantage is that you can lose the instance at any point, so you must run only workloads that can tolerate sudden interruption.
You need a physical server only for your use. Which pricing model are you going to use?
EC2 Dedicated Host
What are some of the differences between dedicated hosts and dedicated instances?
In dedicated hosts you have per host billing, you have more visibility (sockets, cores, ...) and you can control where instance will be placed.
In dedicated instances the billing is per instance but you can't control placement and you don't have visibility of sockets, cores, ...
For what use cases, EC2 dedicated hosts are useful for?
* Compliance needs
* When the software license is complex (Bring Your Own License) and doesn't support cloud or multi-tenancy
* Regulatory requirements
What are Security Groups?
"A security group acts as a virtual firewall that controls the traffic for one or more instances" More on this subject [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-security-groups.html)
True or False? Security groups only contain deny rules
False. Security groups only contain allow rules.
True or False? One security group can be attached to multiple instances
True
True or False? Security groups are not locked down to a region and VPC (meaning you don't have to create a new one when switching regions)
False. They are locked down to regions and VPC.
True or False? By default, when using security groups, all inbound traffic to an EC2 instance is blocked and all outbound traffic is allowed
True
What is the advantage of referencing security groups from a given security group?
It lets you define traffic rules in terms of other security groups instead of IP addresses. For example, if instance A's security group allows inbound traffic from security group X, then any instance attached to X can reach A, regardless of its IP address. This removes the need to track and update IPs.
How to migrate an instance to another availability zone?

Create an AMI from the instance and launch a new instance from that AMI in the target availability zone.
What EC2 reserved instance types are there?
* Standard RI - most significant discount + suited for steady-state usage
* Convertible RI - discount + ability to change RI attributes + suited for steady-state usage
* Scheduled RI - launch within the time windows you reserve

Learn more about EC2 RI [here](https://aws.amazon.com/ec2/pricing/reserved-instances)
For how long can reserved instances be reserved?
1 or 3 years.
What allows you to control inbound and outbound instance traffic?
Security Groups
What does bootstrapping mean and how is it used in AWS EC2?
Bootstrapping is about launching commands when a machine starts for the first time. In AWS EC2 this is done using the EC2 user data script.
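The EC2 API expects user data to be base64-encoded. A minimal Python sketch of preparing such a first-boot script (the script's contents are illustrative; note that when using boto3, `run_instances` accepts the raw string and performs the encoding for you):

```python
import base64

# A hypothetical first-boot script; EC2 user data runs as root on first launch.
user_data = """#!/bin/bash
yum update -y
yum install -y httpd
systemctl enable --now httpd
"""

# Encode as base64, the form the low-level EC2 API expects.
encoded = base64.b64encode(user_data.encode("utf-8")).decode("ascii")
print(encoded[:40], "...")
```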
You get a timeout when trying to reach your application which runs on an EC2 instance. Specify one reason why this could possibly happen
Security group isn't configured properly.
What is the AWS Instance Connect?
[AWS](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Connect-using-EC2-Instance-Connect.html): "Amazon EC2 Instance Connect provides a simple and secure way to connect to your Linux instances using Secure Shell (SSH)."
You try to run AWS CLI commands on an EC2 instance you've just created, but they fail due to missing credentials. What would you do?
DO NOT configure AWS credentials on the instance (anyone with access to the instance would be able to see and use your credentials).
The best practice is to attach an IAM role with sufficient permissions (like `IAMReadOnlyAccess`)
True or False? Cancelling a Spot instance request terminates the instance
False. When you cancel a Spot instance request, you are not terminating the instances created by it.
To terminate such instances, you must first cancel the Spot instance request and then terminate the instances.
What are Spot Fleets?
A set of Spot instances and, optionally, On-Demand instances as well.
What strategies are there to allocate Spot instances?
* lowestPrice: launch instances from the pool with the lowest price
* diversified: distribute instances across all pools
* capacityOptimized: launch from the pools with the optimal capacity for the number of instances
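The "lowestPrice" strategy can be illustrated with a tiny Python sketch. The pool names and prices below are made up, and this only mimics the selection logic in a simplified way:

```python
def lowest_price_pool(pools):
    """Pick the Spot capacity pool with the lowest current price.

    `pools` maps a pool name to its current spot price; this is a simplified
    stand-in for the 'lowestPrice' allocation strategy.
    """
    return min(pools, key=pools.get)


spot_pools = {  # hypothetical pools and prices, for illustration only
    "us-east-1a/m5.large": 0.035,
    "us-east-1b/m5.large": 0.031,
    "us-east-1c/m5.large": 0.040,
}
print(lowest_price_pool(spot_pools))  # us-east-1b/m5.large
```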
From networking perspective, what do you get by default when running an EC2 instance?
A private IP and a public IP.
Explain EC2 hibernate
[AWS Docs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Hibernate.html): "Hibernation saves the contents from the instance memory (RAM) to your Amazon Elastic Block Store (Amazon EBS) root volume."
True or False? Using EC2 hibernate option results in having faster instance boot
True. This is because the operating system isn't restarted or stopped.
What are some use cases for using EC2 hibernate option?
* Save RAM state
* Services with long initialization time
* Keep long-running processes
What are some limitations of EC2 hibernate option?
* Instance RAM size is limited
* Root volume must be an encrypted EBS volume
* Hibernation time is limited
* Doesn't support all instance types
* No support for bare metal. Only On-Demand and Reserved instances
* Doesn't support all AMIs
Explain what is EC2 Nitro
* Next generation EC2 instances using new virtualization technology
* Better EBS: 64,000 EBS IOPS
* Better networking: HPC, IPv6
* Better security
What CPU customization is available with EC2?
* Modifying the number of CPU cores (useful for high-RAM, low-CPU applications)
* Modifying the number of threads per core (useful for HPC workloads)
Explain EC2 Capacity Reservations
* Allows you to ensure you have EC2 capacity when you need it
* Usually combined with Reserved Instances and Savings Plans to achieve cost savings
#### Launch Template
What is a launch template?
[AWS Docs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-launch-templates.html): "You can create a launch template that contains the configuration information to launch an instance. You can use launch templates to store launch parameters so that you do not have to specify them every time you launch an instance"
What is the difference between Launch Configuration and Launch Template?
Launch configuration is a legacy predecessor of launch templates that must be recreated every time you want to update the configuration. In addition, launch templates have clear benefits:

* Provisioning both On-Demand and Spot instances
* Support for multiple versions
* Support for creating parameter subsets (used for reuse and inheritance)
#### ENI
Explain Elastic Network Interfaces (ENI)
[AWS Docs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html): "An elastic network interface is a logical networking component in a VPC that represents a virtual network card."
Name at least three attributes the Elastic Network Interfaces (ENI) can include
1. A public IPv4 address
2. A MAC address
3. A primary private IPv4 address (from the address range of your VPC)
True or False? ENI are not bound to a specific availability zone
False. ENIs are bound to a specific availability zone.
True or False? ENI can be created independently of EC2 instances
True. They can be attached later on and on the fly (for failover purposes).
#### Placement Groups
What are "Placement Groups"?
[AWS Docs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html): "When you launch a new EC2 instance, the EC2 service attempts to place the instance in such a way that all of your instances are spread out across underlying hardware to minimize correlated failures. You can use placement groups to influence the placement of a group of interdependent instances to meet the needs of your workload."
What Placement Groups strategies are there?
* Cluster: places instances close together in an AZ
* Spread: spreads the instances across the underlying hardware
* Partition: spreads the instances across different partitions (= different sets of hardware/racks) within an AZ
For each of the following scenarios choose a placement group strategy:

* High availability is top priority
* Low latency between instances
* Instances must be isolated from each other
* Big Data applications that are partition aware
* Big Data process that needs to end quickly
* High availability is top priority - Spread
* Low latency between instances - Cluster
* Instances must be isolated from each other - Spread
* Big Data applications that are partition aware - Partition
* Big Data process that needs to end quickly - Cluster
What are the cons and pros of the "Cluster" placement group strategy?
Cons: if the hardware fails, all instances fail
Pros: low latency and high throughput network
What are the cons and pros of the "Spread" placement group strategy?
Cons:
* Current limitation is 7 instances per AZ (per placement group)

Pros:
* Maximized high availability (instances on different hardware, spanning across AZs)
### VPC
What is VPC?
"A logically isolated section of the AWS cloud where you can launch AWS resources in a virtual network that you define" Read more about it [here](https://aws.amazon.com/vpc).
True or False? VPC spans multiple regions
False
True or False? It's possible to have multiple VPCs in one region
True. As of today, the soft limit is 5.
True or False? Subnets that belong to the same VPC can be in different availability zones
True. Just to clarify, a single subnet resides entirely in one AZ.
You have noticed your VPC's subnets (which use a x.x.x.x/20 CIDR) have only 4091 available IP addresses, although this CIDR should provide 4096 addresses. What is the reason for that?
AWS reserves 5 IP addresses in each subnet - first 4 and the last one, and so they aren't available for use.
What does AWS use the 5 reserved IP addresses for?
* x.x.x.0 - network address
* x.x.x.1 - VPC router
* x.x.x.2 - DNS mapping
* x.x.x.3 - future use
* x.x.x.255 - broadcast address
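The subnet math above is easy to check with Python's standard `ipaddress` module. A small sketch computing how many addresses AWS actually leaves usable in a subnet (the CIDRs are just examples):

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # network, router, DNS, future use, broadcast


def usable_addresses(cidr):
    """Number of IP addresses AWS leaves usable in a subnet of this size."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


print(usable_addresses("10.0.0.0/20"))  # 4091 (not the full 4096)
print(usable_addresses("10.0.0.0/27"))  # 27, not enough for 29 instances
```

The second example is exactly the trap in the /27 subnet question further below: 32 total addresses minus the 5 reserved ones leaves only 27.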
What is an Internet Gateway?
[AWS Docs](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Internet_Gateway.html): "component that allows communication between instances in your VPC and the internet". In addition, it's good to know that an IGW is:

* Highly available and redundant
* Not providing internet access on its own (route tables need to be edited)
* Created separately from the VPC
True or False? One or more VPCs can be attached to one Internet Gateway
False. Only one VPC can be attached to one IGW and vice versa
True or False? NACL allow or deny traffic on the subnet level
True
What is VPC peering?
[docs.aws](https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html): "A VPC peering connection is a networking connection between two VPCs that enables you to route traffic between them using private IPv4 addresses or IPv6 addresses."
True or False? Multiple Internet Gateways can be attached to one VPC
False. Only one internet gateway can be attached to a single VPC.
You've restarted your EC2 instance and the public IP has changed. How would you deal with it so it won't happen?
Use Elastic IP which provides you a fixed IP address.
When creating a new VPC, there is an option called "Tenancy". What is it used for?
[AWS Docs](https://docs.aws.amazon.com/vpc/latest/userguide/create-vpc.html): `Tenancy` option defines if EC2 instances that you launch into the VPC will run on hardware that's shared with other AWS accounts or on hardware that's dedicated for your use only.
What is an Elastic IP address?
[AWS Docs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html): "An Elastic IP address is a static IPv4 address designed for dynamic cloud computing. An Elastic IP address is allocated to your AWS account, and is yours until you release it. By using an Elastic IP address, you can mask the failure of an instance or software by rapidly remapping the address to another instance in your account."
Why would you use an Elastic IP address?
Let's say you have an instance that you need to shut down or perform some maintenance on. In that case, you would want to move the Elastic IP address to another instance that is operational until you finish the maintenance, and then move it back to the original instance (or keep it assigned to the second one).
True or False? When stopping and starting an EC2 instance, its public IP changes
True
What are the best practices around Elastic IP?
The best practice is actually not to use them in the first place. It's more common to use a load balancer without a public IP, or a random public IP with a DNS record registered to it.
True or False? An Elastic IP is free, as long it's not associated with an EC2 instance
False. An Elastic IP is free of charge as long as **it is** associated with an EC2 instance. This instance should be running and should have only one Elastic IP.
True or False? Route tables are used to allow or deny traffic from the internet to AWS instances
False. Route tables define where network traffic is directed; allowing or denying traffic is the job of security groups and network ACLs.
Explain Security Groups and Network ACLs
* NACL - a security layer at the subnet level
* Security Group - a security layer at the instance level

Read more about it [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-security-groups.html) and [here](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_SecurityGroups.html)
What is AWS Direct Connect?
A service that allows you to establish a dedicated network connection between your corporate network and AWS.
What would you use if you need a fixed public IP for your EC2 instance?
Elastic IP
Kratos, your colleague, decided to use a subnet of /27 because he needs 29 IP addresses for EC2 instances. Is Kratos right?
No. Since AWS reserves 5 IP addresses for every subnet, Kratos will have 32-5=27 addresses and this is less than what he needs (29). It's better if Kratos uses a subnet of size /26 but good luck telling him that.
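The arithmetic can be checked with Python's standard `ipaddress` module. AWS reserves 5 addresses in every subnet (network address, VPC router, DNS, one reserved for future use, and broadcast), so usable capacity is the block size minus 5. A quick sketch (the CIDR blocks below are arbitrary examples):

```python
import ipaddress

AWS_RESERVED = 5  # network, VPC router, DNS, future use, broadcast

def usable_ec2_ips(cidr: str) -> int:
    """Return how many IPs are left for EC2 instances in an AWS subnet."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED

print(usable_ec2_ips("10.0.0.0/27"))  # 32 - 5 = 27, not enough for 29 instances
print(usable_ec2_ips("10.0.0.0/26"))  # 64 - 5 = 59, plenty
```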
#### Default VPC
True or False? By default, any new account has a default VPC
True.
True or False? Default VPC doesn't have internet connectivity and any launched EC2 will only have a private IP assigned
False. The default VPC has internet connectivity and any launched EC2 instance gets a public IPv4 address. In addition, any launched EC2 instance gets public and private DNS names.
Which of the following is included with the default VPC?

* An internet gateway connected to the default VPC
* A route in the main route table that points all traffic to the internet gateway
* A default public subnet
* A default /16 IPv4 CIDR block
All of them :)
### Lambda
Explain what is AWS Lambda
AWS definition: "AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume." Read more on it [here](https://aws.amazon.com/lambda)
True or False? In AWS Lambda, you are charged as long as a function exists, regardless of whether it's running or not
False. You are charged only when the function is executed, based on the execution time and the compute resources it uses.
Which of the following sets of languages does Lambda support?

- R, Swift, Rust, Kotlin
- Python, Ruby, Go, Kotlin, Bash
- Python, Ruby, PHP, PowerShell, C#, Perl
- Python, Ruby, Go, Node.js, Groovy, C++
- Python, Ruby, Go, Node.js, PowerShell, C#
- Python, Ruby, Go, Node.js, PowerShell, C#
True or False? Basic lambda permissions allow you only to upload logs to Amazon CloudWatch Logs
True
What's one of the issues with the current architecture?
Users shouldn't access AWS Lambda directly. If you'd like to expose your Lambda function to users, a better approach is to set up an API Gateway endpoint between the users and the Lambda function. This not only provides enhanced security but also easier access for users, who can invoke the function over HTTP or HTTPS.
Specify one or more use cases for using AWS Lambda
- Uploading images to S3 and tagging them or inserting information about the images into a database
- Uploading videos to S3, editing them or adding subtitles/captions, and storing the result in S3
- Using SNS and/or SQS to trigger functions based on notifications or messages received from these services
- Cron jobs: using Lambda together with CloudWatch Events to schedule tasks/functions periodically
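To illustrate the first use case: a Lambda handler triggered by an S3 upload receives the bucket and key inside the event payload. The sketch below is hypothetical, the event shape follows the S3 notification format and the actual tagging/database logic is left as a stub:

```python
def handler(event, context):
    """Hypothetical Lambda handler for S3 'ObjectCreated' notifications."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Real code would tag the object or write metadata to a database here
        processed.append(f"{bucket}/{key}")
    return {"processed": processed}

# Example invocation with a minimal S3-style event
event = {"Records": [{"s3": {"bucket": {"name": "images"},
                             "object": {"key": "cat.png"}}}]}
print(handler(event, None))  # {'processed': ['images/cat.png']}
```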
You run an architecture where you have a Lambda function that uploads images to S3 bucket and stores information on the images in DynamoDB. You would like to expose the function to users so they can invoke it. Your friend Carlos suggests you expose the credentials to the Lambda function. What's your take on that?
That's a big no. You shouldn't give users direct access to your Lambda function, and certainly not expose its credentials. The way to expose the Lambda function to users is through an API Gateway endpoint.
### Containers

#### ECS
What is Amazon ECS?
[AWS Docs](https://aws.amazon.com/ecs): "Amazon Elastic Container Service (Amazon ECS) is a fully managed container orchestration service. Customers such as Duolingo, Samsung, GE, and Cook Pad use ECS to run their most sensitive and mission critical applications because of its security, reliability, and scalability." In simpler words, it allows you to launch containers on AWS.
While AWS takes care of starting/stopping containers, you need to provision and maintain the infrastructure where the containers are running (EC2 instances).
What should one do in order to make an EC2 instance part of an ECS cluster?
Install ECS agent on it. Some AMIs have built-in configuration for that.
What ECS launch types are there?
* EC2 Instance
* AWS Fargate
What is Amazon ECR?
[AWS Docs](https://aws.amazon.com/ecr): "Amazon Elastic Container Registry (ECR) is a fully-managed Docker container registry that makes it easy for developers to store, manage, and deploy Docker container images."
What is the role "EC2 Instance Profile" used for in regards to ECS?
The EC2 Instance Profile is used by the ECS agent on an EC2 instance to:

* Make API calls to the ECS service
* Send logs from the container to CloudWatch
* Use secrets defined in SSM Parameter Store or Secrets Manager
* Pull container images from ECR (registry)
How to share data between containers (some from ECS and some from Fargate)?
Using EFS is a good way to share data between containers, and it also works across different AZs.
#### Fargate
What is AWS Fargate?
[Amazon Docs](https://aws.amazon.com/fargate): "AWS Fargate is a serverless, pay-as-you-go compute engine that lets you focus on building applications without managing servers. AWS Fargate is compatible with both Amazon Elastic Container Service (ECS) and Amazon Elastic Kubernetes Service (EKS)" In simpler words, AWS Fargate allows you to launch containers on AWS without worrying about managing the infrastructure. It runs containers based on the CPU and RAM you need.
How is AWS Fargate different from AWS ECS?
In AWS ECS, you manage the infrastructure - you need to provision and configure the EC2 instances.
While in AWS Fargate, you don't provision or manage the infrastructure, you simply focus on launching Docker containers. You can think of it as the serverless version of AWS ECS.
True or False? Fargate creates an ENI for every task it runs
True.
### S3

#### Basics
Explain what is AWS S3?
- S3 stands for: Simple Storage Service
- S3 is an object storage service which is fast, scalable and durable. S3 enables customers to upload, download or store any file or object that is up to 5 TB in size
- As a user you don't have to worry about filesystems or disk space
#### Buckets 101
What is a bucket?
An S3 bucket is a resource which is similar to folders in a file system and allows storing objects, which consist of data.
True or False? Buckets are defined globally
False. They are defined at the region level.
True or False? A bucket name must be globally unique
True
How to rename a bucket in S3?
An S3 bucket name is immutable. That means it's not possible to change it without removing the bucket and creating a new one. The process for renaming a bucket is therefore:

* Create a new bucket with the desired name
* Move the data from the old bucket to it
* Delete the old bucket

With the AWS CLI that would be:

```sh
# Create new bucket
aws s3 mb s3://[NEW_BUCKET_NAME]

# Sync the content from the old bucket to the new bucket
aws s3 sync s3://[OLD_BUCKET_NAME] s3://[NEW_BUCKET_NAME]

# Remove old bucket
aws s3 rb --force s3://[OLD_BUCKET_NAME]
```
True or False? The max object size a user can upload in one go, is 5TB
True
Explain "Multi-part upload"
[Amazon docs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html): "Multipart upload allows you to upload a single object as a set of parts. Each part is a contiguous portion of the object's data...In general, when your object size reaches 100 MB, you should consider using multipart uploads instead of uploading the object in a single operation."
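The splitting step can be illustrated with plain Python. This is only a sketch of how a client cuts an object into contiguous parts before uploading them (part sizes are shrunk far below the real 5 MB minimum so the example stays readable):

```python
def split_into_parts(data: bytes, part_size: int) -> list:
    """Cut an object into contiguous parts, as a multipart upload client does."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]

obj = b"x" * 1000
parts = split_into_parts(obj, 300)
print([len(p) for p in parts])   # [300, 300, 300, 100]
assert b"".join(parts) == obj    # reassembled parts equal the original object
```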
#### Objects
Explain "Object Versioning"
When enabled at the bucket level, versioning allows you to upload new versions of files while keeping the previous versions, so you can easily roll back and protect your data from being permanently deleted.
Explain the following: - Object Lifecycles - Object Sharing
* Object Lifecycles - Transfer objects between storage classes based on defined rules of time periods
* Object Sharing - Share objects via a URL link
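A lifecycle is just a declarative policy attached to the bucket. The fragment below is a hypothetical example following the S3 lifecycle configuration format: it transitions objects under the `logs/` prefix to Standard-IA after 30 days and to Glacier after 90 (the rule ID, prefix and day counts are made-up values):

```json
{
  "Rules": [
    {
      "ID": "archive-old-objects",
      "Status": "Enabled",
      "Filter": {"Prefix": "logs/"},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"}
      ]
    }
  ]
}
```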
Explain Object Durability and Object Availability
Object Durability: the percent over a one-year time period that a file will not be lost
Object Availability: the percent over a one-year time period that a file will be accessible
#### S3 Security
True or False? Every new S3 bucket is public by default
False. A newly created bucket is private unless it was configured to be public.
What's a presigned URL?
Since every newly created bucket is private by default, it doesn't allow sharing files with users; even the person who uploaded them gets denied when trying to view them through a plain URL. A presigned URL is a way around that: it allows sharing files with users by including the credentials (a token) as part of the URL, valid for a limited time.
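The mechanism can be illustrated with standard-library HMAC signing. This is a simplified sketch of the *idea* (an expiry timestamp plus a signature embedded in the URL), not the actual SigV4 algorithm AWS uses; the secret, URL and parameter names are invented for the example:

```python
import hashlib
import hmac
import time

SECRET = b"account-secret-key"  # stands in for the signer's credentials

def presign(url, expires_in, now=None):
    """Append an expiry timestamp and an HMAC signature to a URL."""
    expires = (now if now is not None else int(time.time())) + expires_in
    payload = f"{url}?expires={expires}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}&signature={sig}"

def is_valid(signed_url, now):
    """The service verifies the signature and checks the expiry time."""
    payload, sig = signed_url.rsplit("&signature=", 1)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    expires = int(payload.rsplit("expires=", 1)[1])
    return hmac.compare_digest(sig, expected) and now < expires

url = presign("https://example/bucket/cat.png", expires_in=3600, now=1000)
print(is_valid(url, now=2000))   # True: signature matches, not expired
print(is_valid(url, now=9999))   # False: past the expiry time
```

Anyone holding the URL can download the file until it expires, without having any AWS credentials of their own; tampering with the URL invalidates the signature.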
What security measures have you taken in context of S3?
* Don't make a bucket public
* Enable encryption if it's disabled
* Define an access policy
What encryption types are supported by S3?
* SSE-S3
* SSE-KMS
* SSE-C
Describe shortly how SSE-S3 (AES) encryption works
1. You upload a file to S3 over HTTP (or HTTPS) with the appropriate encryption header
2. S3 uses its managed data key to encrypt the file
3. S3 stores the encrypted object in the bucket
True or False? In case of SSE-S3 (AES-256) encryption, you manage the key
False. S3 manages the key and uses AES-256 algorithm for the encryption.
Who or what manages the keys in the case of SSE-KMS encryption?
The KMS service.
Why would someone choose to use SSE-KMS instead of SSE-S3?
SSE-KMS provides control over who has access to the keys, and you can also enable an audit trail.
True or False? In case of SSE-C encryption, both S3 and you manage the keys
False. You manage the keys. It's customer provided keys.
True or False? In case of SSE-C HTTPS must be used and encryption key must be provided in headers for every HTTP request
True.
Describe shortly how SSE-C encryption works
1. The user uploads a file to S3 over HTTPS, providing the data key in the header
2. AWS S3 performs the encryption using the provided data key and the encrypted object is stored in the bucket

If a user would like to get the object, the same data key has to be provided.
With which string does an Amazon S3 header start?

* x-zmz
* x-amz
* x-ama
x-amz
#### Misc
What is a storage class? What storage classes are there?
Each object has a storage class assigned to it, affecting its availability and durability. This also affects costs. Storage classes offered today:

* Standard:
  * Used for general, all-purpose storage (mostly storage that needs to be accessed frequently)
  * The most expensive storage class
  * 11x9% durability
  * 2x9% availability
  * Default storage class
* Standard-IA (Infrequent Access):
  * Long-lived, infrequently accessed data that must be available the moment it's accessed
  * 11x9% durability
  * 99.90% availability
* One Zone-IA (Infrequent Access):
  * Long-lived, infrequently accessed, non-critical data
  * Less expensive than the Standard and Standard-IA storage classes
  * 11x9% durability
  * 99.50% availability
* Intelligent-Tiering:
  * Long-lived data with changing or unknown access patterns. In this class, data automatically moves to the most suitable tier based on usage patterns
  * Price depends on the tier used
  * 11x9% durability
  * 99.90% availability
* Glacier: archive data with retrieval time ranging from minutes to hours
* Glacier Deep Archive: archive data that rarely, if ever, needs to be accessed, with retrieval times in hours
* Both Glacier and Glacier Deep Archive are:
  * The cheapest storage classes
  * 11x9% durability

More on storage classes [here](https://aws.amazon.com/s3/storage-classes)
A customer would like to move rarely accessed data from the Standard storage class to the cheapest class there is. Which storage class should be used?

* One Zone-IA
* Glacier Deep Archive
* Intelligent-Tiering
Glacier Deep Archive
What Glacier retrieval options are available for the user?
Expedited, Standard and Bulk
True or False? Each AWS account can store up to 500 PetaByte of data. Any additional storage will cost double
False. Unlimited capacity.
Explain what is Storage Gateway
"AWS Storage Gateway is a hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage". More on Storage Gateway [here](https://aws.amazon.com/storagegateway)
Explain the following Storage Gateway deployment types:

* File Gateway
* Volume Gateway
* Tape Gateway
Explained in detail [here](https://aws.amazon.com/storagegateway/faqs)
What is the difference between stored volumes and cached volumes?
Stored Volumes - Data is located at the customer's data center and periodically backed up to AWS
Cached Volumes - Data is stored in the AWS cloud and cached at the customer's data center for quick access
What is "Amazon S3 Transfer Acceleration"?
AWS definition: "Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket" Learn more [here](https://docs.aws.amazon.com/AmazonS3/latest/dev/transfer-acceleration.html)
Explain data consistency
S3 provides strong read-after-write consistency for PUT and DELETE requests of objects in S3 buckets in all AWS Regions. S3 always returns the latest version of an object.
Can you host dynamic websites on S3? What about static websites?
No. S3 supports only static website hosting. On a static website, individual webpages include static content. They might also contain client-side scripts. By contrast, a dynamic website relies on server-side processing, including server-side scripts such as PHP, JSP, or ASP.NET. Amazon S3 does not support server-side scripting.
### Disaster Recovery
In regards to disaster recovery, what is RTO and RPO?
RTO - The maximum acceptable length of time that your application can be offline
RPO - The maximum acceptable length of time during which data might be lost from your application due to an incident
What types of disaster recovery techniques does AWS support?
* The Cold Method - Periodic backups that are sent off-site
* Pilot Light - Data is mirrored to an environment which is always running
* Warm Standby - Running a scaled-down version of the production environment
* Multi-site - A duplicated environment that is always running
Which disaster recovery option has the highest downtime and which has the lowest?
Lowest - Multi-site
Highest - The Cold Method
### CloudFront
Explain what is CloudFront
AWS definition: "Amazon CloudFront is a fast content delivery network (CDN) service that securely delivers data, videos, applications, and APIs to customers globally with low latency, high transfer speeds, all within a developer-friendly environment." More on CloudFront [here](https://aws.amazon.com/cloudfront)
Explain the following * Origin * Edge location * Distribution
What delivery methods available for the user with CDN?
True or False? Objects are cached for the life of the TTL
True
What is AWS Snowball?
A transport solution designed for transferring large amounts of data (petabyte-scale) into and out of the AWS cloud.
How can a company ensure their web application continues to operate if it becomes unavailable in its current single region?
Deploy the application in multiple Regions. Use Amazon Route 53 DNS health checks to route traffic to a healthy Region
### ELB
What is ELB (Elastic Load Balancing)?
[AWS Docs](https://aws.amazon.com/elasticloadbalancing): "Elastic Load Balancing automatically distributes incoming application traffic across multiple targets, such as Amazon EC2 instances, containers, IP addresses, and Lambda functions."
True or False? Elastic Load Balancer is a managed resource (= AWS takes care of it)
True. AWS is responsible for making sure the ELB is operational and takes care of lifecycle operations like upgrades, maintenance and high availability.
What types of AWS load balancers are there?
* Classic Load Balancer (CLB): mainly for TCP (layer 4) and HTTP, HTTPS (layer 7)
* Application Load Balancer (ALB): mainly for HTTP, HTTPS and WebSocket
* Network Load Balancer (NLB): mainly for TCP, TLS and UDP
* Gateway Load Balancer (GWLB): mainly for layer 3 operations (IP protocol)
What's a "listener" in regards to ELB?
What's a "target group" in regards to ELB?
Which load balancer would you use for services which use HTTP or HTTPS traffic?
Application Load Balancer (ALB).
What are some use cases for using Gateway Load Balancer?
* Intrusion detection
* Firewall
* Payload manipulation
Explain "health checks" in the context of AWS ELB
Health checks are used by the ELB to check whether EC2 instance(s) are working properly.
If a health check fails, the ELB knows not to forward traffic to the specific EC2 instance where the health check failed.
True or False? AWS ELB health checks are done on a port and a route
True. For example, port `2017` and endpoint `/health`.
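A target just has to answer on the configured port and route. The sketch below stands up a minimal `/health` endpoint with Python's standard library; the port is chosen automatically and the path mirrors what you would configure on the target group (both are arbitrary examples):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A load balancer treats a 200 response on the configured route as "healthy"
        if self.path == "/health":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), HealthHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

port = server.server_address[1]
with urllib.request.urlopen(f"http://127.0.0.1:{port}/health") as resp:
    status, body = resp.status, resp.read().decode()
print(status, body)  # 200 ok
server.shutdown()
```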
What types of load balancers are supported in EC2 and what are they used for?
* Application LB - layer 7 traffic
* Network LB - ultra-high performances or static IP address (layer 4)
* Classic LB - low costs, good for test or dev environments (retired by August 15, 2022)
* Gateway LB - a transparent network gateway that distributes traffic to appliances such as firewalls, intrusion detection and prevention systems, and deep packet inspection systems (layer 3)
Which type of AWS load balancer is used in the following drawing?

Application Load Balancer (routing based on different endpoints + HTTP is used).
What are possible target groups for ALB (Application Load Balancer)?
* EC2 instances
* ECS tasks
* Lambda functions
* Private IP addresses
True or False? ALB can route only to a single target group
False. ALB can route to multiple target groups.
If you wanted to analyze network traffic, you would use the `____ load balancer`
Gateway Load Balancer
Which has better latency? Application Load Balancer or Network Load Balancer?
Network Load Balancer (~100 ms) as ALB has a latency of ~400 ms
True or False? Network load balancer has one static IP per availability zone
True.
What are the supported target groups for network load balancer?
* EC2 instances
* IP addresses
* Application Load Balancer
What are the supported target groups for gateway load balancer?
* EC2 instances
* IP addresses (must be private IPs)
Name one use case for using application load balancer as a target group for network load balancer
You might want to have a fixed IP address (NLB) and then forward HTTP traffic based on path, query, ... which is then done by ALB
What are some use cases for using Network Load Balancer?
* TCP, UDP traffic
* Extreme performance
True or False? Network load balancers operate in layer 4
True. They forward TCP, UDP traffic.
True or False? It's possible to enable sticky session for network load balancer so the same client is always redirected to the same instance
False. This is only supported in Classic Load Balancer and Application Load Balancer.
Explain Cross Zone Load Balancing
With cross-zone load balancing, traffic is distributed evenly across all (registered) instances in all the availability zones.
True or False? For network load balancer, cross zone load balancing is always on and can't be disabled
False. It's disabled by default
True or False? In regards to cross-zone load balancing, AWS charges you for inter-AZ data in Network Load Balancer, but not in Application Load Balancer
True. It charges for inter AZ data in network load balancer, but not in application load balancer
True or False? Both ALB and NLB support multiple listeners with multiple SSL certificates
True
Explain Deregistration Delay (or Connection Draining) in regards to ELB
The period of time or process of "draining" instances from requests/traffic (basically let it complete all active connections but don't start new ones) so it can be de-registered eventually and ELB won't send requests/traffic to it anymore.
#### NLB
At what network level/layer a Network Load Balancer operates?
Layer 4
#### ALB
True or False? With ALB (Application Load Balancer) it's possible to do routing based on query string and/or headers
True.
True or False? For application load balancer, cross zone load balancing is always on and can't be disabled
True
### Auto Scaling Group
Explain Auto Scaling Group
[Amazon Docs](https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroup.html): "An Auto Scaling group contains a collection of Amazon EC2 instances that are treated as a logical grouping for the purposes of automatic scaling and management. An Auto Scaling group also enables you to use Amazon EC2 Auto Scaling features such as health check replacements and scaling policies"
You have two instances running as part of an ASG. You change the desired capacity to 1. What will be the outcome of this change?
One of the instances will be terminated.
How can you customize the trigger for the scaling in/out of an auto scaling group?
One way is to use CloudWatch alarms where an alarm will monitor a metric and based on a certain value (or range) you can choose to scale-in or scale-out the ASG.
What are some metrics/rules used for auto scaling?
* Network in/out
* Number of requests on the ELB per instance
* Average CPU / RAM usage
What is dynamic Scaling policy in regards to Auto Scaling Groups?
A policy in which scaling occurs automatically based on different metrics. There are 3 types:

1. Target Tracking Scaling: scale when the baseline changes (e.g. CPU is over 60%)
2. Step Scaling: more granular scaling where you can choose different actions for different metric values (e.g. when CPU is less than 20%, remove one instance; when CPU is over 40%, add 3 instances)
3. Scheduled Actions: set scaling in advance for a specific period of time (e.g. add instances on Monday between 10:00 am and 11:00 am)
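Step scaling is essentially a lookup from a metric value to a capacity change. The function below sketches the step-scaling example given above; the thresholds and adjustments are the hypothetical numbers from the text, not values AWS prescribes:

```python
def step_scaling_adjustment(cpu_percent: float) -> int:
    """Return the instance-count delta for a CPU utilization sample."""
    if cpu_percent < 20:
        return -1   # scale in: remove one instance
    if cpu_percent > 40:
        return +3   # scale out: add three instances
    return 0        # within the acceptable band: do nothing

for cpu in (10, 30, 75):
    print(cpu, "->", step_scaling_adjustment(cpu))  # -1, 0, 3
```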
What is a predictive scaling policy in regards to Auto Scaling Groups?
Scale by analyzing historical load and schedule scaling based on forecast load.
Explain scaling cooldowns in regards to Auto Scaling Groups
During a scaling cooldown, the ASG will not terminate or launch additional instances. The cooldown happens after a scaling activity; the reason for this behaviour is that metrics have to be collected and stabilize before another scaling operation can take place.
Explain the default ASG termination policy
1. Find the AZ with the most EC2 instances
2. If the number of instances is greater than 1, terminate the one with the oldest launch configuration/template
True or False? by default, ASG tries to balance the number of instances across AZ
True, this is why when it terminates instances, it chooses the AZ with the most instances.
Explain Lifecycle hooks in regards to Auto Scaling Groups
Lifecycle hooks allow you to perform extra steps before the instance goes into service (during the pending state) or before it terminates (during the terminating state).
If you use ASG and you would like to run extra steps before the instance goes in service, what will you use?
Lifecycle hooks in pending state.
Describe one way to test ASG actually works
On Linux instances, you can install the 'stress' package and run stress to load the system for a certain period of time and see whether the ASG kicks in by adding capacity (= more instances). For example: `sudo stress --cpu 100 --timeout 20`
### Security
What is the shared responsibility model? What AWS is responsible for and what the user is responsible for based on the shared responsibility model?
The shared responsibility model defines what the customer is responsible for and what AWS is responsible for. More on the shared responsibility model [here](https://aws.amazon.com/compliance/shared-responsibility-model)
True or False? Based on the shared responsibility model, Amazon is responsible for physical CPUs and security groups on instances
False. Amazon is responsible for the hardware in its sites, but not for security groups, which are created and managed by the users.
Explain "Shared Controls" in regards to the shared responsibility model
AWS definition: "apply to both the infrastructure layer and customer layers, but in completely separate contexts or perspectives. In a shared control, AWS provides the requirements for the infrastructure and the customer must provide their own control implementation within their use of AWS services" Learn more about it [here](https://aws.amazon.com/compliance/shared-responsibility-model)
What is the AWS compliance program?
How to secure instances in AWS?
* Instance IAM roles should have the minimal permissions needed. You don't want an instance-level incident to become an account-level incident
* Use "AWS Systems Manager Session Manager" for shell access instead of SSH
* Use the latest OS images for your instances
What is AWS Artifact?
AWS definition: "AWS Artifact is your go-to, central resource for compliance-related information that matters to you. It provides on-demand access to AWS’ security and compliance reports and select online agreements." Read more about it [here](https://aws.amazon.com/artifact)
What is AWS Inspector?
AWS definition: "Amazon Inspector is an automated security assessment service that helps improve the security and compliance of applications deployed on AWS. Amazon Inspector automatically assesses applications for exposure, vulnerabilities, and deviations from best practices."" Learn more [here](https://aws.amazon.com/inspector)
What is AWS GuardDuty?
AWS definition: "Amazon GuardDuty is a threat detection service that continuously monitors for malicious activity and unauthorized behavior to protect your Amazon Web Services accounts, workloads, and data stored in Amazon S3"
It monitors VPC Flow Logs, DNS logs, CloudTrail S3 events and CloudTrail management events.
What is AWS Shield?
AWS definition: "AWS Shield is a managed Distributed Denial of Service (DDoS) protection service that safeguards applications running on AWS."
What is AWS WAF? Give an example of how it can used and describe what resources or services you can use it with
What AWS VPN is used for?
What is the difference between Site-to-Site VPN and Client VPN?
What is AWS CloudHSM?
Amazon definition: "AWS CloudHSM is a cloud-based hardware security module (HSM) that enables you to easily generate and use your own encryption keys on the AWS Cloud." Learn more [here](https://aws.amazon.com/cloudhsm)
True or False? AWS Inspector can perform both network and host assessments
True
What is AWS Key Management Service (KMS)?
AWS definition: "KMS makes it easy for you to create and manage cryptographic keys and control their use across a wide range of AWS services and in your applications." More on KMS [here](https://aws.amazon.com/kms)
What is AWS Acceptable Use Policy?
It describes prohibited uses of the web services offered by AWS. More on AWS Acceptable Use Policy [here](https://aws.amazon.com/aup)
True or False? A user is not allowed to perform penetration testing on any of the AWS services
False. On some services, like EC2, CloudFront and RDS, penetration testing is allowed.
True or False? DDoS attack is an example of allowed penetration testing activity
False.
True or False? AWS Access Key is a type of MFA device used for AWS resources protection
False. Security key is an example of an MFA device.
What is Amazon Cognito?
Amazon definition: "Amazon Cognito handles user authentication and authorization for your web and mobile apps." Learn more [here](https://docs.aws.amazon.com/cognito/index.html)
What is AWS ACM?
Amazon definition: "AWS Certificate Manager is a service that lets you easily provision, manage, and deploy public and private Secure Sockets Layer/Transport Layer Security (SSL/TLS) certificates for use with AWS services and your internal connected resources." Learn more [here](https://aws.amazon.com/certificate-manager)
### Databases

#### RDS
What is AWS RDS?
* Relational Database Service
* Managed DB service (you can't SSH into the machine)
* Supports multiple DBs: MySQL, Oracle, Aurora (AWS proprietary), ...
Why use AWS RDS instead of launching an EC2 instance and installing a database on it?
AWS RDS is a managed service, that means it's automatically provisioned and patched for you. In addition, it provides you with continuous backup (and the ability to restore from any point of time), scaling capability (both horizontal and vertical), monitoring dashboard and read replicas.
What do you know about RDS backups?
* Automated backups
* Full daily backup (done during the maintenance window)
* Transaction logs are backed up every 5 minutes
* Retention is 7 days by default and can be increased
Explain AWS RDS Storage Auto Scaling
* RDS storage can be automatically increased when it runs low
* The user needs to set a "Maximum Storage Threshold" to put a limit on storage scaling
* Use cases: applications with unpredictable workloads
* Supports multiple RDS database engines
Explain Amazon RDS Read Replicas
[AWS Docs](https://aws.amazon.com/rds/features/read-replicas): "Amazon RDS Read Replicas provide enhanced performance and durability for RDS database (DB) instances. They make it easy to elastically scale out beyond the capacity constraints of a single DB instance for read-heavy database workloads." In simpler words, it allows you to scale your reads.
True or False? RDS read replicas are supported within an AZ, cross-AZ and cross-region
True
True or False? RDS read replicas are asynchronous
True. Because replication is asynchronous, reads from replicas are eventually consistent.
True or False? Amazon RDS supports MongoDB
False. RDS is a relational database service, and MongoDB is a NoSQL database.
What are some use cases for using RDS read replicas?
You have a main application which works against your database but you would like to add additional app, one used for logging, analytics, ... so you prefer it won't use the same database. In this case, you create a read replica instance and the second application works against that instance.
Explain RDS Multi Availability Zone
* RDS multi-AZ is used mainly for disaster recovery purposes
* There is an RDS master instance and, in another AZ, an RDS standby instance
* The data is synced synchronously between them
* The user/application accesses a single DNS name; when there is a failure with the master instance, the DNS name moves to the standby instance, so the failover is done automatically
True or False? Moving AWS RDS from single AZ to multi AZ is an operation with downtime (meaning there is a need to stop the DB)
False. It's a zero downtime operation = no need to stop the database.
How AWS RDS switches from single AZ to multi AZ?
1. A snapshot is taken by RDS
2. The snapshot is restored to another, standby, RDS instance
3. Synchronization is enabled between the two instances
True or False? RDS encryption should be defined at launch time
True
True or False? in regards to RDS, replicas can be encrypted even if the master isn't encrypted
False
How to make RDS snapshots encrypted?
* If the RDS database is encrypted, the snapshot itself is also encrypted
* If the RDS database isn't encrypted, the snapshot isn't encrypted either, but you can copy the un-encrypted snapshot to create an encrypted copy
How to encrypt an un-encrypted RDS instance?
1. Take a snapshot of the un-encrypted instance
2. Copy the snapshot to create an encrypted copy
3. Restore a database from the encrypted snapshot
4. Migrate the application to work against the new instance
5. Remove the original DB instance
How IAM authentication works with RDS?
For example:

1. An EC2 instance uses an IAM role to make an API call to get an auth token
2. The token, over an SSL-encrypted connection, is used for accessing the RDS instance

Note: the token has a lifetime of 15 minutes
True or False? In case of RDS (not Aurora), read replicas require you to change the SQL connection string
True. Since read replicas add endpoints, each with its own DNS name, you need to modify your app to reference these new endpoints to balance the read load.
#### Aurora
What do you know about Amazon Aurora?
* A MySQL & PostgreSQL compatible relational database
* Proprietary technology from AWS
* The default database proposed to the user when using RDS for creating a database
* Storage automatically grows in increments of 10 GiB
* Native HA - near-instant failover
* Has better performance than MySQL and PostgreSQL
* Supports 15 read replicas (while MySQL supports 5)
True or False? Aurora stores 4 copies of your data across 2 availability zones
False. It stores 6 copies across 3 availability zones
True or False? Aurora supports self-healing, where corrupted data is replaced using peer-to-peer replication
True
True or False? Aurora storage is striped across 20 volumes
False. 100 volumes.
True or False? It's possible to scale Aurora replicas
True. If your read replica instances exhaust their CPU, you can scale by adding more instances
Explain Aurora Serverless. What use cases is it good for?
* Aurora Serverless provides automated database instantiation with auto-scaling based on actual usage
* It's good mainly for infrequent or unpredictable workloads
* You pay per second, so it can be more cost effective
What is the use case for Aurora multi-master?
Aurora multi-master is perfect for a use case where you want instant failover for the writer node.
#### DynamoDB
What is AWS DynamoDB?
Explain "Point-in-Time Recovery" feature in DynamoDB
Amazon definition: "You can create on-demand backups of your Amazon DynamoDB tables, or you can enable continuous backups using point-in-time recovery. For more information about on-demand backups, see On-Demand Backup and Restore for DynamoDB." Learn more [here](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/PointInTimeRecovery.html)
Explain "Global Tables" in DynamoDB
Amazon definition: "A global table is a collection of one or more replica tables, all owned by a single AWS account." Learn more [here](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/V2globaltables_HowItWorks.html)
What is DynamoDB Accelerator?
Amazon definition: "Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from milliseconds to microseconds..." Learn more [here](https://aws.amazon.com/dynamodb/dax)
#### ElastiCache
What is AWS ElastiCache? In what use case should it be used?
Amazon Elasticache is a fully managed Redis or Memcached in-memory data store.
It's great for read-intensive workloads where the common data/queries are cached and apps/users access the cache instead of the primary database.
Describe the workflow of an application using the cache in AWS
1. The application performs a query against the DB. There is a check to see if the data is in the cache
   1. If it is, it's a "cache hit" and the data is retrieved from the cache
   2. If it's not there, it's a "cache miss" and the data is pulled from the database
      1. The data is then also written to the cache (assuming it is accessed often), so next time the user queries the same data, it might be retrieved from the cache (depending on how much time passed and whether this specific data was invalidated or not)
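The workflow above (often called cache-aside or lazy loading) can be sketched in a few lines of plain Python, with dicts standing in for the database and the cache:

```python
import time

db = {"user:1": "alice", "user:2": "bob"}  # stands in for the primary DB
cache = {}                                 # stands in for ElastiCache
TTL = 60  # seconds before a cached entry is considered stale

def get(key):
    entry = cache.get(key)
    if entry and time.time() - entry[1] < TTL:
        return entry[0], "cache hit"
    value = db[key]                    # cache miss: read from the DB
    cache[key] = (value, time.time())  # write it to the cache for next time
    return value, "cache miss"

print(get("user:1"))  # ('alice', 'cache miss')
print(get("user:1"))  # ('alice', 'cache hit')
```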
How can you make an application stateless using ElastiCache?
Let's say you have multiple instances running the same application and every time you use the application, it creates a user session.
This user session can be stored in ElastiCache, so even if the user contacts a different instance of the application, the application can retrieve the session from ElastiCache.
You need a highly available cache with backup and restore features. Which one would you use?
ElastiCache Redis.
You need a cache with read replicas that can be scaled and one that supports multi-AZ. Which one would you use?
ElastiCache Redis.
You need a cache that supports sharding and built with multi-threaded architecture in mind. Which one would you use?
ElastiCache Memcached
True or False? ElastiCache doesn't support IAM authentication
True.
What patterns are there for loading data into the cache?
* Write Through: add or update data in the cache whenever data is written to the DB
* Lazy Loading: data is cached only after it is read
* Session Store: store temporary session data in the cache
#### Redshift
What is AWS Redshift and how is it different than RDS?
Redshift is a cloud data warehouse, optimized for analytical (OLAP) workloads, while RDS is designed for transactional (OLTP) workloads.
What do you do if you suspect AWS Redshift performs slowly?
* You can confirm your suspicion by going to the AWS Redshift console and checking the running queries graph. This should tell you if there are any long-running queries
* If confirmed, you can query for running queries and cancel the irrelevant ones
* Check for connection leaks (query for running connections and include their IP)
* Check for table locks and kill irrelevant locking sessions
What is Amazon DocumentDB?
Amazon definition: "Amazon DocumentDB (with MongoDB compatibility) is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads. As a document database, Amazon DocumentDB makes it easy to store, query, and index JSON data." Learn more [here](https://aws.amazon.com/documentdb)
What is "AWS Database Migration Service" used for?
What type of storage is used by Amazon RDS?
EBS
### Identify the Service
What would you use for automating code/software deployments?
AWS CodeDeploy
You would like to invoke a function every time you enter a URL in the browser. Which service would you use for that?
AWS Lambda
What would you use for easily creating similar AWS environments/resources for different customers?
CloudFormation
Using which service, can you add user sign-up, sign-in and access control to mobile and web apps?
Cognito
Which service would you use for building a website or web application?
Lightsail
Which tool would you use for choosing between Reserved instances or On-Demand instances?
Cost Explorer
What would you use to check how many unassociated Elastic IP addresses you have?
Trusted Advisor
Which service allows you to transfer large amounts (Petabytes) of data in and out of the AWS cloud?
AWS Snowball
Which service would you use if you need a data warehouse?
AWS RedShift
Which service provides a virtual network dedicated to your AWS account?
VPC
What would you use for having automated backups for an application that has a MySQL database layer?
Amazon Aurora
What would you use to migrate on-premise database to AWS?
AWS Database Migration Service (DMS)
What would you use to check why certain EC2 instances were terminated?
AWS CloudTrail
What would you use for SQL database?
AWS RDS
What would you use for NoSQL database?
AWS DynamoDB
What would you use for adding image and video analysis to your application?
AWS Rekognition
Which service would you use for debugging and improving performance issues with your applications?
AWS X-Ray
Which service is used for sending notifications?
SNS
What would you use for running SQL queries interactively on S3?
AWS Athena
What would you use for preparing and combining data for analytics or ML?
AWS Glue
Which service would you use for monitoring malicious activity and unauthorized behavior in regards to AWS accounts and workloads?
Amazon GuardDuty
Which service would you use to centrally manage billing, access control, compliance, and security across multiple AWS accounts?
AWS Organizations
Which service would you use for web application protection?
AWS WAF
You would like to monitor some of your resources in the different services. Which service would you use for that?
CloudWatch
Which service would you use for performing security assessment?
AWS Inspector
Which service would you use for creating DNS record?
Route 53
What would you use if you need a fully managed document database?
Amazon DocumentDB
Which service would you use to add access control (or sign-up, sign-in forms) to your web/mobile apps?
AWS Cognito
Which service is often referred to as "used for decoupling applications"?
AWS SQS. Since it's a messaging queue, it allows applications to switch from synchronous communication to asynchronous communication.
Which service would you use if you need messaging queue?
Simple Queue Service (SQS)
Which service would you use if you need managed DDOS protection?
AWS Shield
Which service would you use if you need to store frequently used data for low latency access?
ElastiCache
What would you use to transfer files over long distances between a client and an S3 bucket?
Amazon S3 Transfer Acceleration
Which services are involved in getting a custom string (based on the input) when inserting a URL in the browser?
Lambda - to define a function that gets an input and returns a certain string
API Gateway - to define the URL trigger (= when you insert the URL, the function is invoked).
Which service would you use for data or events streaming?
Kinesis
Which (free) tool would you use to get information on cost savings?
Trusted Advisor
You would like to have on-prem storage access to AWS storage. What would you use for that?
Storage Gateway
### DNS (Route 53)
What is Route 53?
[AWS Route 53](https://aws.amazon.com/route53): "Amazon Route 53 is a highly available and scalable cloud Domain Name System (DNS) web service..."

Some of Route 53 features:
* Register domains
* DNS service - domain name translations
* Health checks - verify your app is available
* Not a feature, but its SLA is 100% availability
What does it mean that "Route 53 is an Authoritative DNS"?
The customer can update DNS records
What does each Route 53 record contain?
* Domain/subdomain name (e.g. blipblop.com)
* Value (e.g. 201.7.202.2)
* Record type (e.g. A, AAAA, MX)
* TTL: amount of time the record is going to be cached
* Routing Policy: how to respond to queries
What DNS record types does Route 53 support?
* A
* AAAA
* CNAME
* NS
* DS
* CAA
* SOA
* MX
* TXT
* SPF
* SRV
* NAPTR
* PTR
What are hosted zones?
A container that includes records for defining how to route traffic from a domain and its subdomains
What types of hosted zones are there?
* Public Hosted Zones - contain records that specify how to route traffic on the internet
* Private Hosted Zones - contain records that specify how to route traffic within your VPC(s)
What is the difference between CNAME record and an Alias record?
CNAME is used for mapping one hostname to any other hostname, while Alias is used to map a hostname to an AWS resource. In addition, Alias records work for both the root domain (somedomain.com) and non-root domains, while CNAME works only with non-root domains (foo.somedomain.com)
True or False? Alias record can be set up for an EC2 DNS name
False
True or False? Alias record can be set up for an VPC interface endpoint
True
True or False? Alias record is only of type A or AAAA
True
What is a routing policy in regards to AWS Route 53?
A routing policy defines how Route 53 responds to DNS queries.
What Route 53 routing policies are there?
* Simple
* Geolocation
* Failover
* Latency based
* Geoproximity
* Multi-Value Answer
* Weighted
Suppose you need to route a certain percentage of your traffic to one instance and the rest of the traffic to another instance. Which routing policy would you choose?
Weighted routing policy.
Suppose you need to route traffic to a single source with Route 53, without any other requirements, which routing policy would you choose?
The `simple` routing policy
Explain the geolocation routing policy
* Routing based on user location
* Location can be specified by continent, country or US state
* It's recommended to have a default record in case there is no match on location
What are some use cases for using geolocation routing policy?
* Restrict content distribution
* App localization
* Load balancing
Explain the geoproximity routing policy
* Route based on the geographic location of resources
* Shifting routing is done based on the `bias` value
* Resources can be of AWS and non-AWS type
* For non-AWS resources you have to specify latitude and longitude, while for AWS resources you specify the AWS region
* To use it, you have to use Route 53 Traffic Flow
What are some use cases for weighted routing policy?
* Load balancing between regions
* Testing new application versions
True or False? Route 53 simple routing policy supports both single and multiple values
True. If multiple values are returned from Route 53 then, the client chooses a single value to use.
True or False? In weighted routing DNS records must have the same name but not the same type
False. They must have the same name AND type.
You would like to use a routing policy that will take latency into account and will route to the resource with the lowest latency. Which routing policy would you use?
Latency-based routing policy.
What happens when you set all records to weight 0 when using Weighted routing policy?
All records are used equally.
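A small sketch of how weighted shares are computed, including the all-zero case (this mirrors the documented behavior; the function itself is illustrative, not a real Route 53 API):

```python
def weighted_shares(records):
    """records: {record_name: weight}. Mirrors Route 53 weighted routing:
    each record gets weight / total; if all weights are 0,
    records are used equally."""
    total = sum(records.values())
    if total == 0:
        # special case: all-zero weights mean equal distribution
        return {name: 1 / len(records) for name in records}
    return {name: w / total for name, w in records.items()}

print(weighted_shares({"blue": 3, "green": 1}))  # blue gets 75%, green 25%
print(weighted_shares({"blue": 0, "green": 0}))  # both get 50%
```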
What are Route 53 health checks used for?
Automated DNS failover based on monitoring of:
* Another health check
* An endpoint (app, AWS resource, server)
* CloudWatch alarms
You would like to use a routing policy based on the resource location and be able to shift more traffic to some resources. Which one would you use?
Geoproximity routing policy
Explain Route 53 Traffic Flow feature
It's a visual editor for managing complex routing decision trees. It allows you to simplify the process of managing records. Configuration can be saved (as Traffic Flow Policy) and applied to different domains/hosted zones. In addition, it supports versioning
What are calculated health checks?
When you combine the results of multiple health checks into a single health check.
What is one possible use case for using calculated health checks?
Performing maintenance for a website without causing all the health checks to fail.
You would like to use a routing policy based on the user location. Which one would you use?
Geolocation routing policy. It's based on user location. Don't confuse it with latency-based routing policy. While shorter distance may result in lower latency, this is not the requirement in the question.
True or False? Route 53 Multi Value is a substitute for those who want a cheaper solution than ELB
False. Route 53 Multi Value is not a substitute for ELB. It's focused on client-side load balancing as opposed to ELB.
True or False? Domain registrar and DNS service is inherently the same thing
False. The DNS service can be Route 53 (where you manage DNS records) while the domain itself can be purchased from sources that aren't Amazon related (e.g. GoDaddy).
### SQS
What is Simple Queue Service (SQS)?
AWS definition: "Amazon Simple Queue Service (SQS) is a fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications". Learn more about it [here](https://aws.amazon.com/sqs)
Explain "producer" and "consumer" in regards to messaging queue
The producer is the application or, in general, the source that sends messages to the queue. The consumer is the process or application that pulls messages from the queue.
What "default retention of messages" means?
It refers to the retention period in which a message has to be consumed/processed and deleted from the queue. As of today, the retention of a message is 4 days by default and the maximum allowed is 14 days.
What's the limitation on message size in SQS?

* 128KB
* 128MB
* 256KB
* 256MB
256KB
True or False? It's possible to have duplicated messages in the queue
True. It's referred to as "at least once delivery".
True or False? "Consumers" can be only EC2 instances
False. They can be Lambda functions and even on-premise instances
True or False? Processes/applications use the SendMessage API from the SDK in order to send messages to the queue
True.
What does "best effort ordering" mean in regards to SQS?
It means messages in the queue can be out of order.
What is "Delay Queue" in regards to SQS?
It's the time in seconds by which the delivery of new messages is delayed (after they have already reached the queue). The limit as of today is 15 minutes.
What is "Visibility Timeout?"
The time in seconds during which a received message is not visible to other consumers. The limit as of today is 12 hours.
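A toy simulation of the visibility timeout semantics (plain Python, not the SQS API): a received message is hidden from other consumers until the timeout elapses, after which it is redelivered unless it was deleted:

```python
import time

class TinyQueue:
    """Minimal sketch of SQS-style visibility timeout."""
    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self.messages = {}  # id -> (body, invisible_until)

    def send(self, msg_id, body):
        self.messages[msg_id] = (body, 0.0)

    def receive(self, now=None):
        now = time.time() if now is None else now
        for msg_id, (body, invisible_until) in self.messages.items():
            if now >= invisible_until:
                # hide the message while this consumer processes it
                self.messages[msg_id] = (body, now + self.visibility_timeout)
                return msg_id, body
        return None

    def delete(self, msg_id):
        # consumer finished processing: remove the message for good
        self.messages.pop(msg_id, None)

q = TinyQueue(visibility_timeout=30)
q.send("m1", "encode video")
print(q.receive(now=0))   # ('m1', 'encode video')
print(q.receive(now=10))  # None - still invisible
print(q.receive(now=31))  # ('m1', 'encode video') - redelivered
```

If the consumer never calls `delete`, the message reappears after the timeout, which is also why "at least once delivery" can produce duplicates.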
Give an example of architecture or workflow that involves SQS and EC2 & S3
A website that allows users to upload videos and adds subtitles to them:

1. First the user uploads the video through the web interface, which uploads it to an S3 bucket
2. SQS gets notified with a message containing the video location
3. An EC2 instance (or Lambda function) starts working on adding the subtitles
4. The video with the subtitles is uploaded to an S3 bucket
5. SQS gets notified of the result, specifically the video location
What's MessageGroupID?
### SNS
What is Simple Notification Service?
AWS definition: "a highly available, durable, secure, fully managed pub/sub messaging service that enables you to decouple microservices, distributed systems, and serverless applications." Read more about it [here](https://aws.amazon.com/sns)
Explain the following in regards to SNS:

- Topics
- Subscribers
- Publishers
* Topics - used for grouping multiple endpoints
* Subscribers - the endpoints where topics send messages to
* Publishers - the providers of the message (event, person, ...)
How SNS is different from SQS?
SNS, as opposed to SQS, works in a publisher/subscriber model, whereas SQS works in a producer/consumer model. SQS delivers a message to one consumer, whereas SNS sends a message to multiple subscribers.
What's a Fan-Out pattern?
A messaging pattern where a single message is sent to multiple destinations (often simultaneously) - in other words, a one-to-many broadcast.
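A minimal sketch of the fan-out pattern (plain Python, with lists standing in for SQS queues subscribed to an SNS-like topic):

```python
class Topic:
    """Toy SNS-style topic: publishing fans the message out to every
    subscriber, each getting its own copy."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, queue):
        self.subscribers.append(queue)

    def publish(self, message):
        # fan-out: deliver the same message to every subscribed queue
        for queue in self.subscribers:
            queue.append(message)

orders = Topic()
billing, shipping, analytics = [], [], []
for q in (billing, shipping, analytics):
    orders.subscribe(q)

orders.publish({"order_id": 42})
print(billing, shipping, analytics)  # each queue got its own copy
```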
### Monitoring and Logging
What is AWS CloudWatch?
AWS definition: "Amazon CloudWatch is a monitoring and observability service..." More on CloudWatch [here](https://aws.amazon.com/cloudwatch)
What is AWS CloudTrail?
AWS definition: "AWS CloudTrail is a service that enables governance, compliance, operational auditing, and risk auditing of your AWS account." Read more on CloudTrail [here](https://aws.amazon.com/cloudtrail)
### Billing and Support
What are Service Control Policies and to which service do they belong?
AWS organizations service and the definition by Amazon: "SCPs offer central control over the maximum available permissions for all accounts in your organization, allowing you to ensure your accounts stay within your organization’s access control guidelines." Learn more [here](https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_scp.html)
Explain AWS pricing model
It mainly works on a "pay-as-you-go" model, meaning you pay only for what you are using and when you are using it.

In S3 you pay for:
1. How much data you are storing
2. Making requests (PUT, POST, ...)

In EC2 it's based on the purchasing option (on-demand, spot, ...), instance type, AMI type and the region used.

More on the AWS pricing model [here](https://aws.amazon.com/pricing)
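As an arithmetic illustration of pay-as-you-go, here is a sketch of an S3-style monthly bill; note the rates below are made-up placeholders, not real AWS prices:

```python
# Illustrative only: the rates below are assumed placeholders,
# not real AWS prices - always check https://aws.amazon.com/pricing
PRICE_PER_GB_MONTH = 0.023  # assumed storage rate ($/GB-month)
PRICE_PER_1K_PUT = 0.005    # assumed PUT/POST request rate ($/1000 requests)

def s3_monthly_cost(stored_gb, put_requests):
    storage = stored_gb * PRICE_PER_GB_MONTH
    requests = (put_requests / 1000) * PRICE_PER_1K_PUT
    return round(storage + requests, 2)

# 100 GB stored + 50,000 PUTs in a month
print(s3_monthly_cost(100, 50_000))  # 2.55
```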
How do you estimate AWS costs?
* TCO calculator
* AWS simple calculator
* Cost Explorer
* AWS Budgets
* Cost Allocation Tags
What does basic support in AWS include?
* 24x7 customer service
* Trusted Advisor
* AWS Personal Health Dashboard
How are EC2 instances billed?
What is the AWS Pricing Calculator used for?
What is Amazon Connect?
Amazon definition: "Amazon Connect is an easy to use omnichannel cloud contact center that helps companies provide superior customer service at a lower cost." Learn more [here](https://aws.amazon.com/connect)
What are "APN Consulting Partners"?
Amazon definition: "APN Consulting Partners are professional services firms that help customers of all types and sizes design, architect, build, migrate, and manage their workloads and applications on AWS, accelerating their journey to the cloud." Learn more [here](https://aws.amazon.com/partners/consulting)
Which of the following are AWS support plans (sorted by order)?

- Basic, Developer, Business, Enterprise
- Newbie, Intermediate, Pro, Enterprise
- Developer, Basic, Business, Enterprise
- Beginner, Pro, Intermediate, Enterprise
- Basic, Developer, Business, Enterprise
True or False? Region is a factor when it comes to EC2 costs/pricing
True. You pay differently based on the chosen region.
What is "AWS Infrastructure Event Management"?
AWS Definition: "AWS Infrastructure Event Management is a structured program available to Enterprise Support customers (and Business Support customers for an additional fee) that helps you plan for large-scale events such as product or application launches, infrastructure migrations, and marketing events."
#### AWS Organizations
What is "AWS Organizations"?
AWS definition: "AWS Organizations helps you centrally govern your environment as you grow and scale your workloads on AWS." Read more on Organizations [here](https://aws.amazon.com/organizations)
What's an OU in regards to AWS Organizations?
OUs (Organizational Units) are a way to group multiple accounts together so you can treat them as a single unit. By default there is the "Root" OU created in AWS Organizations. Most of the time OUs are based on functions or a common set of controls.
### Automation
What is AWS CodeDeploy?
Amazon definition: "AWS CodeDeploy is a fully managed deployment service that automates software deployments to a variety of compute services such as Amazon EC2, AWS Fargate, AWS Lambda, and your on-premises servers." Learn more [here](https://aws.amazon.com/codedeploy)
Explain what is CloudFormation
AWS definition: "AWS CloudFormation is a service that helps you model and set up your Amazon Web Services resources so that you can spend less time managing those resources and more time focusing on your applications that run in AWS. You create a template that describes all the AWS resources that you want (like Amazon EC2 instances or Amazon RDS DB instances), and CloudFormation takes care of provisioning and configuring those resources for you."
What is AWS CDK?
AWS definition: "The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework to define cloud infrastructure as code and provision it through AWS CloudFormation. CDK gives the flexibility to use popular programming languages like TypeScript, JavaScript, Python, Java, C# and Go (in Developer Preview) to define your infrastructure, and AWS CDK provides a set of libraries for AWS services that abstract away the need to write raw CloudFormation templates." Learn more [here](https://aws.amazon.com/cdk)
### Misc
Which AWS service do you have experience with that you think is not very common?
What is AWS CloudSearch?
What is AWS Lightsail?
AWS definition: "Lightsail is an easy-to-use cloud platform that offers you everything needed to build an application or website, plus a cost-effective, monthly plan."
What is AWS Rekognition?
AWS definition: "Amazon Rekognition makes it easy to add image and video analysis to your applications using proven, highly scalable, deep learning technology that requires no machine learning expertise to use." Learn more [here](https://aws.amazon.com/rekognition)
What are AWS Resource Groups used for?
Amazon definition: "You can use resource groups to organize your AWS resources. Resource groups make it easier to manage and automate tasks on large numbers of resources at one time. " Learn more [here](https://docs.aws.amazon.com/ARG/latest/userguide/welcome.html)
What is AWS Global Accelerator?
Amazon definition: "AWS Global Accelerator is a service that improves the availability and performance of your applications with local or global users..." Learn more [here](https://aws.amazon.com/global-accelerator)
What is AWS Config?
Amazon definition: "AWS Config is a service that enables you to assess, audit, and evaluate the configurations of your AWS resources." Learn more [here](https://aws.amazon.com/config)
What is AWS X-Ray?
AWS definition: "AWS X-Ray helps developers analyze and debug production, distributed applications, such as those built using a microservices architecture." Learn more [here](https://aws.amazon.com/xray)
What is AWS OpsWorks?
Amazon definition: "AWS OpsWorks is a configuration management service that provides managed instances of Chef and Puppet." Learn more about it [here](https://aws.amazon.com/opsworks)
What is AWS Snowmobile?
"AWS Snowmobile is an Exabyte-scale data transfer service used to move extremely large amounts of data to AWS." Learn more [here](https://aws.amazon.com/snowmobile)
What is AWS Athena?
"Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL." Learn more about AWS Athena [here](https://aws.amazon.com/athena)
What is Amazon Cloud Directory?
Amazon definition: "Amazon Cloud Directory is a highly available multi-tenant directory-based store in AWS. These directories scale automatically to hundreds of millions of objects as needed for applications." Learn more [here](https://docs.aws.amazon.com/clouddirectory/latest/developerguide/what_is_cloud_directory.html)
What is AWS Elastic Beanstalk?
AWS definition: "AWS Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications and services...You can simply upload your code and Elastic Beanstalk automatically handles the deployment" Learn more about it [here](https://aws.amazon.com/elasticbeanstalk)
What is AWS SWF?
Amazon definition: "Amazon SWF helps developers build, run, and scale background jobs that have parallel or sequential steps. You can think of Amazon SWF as a fully-managed state tracker and task coordinator in the Cloud." Learn more on Amazon Simple Workflow Service [here](https://aws.amazon.com/swf)
What is AWS EMR?
AWS definition: "big data platform for processing vast amounts of data using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto." Learn more [here](https://aws.amazon.com/emr)
What is AWS Quick Starts?
AWS definition: "Quick Starts are built by AWS solutions architects and partners to help you deploy popular technologies on AWS, based on AWS best practices for security and high availability." Read more [here](https://aws.amazon.com/quickstart)
What is the Trusted Advisor?
Amazon definition: "AWS Trusted Advisor provides recommendations that help you follow AWS best practices. Trusted Advisor evaluates your account by using checks. These checks identify ways to optimize your AWS infrastructure, improve security and performance, reduce costs, and monitor service quotas." Learn more [here](https://aws.amazon.com/premiumsupport/technology/trusted-advisor/)
What is AWS Service Catalog?
Amazon definition: "AWS Service Catalog allows organizations to create and manage catalogs of IT services that are approved for use on AWS." Learn more [here](https://aws.amazon.com/servicecatalog)
What is AWS CAF?
Amazon definition: "AWS Professional Services created the AWS Cloud Adoption Framework (AWS CAF) to help organizations design and travel an accelerated path to successful cloud adoption. " Learn more [here](https://aws.amazon.com/professional-services/CAF)
What is AWS Cloud9?
AWS: "AWS Cloud9 is a cloud-based integrated development environment (IDE) that lets you write, run, and debug your code with just a browser"
What is AWS CloudShell?
AWS: "AWS CloudShell is a browser-based shell that makes it easy to securely manage, explore, and interact with your AWS resources."
What is AWS Application Discovery Service?
Amazon definition: "AWS Application Discovery Service helps enterprise customers plan migration projects by gathering information about their on-premises data centers." Learn more [here](https://aws.amazon.com/application-discovery)
What is the AWS Well-Architected Framework and which pillars is it based on?
AWS definition: "The Well-Architected Framework has been developed to help cloud architects build secure, high-performing, resilient, and efficient infrastructure for their applications. Based on five pillars — operational excellence, security, reliability, performance efficiency, and cost optimization" Learn more [here](https://aws.amazon.com/architecture/well-architected)
What AWS services are serverless (or have the option to be serverless)?
* AWS Lambda
* AWS Athena
### High Availability
What does high availability mean from an AWS perspective?
* Application/Service is running in at least 2 availability zones
* Application/Service should survive (= operate as usual) a data center disaster
### Production Operations and Migrations
Describe in high-level how to upgrade a system on AWS with (near) zero downtime
One way is through launching a new instance. In more detail:

1. Launch a new instance
2. Install all the updates and applications
3. Test the instance
4. If all tests passed successfully, you can start using the new instance and perform the switch with the old one, in one of various ways:
   1. Go to Route 53 and update the record with the IP of the new instance
   2. If you are using an Elastic IP, move it to the new instance
You try to use a detached EBS volume from us-east-1b in us-east-1a, but it fails. What might be the reason?
EBS volumes are locked to a specific availability zone. To use them in another availability zone, you need to take a snapshot and restore it in the destination availability zone.
When you launch EC2 instances, it takes them time to boot due to commands you run with user data. How can you improve the instances' boot time?
Consider creating a customized AMI with the commands from the user data already applied. This will allow you to launch instances much faster.
You try to mount EFS on your EC2 instance and it doesn't work (hangs...) What might be a possible reason?
Security group isn't attached to your EFS or it lacks a rule to allow NFS traffic.
How to migrate an EBS volume across availability zones?
1. Pause the application
2. Take a snapshot of the EBS volume
3. Restore the snapshot in another availability zone
How to encrypt an unencrypted EBS volume attached to an EC2 instance?
1. Create an EBS snapshot of the volume
2. Copy the snapshot and mark the "Encrypt" option
3. Create a new EBS volume out of the encrypted snapshot
You've created a network load balancer but it doesn't work (you can't reach your app on your EC2 instance). What might be a possible reason?
Missing security group or misconfigured one. For example, if you go to your instances in the AWS console you might see that the instances under your NLB are in "unhealthy status" and if you didn't create a dedicated security group for your NLB, that means that the security group used is the one attached to the EC2 instances. Go to the security group of your instance(s) and enable the traffic that NLB should forward (e.g. TCP on port 80).
### Scenarios
You have a load balancer running and behind it 5 web servers. Users complain that every time they move to a new page, they have to authenticate, instead of doing it once. How can you solve it?
Enable sticky sessions. This way, the user keeps working against the same instance, instead of being redirected to a different instance on every request.
You have a load balancer running and behind it 5 web servers. Users complain that sometimes when they try to use the application it doesn't work. You've found out that sometimes some of the instances crash. How would you deal with it?
One possible way is to use health checks with the load balancer to ensure the instances are ready to be used before forwarding traffic to them.
You run your application on 5 EC2 instances on one AZ and on 10 EC2 instances in another AZ. You distribute traffic between all of them using a network load balancer, but it seems that instances in one AZ have higher CPU rates than the instances in the other AZ. What might be the issue and how to solve it?
It's possible that traffic is distributed evenly between the AZs, but that doesn't mean it's distributed evenly across all the instances. To distribute it evenly between all the instances, you have to enable cross-zone load balancing.
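The effect of cross-zone load balancing on the 5-vs-10 instance scenario can be shown with a bit of arithmetic (illustrative sketch, assuming traffic is split equally between AZs first when cross-zone is disabled):

```python
def per_instance_share(instances_per_az, cross_zone=False):
    """Share of total traffic each instance in a given AZ receives.
    instances_per_az: {az_name: instance_count}"""
    total = sum(instances_per_az.values())
    if cross_zone:
        # traffic is spread evenly over ALL registered instances
        return {az: 1 / total for az in instances_per_az}
    # without cross-zone: each AZ first gets an equal share,
    # then splits it among its own instances
    azs = len(instances_per_az)
    return {az: (1 / azs) / n for az, n in instances_per_az.items()}

azs = {"us-east-1a": 5, "us-east-1b": 10}
print(per_instance_share(azs))                   # 1a: 10% each, 1b: 5% each
print(per_instance_share(azs, cross_zone=True))  # ~6.7% each, both AZs
```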
You are running an ALB that routes traffic using two hostnames: a.b.com and d.e.com. Is it possible to configure HTTPS for both of the hostnames?
Yes, using SNI (Server Name Indication), each application can have its own SSL certificate (this has been supported since 2017).
You have set up read replicas to scale reads but users complain that when they update posts in forums, the posts are not being updated. What may cause this issue?
Read Replicas use asynchronous replication, so it's possible users access a read replica instance that hasn't been synced yet.
You need persistent shared storage between your containers, some of which run on Fargate and some on ECS. What would you use?
EFS. It allows us to have persistent multi-AZ shared storage for containers.
You would like to run an AWS Fargate task every time a file is uploaded to a certain S3 bucket. How would you achieve that?
Use Amazon EventBridge so every time a file is uploaded to the S3 bucket (event) it will run an ECS task. Such a task should have an ECS Task Role so it can get the object from the S3 bucket (and possibly other permissions, if it needs to update a DB for example).
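A sketch of the wiring with the AWS CLI. All names and ARNs are placeholders, EventBridge notifications must be enabled on the bucket, and the IAM role must allow EventBridge to call `ecs:RunTask`:

```shell
# Hypothetical bucket name - match "Object Created" events from one S3 bucket.
aws events put-rule \
  --name s3-upload-run-task \
  --event-pattern '{"source":["aws.s3"],"detail-type":["Object Created"],"detail":{"bucket":{"name":["my-bucket"]}}}'

# Hypothetical cluster/task/role/subnet - run the Fargate task when the rule matches.
aws events put-targets \
  --rule s3-upload-run-task \
  --targets '[{"Id":"run-task","Arn":"arn:aws:ecs:us-east-1:123456789012:cluster/my-cluster","RoleArn":"arn:aws:iam::123456789012:role/eventbridge-ecs-role","EcsParameters":{"TaskDefinitionArn":"arn:aws:ecs:us-east-1:123456789012:task-definition/my-task","LaunchType":"FARGATE","NetworkConfiguration":{"awsvpcConfiguration":{"Subnets":["subnet-0abc12345example"],"AssignPublicIp":"ENABLED"}}}}]'
```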
Your hosts scale down and then back up quite often. What's your take on that?
Frequent scaling down and back up ("flapping") is usually a sign that the thresholds set for scaling down and up are too close to each other, so they are met quite often. In most cases that's a sign for you to adjust the thresholds (or add a cooldown period) so scaling doesn't happen as often.
### Architecture Design
You've been asked to design an architecture for high performance and low-latency application (millions of requests per second). Which load balancer would you use?
Network Load Balancer
What should you use for scaling reads?
You can use an ElastiCache cluster or RDS Read Replicas.
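For the RDS option, a read replica can be added with one CLI call (instance identifiers below are placeholders); the application then sends read queries to the replica's endpoint:

```shell
# Hypothetical identifiers - create a replica that offloads reads from the primary.
aws rds create-db-instance-read-replica \
  --db-instance-identifier mydb-replica-1 \
  --source-db-instance-identifier mydb
```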
You have two applications that communicate synchronously. It worked fine until there was suddenly a spike in traffic. What change might you apply in this case?
More details are needed to determine for sure, but it might be better to decouple the applications by introducing one of the following:
* Queue model with SQS
* Publisher/Subscriber model with SNS
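A sketch of the queue model with SQS (queue name, URL and message body are illustrative): the producer enqueues work instead of calling the consumer directly, so a traffic spike queues up rather than overwhelming the consumer:

```shell
# Hypothetical queue name.
aws sqs create-queue --queue-name orders-queue

# Producer side: enqueue a message instead of a synchronous call.
aws sqs send-message \
  --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/orders-queue \
  --message-body '{"order_id": 1234}'

# Consumer side: poll and process at its own pace.
aws sqs receive-message \
  --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/orders-queue
```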
### Misc
What's an ARN?
ARNs (Amazon Resource Names) are used for uniquely identifying AWS resources. An ARN is used when you need to identify a resource unambiguously across all of AWS.
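The general ARN format looks like this (the account ID and user name in the example are placeholders):

```
arn:partition:service:region:account-id:resource-type/resource-id

# e.g. an IAM user (IAM is a global service, so the region field is empty):
arn:aws:iam::123456789012:user/mario
```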
================================================
FILE: topics/aws/exercises/access_advisor/exercise.md
================================================
## AWS IAM - Access Advisor

### Objectives

Go to the Access Advisor and answer the following questions regarding one of the users:

1. Are there services this user never accessed?
2. What was the last service the user has accessed?
3. What is the Access Advisor used/good for?

## Solution

Click [here to view the solution](solution.md)

================================================
FILE: topics/aws/exercises/access_advisor/solution.md
================================================
## AWS IAM - Access Advisor

### Objectives

Go to the Access Advisor and answer the following questions regarding one of the users:

1. Are there services this user never accessed?
2. What was the last service the user has accessed?
3. What is the Access Advisor used/good for?

### Solution

1. Go to the AWS IAM service and click on "Users" under "Access Management"
2. Click on one of the users
3. Click on the "Access Advisor" tab
4. Check which service was last accessed and which was never accessed

Access Advisor is good for evaluating whether there are services the user is not accessing (as in never, or not frequently). This can help in deciding whether some permissions should be revoked or modified.

================================================
FILE: topics/aws/exercises/alb_multiple_target_groups/exercise.md
================================================
## AWS ELB - ALB Multiple Target Groups

### Requirements

Two EC2 instances with a simple web application that shows the web page with the string "Hey, it's a me, ``!"

One EC2 instance with a simple web application that shows the web page with the string "Hey, it's only a test..." under the endpoint /test

### Objectives

1. Create an application load balancer for the two instances you have, with the following properties
   1. healthy threshold: 3
   2. unhealthy threshold: 3
   3. interval: 10 seconds
2. Create another target group for the third instance
   1. Traffic should be forwarded to this group based on the "/test" path

================================================
FILE: topics/aws/exercises/alb_multiple_target_groups/solution.md
================================================
## AWS ELB - ALB Multiple Target Groups

### Requirements

Two EC2 instances with a simple web application that shows the web page with the string "Hey, it's a me, ``!"

One EC2 instance with a simple web application that shows the web page with the string "Hey, it's only a test..." under the endpoint /test

### Objectives

1. Create an application load balancer for the two instances you have, with the following properties
   1. healthy threshold: 3
   2. unhealthy threshold: 3
   3. interval: 10 seconds
2. Create another target group for the third instance
   1. Traffic should be forwarded to this group based on the "/test" path

### Solution

#### Console

1. Go to EC2 service
2. Click in the left side menu on "Load balancers" under "Load balancing"
3. Click on "Create load balancer"
4. Choose "Application Load Balancer"
5. Insert a name for the LB
6. Choose an AZ where you want the LB to operate
7. Choose a security group
8. Under "Listeners and routing" click on "Create target group" and choose "Instances"
   1. Provide a name for the target group
   2. Set healthy threshold to 3
   3. Set unhealthy threshold to 3
   4. Set interval to 10 seconds
   5. Click on "Next" and choose two out of the three instances you've created
   6. Click on "Create target group"
9. Refresh target groups and choose the one you've just created
10. Click on "Create load balancer" and wait for it to be provisioned
11. In the left side menu click on "Target Groups" under "Load Balancing"
12. Click on "Create target group"
13. Set it with the same properties as the previous target group but this time, add the third instance that you didn't include in the previous target group
14. Go back to your ALB and under "Listeners" click on "Edit rules" under your current listener
    1. Add a rule where if the path is "/test" then traffic should be forwarded to the second target group you've created
    2. Click on "Save"
15. Test it by going to the browser, inserting the address and adding "/test" to the address

================================================
FILE: topics/aws/exercises/app_load_balancer/exercise.md
================================================
## AWS ELB - Application Load Balancer

### Requirements

Two EC2 instances with a simple web application that shows the web page with the string "Hey, it's a me, ``!"

### Objectives

1. Create an application load balancer for the two instances you have, with the following properties
   1. healthy threshold: 3
   2. unhealthy threshold: 3
   3. interval: 10 seconds
2. Verify the load balancer is working (= you get a reply from both instances at different times)

================================================
FILE: topics/aws/exercises/app_load_balancer/solution.md
================================================
## AWS ELB - Application Load Balancer

### Requirements

Two EC2 instances with a simple web application that shows the web page with the string "Hey, it's a me, ``!"

### Objectives

1. Create an application load balancer for the two instances you have, with the following properties
   1. healthy threshold: 3
   2. unhealthy threshold: 3
   3. interval: 10 seconds
2. Verify the load balancer is working (= you get a reply from both instances at different times)

### Solution

#### Console

1. Go to EC2 service
2. Click in the left side menu on "Load balancers" under "Load balancing"
3. Click on "Create load balancer"
4. Choose "Application Load Balancer"
5. Insert a name for the LB
6. Choose an AZ where you want the LB to operate
7. Choose a security group
8. Under "Listeners and routing" click on "Create target group" and choose "Instances"
   1. Provide a name for the target group
   2. Set healthy threshold to 3
   3. Set unhealthy threshold to 3
   4. Set interval to 10 seconds
   5. Click on "Next" and choose the two instances you've created
   6. Click on "Create target group"
9. Refresh target groups and choose the one you've just created
10. Click on "Create load balancer" and wait for it to be provisioned
11. Copy the DNS address and paste it in the browser. If you refresh, you should see a different message based on the instance the traffic was routed to

================================================
FILE: topics/aws/exercises/asg_dynamic_scaling_policy/exercise.md
================================================
## AWS Auto Scaling Groups - Dynamic Scaling Policy

### Requirements

1. Existing Auto Scaling Group with maximum capacity set to at least 3
2. One running EC2 instance with a max of 4 CPUs

### Objectives

1. Create a dynamic scaling policy with the following properties
   1. Track average CPU utilization
   2. Target value should be 70%
2. Increase the CPU utilization to at least 70%
   1. Do you see a change in the number of instances?
3. Decrease CPU utilization to less than 70%
   1. Do you see a change in the number of instances?

================================================
FILE: topics/aws/exercises/asg_dynamic_scaling_policy/solution.md
================================================
## AWS Auto Scaling Groups - Dynamic Scaling Policy

### Requirements

1. Existing Auto Scaling Group with maximum capacity set to at least 3
2. One running EC2 instance with a max of 4 CPUs

### Objectives

1. Create a dynamic scaling policy with the following properties
   1. Track average CPU utilization
   2. Target value should be 70%
2. Increase the CPU utilization to at least 70%
   1. Do you see a change in the number of instances?
3. Decrease CPU utilization to less than 70%
   1. Do you see a change in the number of instances?

### Solution

#### Console

1. Go to EC2 service -> Auto Scaling Groups and click on the tab "Automatic scaling"
2. Choose "Target tracking scaling" under "Policy Type"
3. Set metric type to Average CPU utilization
4. Set target value to 70% and click on "Create"
5. If you are using Amazon Linux 2, you can stress the instance with the following:
   ```
   sudo amazon-linux-extras install epel -y
   sudo yum install stress -y
   stress -c 4 # assuming you have 4 CPUs
   ```
   1. Yes, an additional EC2 instance was added
6. Simply stop the stress command
   1. Yes, one of the EC2 instances was terminated

================================================
FILE: topics/aws/exercises/aurora_db/exercise.md
================================================
## AWS Databases - Aurora DB

### Objectives

1. Create an Aurora database with the following properties
   * Edition: MySQL
   * Instance type: db.t3.small
   * A reader node in a different AZ
   * Public access should be enabled
   * Port should be set to 3306
   * DB name: 'db'
   * Backup retention: 10 days
2. How many instances does your DB cluster have?

================================================
FILE: topics/aws/exercises/aurora_db/solution.md
================================================
## AWS Databases - Aurora DB

### Objectives

1. Create an Aurora database with the following properties
   * Edition: MySQL
   * Instance type: db.t3.small
   * A reader node in a different AZ
   * Public access should be enabled
   * Port should be set to 3306
   * DB name: 'db'
   * Backup retention: 10 days
2. How many instances does your DB cluster have?

### Solution

#### Console

1. Go to RDS service
2. Click on "Databases" in the left side menu and click on the "Create database" button
3. Choose "Standard create"
4. Choose "Aurora DB"
5. Choose "MySQL" edition and "Provisioned" as capacity type
6. Choose "single-master"
7. Specify credentials (master username and password)
8. Choose DB instance type: Burstable classes, db.t3.small
9. Choose "Create an Aurora Replica or Reader node in a different AZ"
10. Choose a default VPC and subnet
11. Check "Yes" for public access
12. Database port should be 3306
13. For authentication, choose "Password and IAM database authentication"
14. Set initial database name as "db"
15. Increase backup retention period to 10 days
16. Click on the "Create database" button

The cluster has two instances - one reader and one writer.

================================================
FILE: topics/aws/exercises/auto_scaling_groups_basics/exercise.md
================================================
## AWS Auto Scaling Groups - Basics

### Requirements

Zero EC2 instances running

### Objectives

A. Create a scaling group for web servers with the following properties:
   * Amazon Linux 2 AMI
   * t2.micro as the instance type
   * user data:
     ```
     yum install -y httpd
     systemctl start httpd
     systemctl enable httpd
     ```

B. Were new instances created since you created the auto scaling group? How many? Why?

C. Change desired capacity to 2. Did it launch more instances?

D. Change back the desired capacity to 1. What is the result of this action?

================================================
FILE: topics/aws/exercises/auto_scaling_groups_basics/solution.md
================================================
## AWS Auto Scaling Groups - Basics

### Requirements

Zero EC2 instances running

### Objectives

A. Create a scaling group for web servers with the following properties:
   * Amazon Linux 2 AMI
   * t2.micro as the instance type
   * user data:
     ```
     yum install -y httpd
     systemctl start httpd
     systemctl enable httpd
     ```

B. Were new instances created since you created the auto scaling group? How many? Why?

C. Change desired capacity to 2. Did it launch more instances?

D. Change back the desired capacity to 1. What is the result of this action?

### Solution

#### Console

A.
1. Go to EC2 service
2. Click on "Auto Scaling Groups" under "Auto Scaling"
3. Click on "Create Auto Scaling Group"
4. Insert a name
5. Click on "Create a launch template"
   1. Insert a name and a version for the template
   2. Select an AMI to use (Amazon Linux 2)
   3. Select t2.micro instance type
   4. Select a key pair
   5. Attach a security group
   6. Under "Advanced" insert the user data
   7. Click on "Create"
6. Choose the launch template you've just created and click on "Next"
7. Choose "Adhere to launch template"
8. Choose in which AZs to launch and click on "Next"
9. Link it to an ALB (if you don't have one, create it)
10. Mark ELB health check in addition to EC2. Click on "Next" until you reach the review page and click on "Create auto scaling group"

B. One instance was launched to meet the criteria of the auto scaling group we've created. The reason it launched only one is that "Desired capacity" is set to 1.

C. Change it by going to your auto scaling group -> Details -> Edit -> "2 desired capacity". This should create another instance if only one is running.

D. Reducing desired capacity back to 1 will terminate one of the instances (assuming 2 are running).

================================================
FILE: topics/aws/exercises/basic_s3_ci/exercise.md
================================================
# Basic CI with S3

## Objectives

1. Create a new S3 bucket
2. Add an index.html file to the bucket and make it a static website
3. Create a GitHub repo and put the index.html there
4. Make sure to connect your AWS account to GitHub
5. Create a CI pipeline in AWS to publish the updated index.html from GitHub every time someone pushes a change to a specific branch of the repo

================================================
FILE: topics/aws/exercises/basic_s3_ci/solution.md
================================================
# Basic CI with S3

## Objectives

1. Create a new S3 bucket
2. Add an index.html file to the bucket and make it a static website
3. Create a GitHub repo and put the index.html there
4. Make sure to connect your AWS account to GitHub
5. Create a CI pipeline in AWS to publish the updated index.html from GitHub every time someone pushes a change to a specific branch of the repo

## Solution

### Manual

#### Create S3 bucket

1. Go to S3 service in the AWS console
2. Insert a bucket name and choose a region
3. Uncheck "block public access" to make it public
4. Click on "Create bucket"

#### Static website hosting

1. Navigate to the newly created bucket and click on the "Properties" tab
2. Click on "Edit" in the "Static Website Hosting" section
3. Check "Enable" for "Static web hosting"
4. Set "index.html" as index document and "error.html" as error document

#### S3 bucket permissions

1. Click on the "Permissions" tab in the newly created S3 bucket
2. Click on Bucket Policy -> Edit -> Policy Generator. Click on "Generate Policy" for "GetObject"
3. Copy the generated policy, go back to the "Permissions" tab and replace the current policy with it

#### GitHub Source

1. Go to the Developer Tools Console and create a new connection (GitHub)

#### Create a CI pipeline

1. Go to CodePipeline in the AWS console
2. Click on "Create Pipeline" -> Insert a pipeline name -> Click on Next
3. Choose the newly created source (GitHub) under sources
4. Select repository name and branch name
5. Select "AWS CodeBuild" as build provider
6. Select "Managed Image", "standard" runtime and "new service role"
7. In the deploy stage choose the newly created S3 bucket and for deploy provider choose "Amazon S3"
8. Review the pipeline and click on "Create pipeline"

#### Test the pipeline

1. Clone the project from GitHub
2. Make changes to index.html and commit them (git commit -a)
3. Push the new change, verify that the newly created AWS pipeline was triggered and check the content of the site

================================================
FILE: topics/aws/exercises/budget_setup/exercise.md
================================================
## AWS - Budget Setup

### Objectives

Set up a cost budget in your AWS account based on your needs.

================================================
FILE: topics/aws/exercises/budget_setup/solution.md
================================================
## AWS - Budget Setup

### Objectives

Set up a cost budget in your AWS account based on your needs.
### Solution

1. Go to "Billing"
2. Click on "Budgets" in the menu
3. Click on "Create a budget"
4. Choose "Cost Budget" and click on "Next"
5. Choose the values that work for you. For example, a recurring monthly budget with a specific amount
6. Insert a budget name and click on "Next"
7. Set up an alert by clicking on "Add an alert threshold"
   1. Set a threshold (e.g. 75% of the budgeted amount)
   2. Set an email where a notification will be sent
8. Click on "Next" until you can click on "Create a budget"

================================================
FILE: topics/aws/exercises/create_ami/exercise.md
================================================
## EC2 - Create an AMI

### Requirements

One running EC2 instance

### Objectives

1. Make some changes in the operating system of your instance (create files, modify files, ...)
2. Create an AMI image from the running EC2 instance
3. Launch a new instance using the custom AMI you've created

================================================
FILE: topics/aws/exercises/create_ami/solution.md
================================================
## EC2 - Create an AMI

### Requirements

One running EC2 instance

### Objectives

1. Make some changes in the operating system of your instance (create files, modify files, ...)
2. Create an AMI image from the running EC2 instance
3. Launch a new instance using the custom AMI you've created

### Solution

1. Connect to your EC2 instance (ssh, console, ...)
2. Make some changes in the operating system
3. Go to EC2 service
4. Right click on the instance where you made some changes -> Image and templates -> Create image
5. Give the image a name and click on "Create image"
6. Launch a new instance and choose the image you've just created

================================================
FILE: topics/aws/exercises/create_efs/exercise.md
================================================
## AWS - Create EFS

### Requirements

Two EC2 instances in different availability zones

### Objectives

1. Create an EFS with the following properties
   1. Set lifecycle management to 60 days
   2. The mode should match a use case of scaling to high levels of throughput and I/O operations per second
2. Mount the EFS in both of your EC2 instances

================================================
FILE: topics/aws/exercises/create_efs/solution.md
================================================
## AWS - Create EFS

### Requirements

Two EC2 instances in different availability zones

### Objectives

1. Create an EFS with the following properties
   1. Set lifecycle management to 60 days
   2. The mode should match a use case of scaling to high levels of throughput and I/O operations per second
2. Mount the EFS in both of your EC2 instances

### Solution

1. Go to the EFS console
2. Click on "Create file system"
3. Click on "Customize"
   1. Set lifecycle management to "60 days since last access"
   2. Set performance mode to "Max I/O" due to the requirement of scaling to high levels of throughput
   3. Click on "Next"
4. Choose a security group to attach (if you don't have any, create one and make sure it has a rule to allow NFS traffic) and click on "Next" until you are able to review and create it
5. SSH into your EC2 instances
   1. Run `sudo yum install -y amazon-efs-utils`
   2. Run `mkdir efs`
   3. If you go to your EFS page and click on "Attach", you can see the ways to mount your EFS on your instances
      1. The command to mount the EFS should be similar to `sudo mount -t efs -o tls :/ efs` - copy and paste it in your EC2 instance's OS

================================================
FILE: topics/aws/exercises/create_role/exercise.md
================================================
## AWS - Create a Role

### Objectives

Create a basic role to provide EC2 service with Full IAM access permissions.

In the end, run from the CLI (or CloudShell) the command to verify the role was created.

### Solution

1. Go to AWS console -> IAM
2. Click in the left side menu on "Access Management" -> Roles
3. Click on "Create role"
4. Choose "AWS service" as the type of trusted entity and then choose "EC2" as a use case. Click on "Next"
5. In the permissions page, check "IAMFullAccess" and click on "Next" until you get to the "Review" page
6. In the "Review" page, give the role a name (e.g. IAMFullAccessEC2), provide a short description and click on "Create role"
7. `aws iam list-roles` will list all the roles in the account, including the one we've just created

================================================
FILE: topics/aws/exercises/create_role/solution.md
================================================
## AWS - Create a Role

### Objectives

Create a basic role to provide EC2 service with Full IAM access permissions.

In the end, run from the CLI (or CloudShell) the command to verify the role was created.

### Solution

1. Go to AWS console -> IAM
2. Click in the left side menu on "Access Management" -> Roles
3. Click on "Create role"
4. Choose "AWS service" as the type of trusted entity and then choose "EC2" as a use case. Click on "Next"
5. In the permissions page, check "IAMFullAccess" and click on "Next" until you get to the "Review" page
6. In the "Review" page, give the role a name (e.g. IAMFullAccessEC2), provide a short description and click on "Create role"
7. `aws iam list-roles` will list all the roles in the account, including the one we've just created

================================================
FILE: topics/aws/exercises/create_spot_instances/exercise.md
================================================
## AWS EC2 - Spot Instances

### Objectives

A. Create two Spot instances using a Spot Request with the following properties:
   * Amazon Linux 2 AMI
   * 2 instances as target capacity (at any given point of time) while each one has 2 vCPUs and 3 GiB RAM

B. Create a single Spot instance using Amazon Linux 2 and t2.micro

================================================
FILE: topics/aws/exercises/create_spot_instances/solution.md
================================================
## AWS EC2 - Spot Instances

### Objectives

A. Create two Spot instances using a Spot Request with the following properties:
   * Amazon Linux 2 AMI
   * 2 instances as target capacity (at any given point of time) while each one has 2 vCPUs and 3 GiB RAM

B. Create a single Spot instance using Amazon Linux 2 and t2.micro

### Solution

A. Create Spot Fleets:

1. Go to EC2 service
2. Click on "Spot Requests"
3. Click on the "Request Spot Instances" button
4. Set the following values for the parameters:
   * Amazon Linux 2 AMI
   * Total target capacity -> 2
   * Check "Maintain target capacity"
   * vCPUs: 2
   * Memory: 3 GiB RAM
5. Click on Launch

B. Create a single Spot instance:

1. Go to EC2 service
2. Click on "Instances"
3. Click on "Launch Instances"
4. Choose "Amazon Linux 2 AMI" and click on "Next"
5. Choose t2.micro and click on "Next: Configure Instance Details"
6. Select "Request Spot instances"
7. Set the maximum price above the current price
8. Click on "Review and Launch"

================================================
FILE: topics/aws/exercises/create_user/exercise.md
================================================
## IAM AWS - Create a User

### Objectives

As you probably know at this point, it's not recommended to work with the root account in AWS. For this reason you are going to create a new account which you'll use regularly as the admin account.

1. Create a user with password credentials
2. Add the newly created user to a group called "admin" and attach to it the policy called "AdministratorAccess"
3. Make sure the user has a tag with the key `Role` and the value `DevOps`

================================================
FILE: topics/aws/exercises/create_user/solution.md
================================================
## IAM AWS - Create a User

### Objectives

As you probably know at this point, it's not recommended to work with the root account in AWS. For this reason you are going to create a new account which you'll use regularly as the admin account.

1. Create a user with password credentials
2. Add the newly created user to a group called "admin" and attach to it the policy called "AdministratorAccess"
3. Make sure the user has a tag with the key `Role` and the value `DevOps`

### Solution

1. Go to the AWS IAM service
2. Click on "Users" in the side menu (right under "Access Management")
3. Click on the button "Add users"
4. Insert the user name (e.g. mario)
5. Select the credential type: "Password"
6. Set console password to custom and click on "Next"
7. Click on "Add user to group"
8. Insert "admin" as the group name
9. Check the "AdministratorAccess" policy and click on "Create group"
10. Click on "Next: Tags"
11. Add a tag with the key `Role` and the value `DevOps`
12. Click on "Review" and then click on "Create user"

### Solution using Terraform

```
resource "aws_iam_group_membership" "team" {
  name = "tf-testing-group-membership"

  users = [
    aws_iam_user.newuser.name,
  ]

  group = aws_iam_group.admin.name
}

resource "aws_iam_group_policy_attachment" "test-attach" {
  group      = aws_iam_group.admin.name
  policy_arn = "arn:aws:iam::aws:policy/AdministratorAccess"
}

resource "aws_iam_group" "admin" {
  name = "admin"
}

resource "aws_iam_user" "newuser" {
  name = "newuser"
  path = "/system/"

  tags = {
    Role = "DevOps"
  }
}
```

================================================
FILE: topics/aws/exercises/creating_records/exercise.md
================================================
## AWS Route 53 - Creating Records

### Requirements

At least one registered domain

### Objectives

1. Create the following record for your domain:
   1. Record name: foo
   2. Record type: A
   3. Set some IP in the value field
2. Verify from the shell that you are able to use the record you've created to look up the IP address by using the domain name

================================================
FILE: topics/aws/exercises/creating_records/solution.md
================================================
## AWS Route 53 - Creating Records

### Requirements

At least one registered domain

### Objectives

1. Create the following record for your domain:
   1. Record name: foo
   2. Record type: A
   3. Set some IP in the value field
2. Verify from the shell that you are able to use the record you've created to look up the IP address by using the domain name

### Solution

1. Go to Route 53 service -> Hosted zones
2. Click on your domain name
3. Click on "Create record"
4. Insert "foo" in "Record name"
5. Set "Record type" to A
6. In "Value" insert "201.7.20.22"
7. Click on "Create records"
8. In your shell, type `nslookup foo.` or `dig foo.`

... attach volume -> choose your EC2 instance and click on "Attach"
7. Terminate your instance
8. The default EBS volume (created when you launched the instance for the first time) will be deleted (unless you unchecked "Delete on termination"), but the volume you've created as part of this exercise will remain

Note: don't forget to remove the EBS volume you've created in this exercise

================================================
FILE: topics/aws/exercises/ec2_iam_roles/exercise.md
================================================
## AWS EC2 - IAM Roles

### Requirements

1. Running EC2 instance without any IAM roles (so if you connect to the instance and try to run AWS commands, it fails)
2. IAM role with the "IAMReadOnlyAccess" policy

### Objectives

1. Attach a role (and if such a role doesn't exist, create it) with the "IAMReadOnlyAccess" policy to the EC2 instance
2. Verify you can run AWS commands in the instance

================================================
FILE: topics/aws/exercises/ec2_iam_roles/solution.md
================================================
## AWS EC2 - IAM Roles

### Requirements

1. Running EC2 instance without any IAM roles (so if you connect to the instance and try to run AWS commands, it fails)
2. IAM role with the "IAMReadOnlyAccess" policy

### Objectives

1. Attach a role (and if such a role doesn't exist, create it) with the "IAMReadOnlyAccess" policy to the EC2 instance
2. Verify you can run AWS commands in the instance

### Solution

#### Console

1. Go to EC2 service
2. Click on the instance to which you would like to attach the IAM role
3. Click on "Actions" -> "Security" -> "Modify IAM Role"
4. Choose the IAM role with the "IAMReadOnlyAccess" policy and click on "Save"
5. Running AWS commands in the instance should now work fine (e.g. `aws iam list-users`)

================================================
FILE: topics/aws/exercises/ecs_task/exercise.md
================================================
## AWS Containers - Run Tasks

Note: this costs money

### Objectives

Create a task in ECS to launch in Fargate. The task itself can be a sample app.
================================================ FILE: topics/aws/exercises/ecs_task/solution.md ================================================ ## AWS Containers - Run Tasks Note: this costs money ### Objectives Create a task in ECS to launch in Fargate. The task itself can be a sample app. ### Solution #### Console 1. Go to Elastic Container Service page 2. Click on "Get Started" 3. Choose "sample-app" 4. Verify it's using Farget and not ECS (EC2 Instance) and click on "Next" 5. Select "None" in Load balancer type and click on "Next" 6. Insert cluster name (e.g. my_cluster) and click on "Next" 7. Review everything and click on "Create" 8. Wait for everything to complete 1. Go to clusters page and check the status of the task (it will take a couple of seconds/minutes before changing to "Running") 1. Click on the task and you'll see the launch type is Fargate ================================================ FILE: topics/aws/exercises/elastic_beanstalk_simple/exercise.md ================================================ ## AWS Elastic Beanstalk - Node.js ### Requirements 1. Having a running node.js application on AWS Elastic Beanstalk platform ### Objectives 1. Create an AWS Elastic Beanstalk application with the basic properties a. No ALB, No Database, Just use the default platform settings ### Out of scope 1. Having ALB attached in place 2. Having custom domain name in place 3. Having automated pipelines in place 4. Having blue-green deployment in place 5. Writing the Node.js application ================================================ FILE: topics/aws/exercises/elastic_beanstalk_simple/solution.md ================================================ ## AWS Elastic Beanstalk - Node.js ### Prerequisites 1. 
make sure the node.js application has an _npm start_ command specified in the __package.json__ file, like the following example:

```
{
  "name": "application-name",
  "version": "0.0.1",
  "private": true,
  "scripts": {
    "start": "node app"
  },
  "dependencies": {
    "express": "3.1.0",
    "jade": "*",
    "mysql": "*",
    "async": "*",
    "node-uuid": "*"
  }
}
```

2. zip the application, and make sure to not zip the parent folder, only the files themselves, like:

```
\Parent - (exclude the folder itself from the zip)
- file1 - (include in zip)
- subfolder1 (include in zip)
  - file2 (include in zip)
  - file3 (include in zip)
```

### Solution 1. Create a "New Environment" 2. Select Environment => _Web Server Environment_ 3. Fill the "Create a web server environment" section a. Fill the "Application Name" 4. Fill the Environment information section a. Fill the "Environment Name" b. Domain - leave for autogenerated value 5. Platform a. Choose Platform => _node.js_ 6. Application Code => upload the zipped code from your local computer 7. Create Environment 8. Wait for the environment to come up 9. Check the website a. Navigate to the _Applications_ tab, b. select the recently created node.js app c. click on the URL - highlighted ### Documentation [Elastic Beanstalk / Node.js getting started](https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/nodejs-getstarted.html) ================================================ FILE: topics/aws/exercises/elastic_ip/exercise.md ================================================ ## AWS EC2 - Elastic IP ### Requirements * An EC2 instance with public IP (not elastic IP) ### Objectives 1. Write down the public IP of your EC2 instance somewhere and stop & start the instance. Is the public IP address still the same? Why? 2.
Handle this situation so you have the same public IP even after stopping and starting the instance ================================================ FILE: topics/aws/exercises/elastic_ip/solution.md ================================================ ## AWS EC2 - Elastic IP ### Requirements * An EC2 instance with public IP (not elastic IP) ### Objectives 1. Write down the public IP of your EC2 instance somewhere and stop & start the instance. Is the public IP address still the same? Why? 2. Handle this situation so you have the same public IP even after stopping and starting the instance ### Solution 1. Go to EC2 service -> Instances 1. Write down the current public IP address 2. Click on "Instance state" -> Stop instance -> Stop 3. Click on "Instance state" -> Start instance 4. No, the public IP address has changed 2. Let's use an Elastic IP address 1. In EC2 service, under "Network & Security" click on "Elastic IP" 2. Click on the "Allocate elastic IP address" button 3. Make sure you select "Amazon's pool of IPv4 addresses" and click on "Allocate" 4. Click on "Actions" and then "Associate Elastic IP address" 1. Select "Instance", choose your instance and provide its private IP address 2. Click on "Associate" 5. Now, if we go back to the instance page, we can see it is using the Elastic IP address as its public IP Note: to remove it, use the "disassociate" option and don't forget to also release the address so you won't be billed. ================================================ FILE: topics/aws/exercises/elastic_network_interfaces/exercise.md ================================================ ## AWS EC2 - Elastic Network Interfaces ### Requirements * An EC2 instance with a network interface ### Objectives A. Create a network interface and attach it to the EC2 instance that already has one network interface B.
Explain why anyone would use two network interfaces ================================================ FILE: topics/aws/exercises/elastic_network_interfaces/solution.md ================================================ ## AWS EC2 - Elastic Network Interfaces ### Requirements * An EC2 instance with a network interface ### Objectives A. Create a network interface and attach it to the EC2 instance that already has one network interface B. Explain why anyone would use two network interfaces ### Solution A. 1. Go to EC2 service 2. Click on "Network Interfaces" under "Network & Security" 3. Click on "Create network interface" 4. Provide a description 5. Choose a subnet (one that is in the same AZ as the instance) 6. Optionally attach a security group and click on "Create network interface" 7. Click on "Actions" -> "Attach" and choose the instance to attach it to 8. If you now go to the "Instances" page you'll see your instance has two network interfaces B. 1. You can move the second network interface between instances. This allows us to create a kind of failover mechanism between the instances. ================================================ FILE: topics/aws/exercises/elasticache/exercise.md ================================================ ## AWS ElastiCache ### Objectives 1. Create ElastiCache Redis * Instance type should be "cache.t2.micro" * Replicas should be 0 ================================================ FILE: topics/aws/exercises/elasticache/solution.md ================================================ ## AWS ElastiCache ### Objectives 1. Create ElastiCache Redis * Instance type should be "cache.t2.micro" * Replicas should be 0 ### Solution #### Console 1. Go to ElastiCache service 2. Click on "Get Started Now" 3. Choose "Redis" 4. Insert a name and description 5. Choose "cache.t2.micro" as the node type 6. Set number of replicas to 0 7. Create a new subnet group 8.
Click on "Create" ================================================ FILE: topics/aws/exercises/health_checks/exercise.md ================================================ ## AWS Route 53 - Health Checks ## Requirements 3 web instances in different AZs. ## Objectives 1. For each instance create a health check with the following properties: 1. Name it after the AZ where the instance resides 2. Failure threshold should be 5 2. Edit the security group of one of your instances and remove HTTP rules. 1. Did it change the status of the health check? ================================================ FILE: topics/aws/exercises/health_checks/solution.md ================================================ ## AWS Route 53 - Health Checks ## Requirements 3 web instances in different AZs. ## Objectives 1. For each instance create a health check with the following properties: 1. Name it after the AZ where the instance resides 2. Failure threshold should be 5 2. Edit the security group of one of your instances and remove HTTP rules. 1. Did it change the status of the health check? ### Solution #### Console 1. Go to Route 53 2. Click on "Health Checks" in the left-side menu 3. Click on "Create health check" 4. Insert the name: us-east-2 5. What to monitor: endpoint 6. Insert the IP address of the instance 7. Insert the endpoint /health if your web instance supports that endpoint 8. In advanced configuration, set failure threshold to 5 9. Click on "Next" and then on "Create health check" 10. Repeat steps 1-9 for the other two instances you have 1. Go to the security group of one of your instances 2. Click on "Actions" -> Edit inbound rules -> Delete HTTP based rules 3.
Go back to the health checks page and after a couple of seconds you should see that the status becomes "unhealthy" ================================================ FILE: topics/aws/exercises/hello_function/exercise.md ================================================ # Hello Function Create a basic AWS Lambda function that, when given a name, will return "Hello " followed by that name ## Solution Click [here](solution.md) to view the solution. ================================================ FILE: topics/aws/exercises/hello_function/solution.md ================================================ ## Hello Function - Solution ### Exercise Create a basic AWS Lambda function that, when given a name, will return "Hello " followed by that name ### Solution #### Define a function 1. Go to Lambda console panel and click on `Create function` 1. Give the function a name like `BasicFunction` 2. Select `Python3` runtime 3. Now to handle the function's permissions, we can attach an IAM role to our function either by setting an existing role or creating a new one. I selected "Create a new role from AWS policy templates" 4. In "Policy Templates" select "Simple Microservice Permissions" 1. Next, you should see a text editor where you will insert code similar to the following #### Function's code

```
import json

def lambda_handler(event, context):
    firstName = event['name']
    return 'Hello ' + firstName
```

2. Click on "Create Function" #### Define a test 1. Now let's test the function. Click on "Test". 2. Select "Create new test event" 3. Set the "Event name" to whatever you'd like. For example "TestEvent" 4. Provide keys to test

```
{
  "name": "Spyro"
}
```

5. Click on "Create" #### Test the function 1. Choose the test event you've created (`TestEvent`) 2. Click on the `Test` button 3. You should see something similar to `Execution result: succeeded` 4.
If you go to AWS CloudWatch, you should see a related log stream ================================================ FILE: topics/aws/exercises/hibernate_instance/exercise.md ================================================ ## AWS EC2 - Hibernate an Instance ### Objectives 1. Create an instance that supports hibernation 2. Hibernate the instance 3. Start the instance 4. How can you prove, from the OS perspective, that the instance was hibernated? ================================================ FILE: topics/aws/exercises/hibernate_instance/solution.md ================================================ ## AWS EC2 - Hibernate an Instance ### Objectives 1. Create an instance that supports hibernation 2. Hibernate the instance 3. Start the instance 4. How can you prove, from the OS perspective, that the instance was hibernated? ### Solution 1. Create an instance that supports hibernation 1. Go to EC2 service 2. Go to instances and create an instance 3. In "Configure instance" make sure to check "Enable hibernation as an additional stop behavior" 4. In "Add storage", make sure to encrypt the EBS volume and make sure its size is bigger than the instance RAM size (because hibernation saves the RAM state) 5. Review and Launch 2. Hibernate the instance 1. Go to the instance page 2. Click on "Instance state" -> "Hibernate instance" -> Hibernate 3. Instance state -> Start 4. Run the "uptime" command, which will display the amount of time the system was up (a regular stop and start would have reset it) ================================================ FILE: topics/aws/exercises/launch_ec2_web_instance/exercise.md ================================================ ## AWS - Launch EC2 Web Instance ### Objectives Launch one EC2 instance with the following requirements: 1. Amazon Linux 2 image 2. Instance type: pick one that has 1 vCPU and 1 GiB memory 3. Instance storage should be deleted upon the termination of the instance 4. When the instance starts, it should: 1. Install the httpd package 2. Start the httpd service 3.
Make sure the content of /var/www/html/index.html is `I made it! This is awesome!` 5. It should have the tag: "Type: web" and the name of the instance should be "web-1" 6. HTTP traffic (port 80) should be accepted from anywhere ================================================ FILE: topics/aws/exercises/launch_ec2_web_instance/solution.md ================================================ ## AWS - Launch EC2 Web Instance ### Objectives Launch one EC2 instance with the following requirements: 1. Amazon Linux 2 image 2. Instance type: pick one that has 1 vCPU and 1 GiB memory 3. Instance storage should be deleted upon the termination of the instance 4. When the instance starts, it should: 1. Install the httpd package 2. Start the httpd service 3. Make sure the content of /var/www/html/index.html is `I made it! This is awesome!` 5. It should have the tag: "Type: web" and the name of the instance should be "web-1" 6. HTTP traffic (port 80) should be accepted from anywhere ### Solution 1. Choose a region close to you 2. Go to EC2 service 3. Click on "Instances" in the menu and click on "Launch instances" 4. Choose image: Amazon Linux 2 5. Choose instance type: t2.micro 6. Make sure "Delete on Termination" is checked in the storage section 7. Insert the following under the "User data" field:

```
yum update -y
yum install -y httpd
systemctl start httpd
systemctl enable httpd
echo "I made it! This is awesome!" > /var/www/html/index.html
```

8. Add tags with the following keys and values: * key "Type" and the value "web" * key "Name" and the value "web-1" 9. In the security group section, add a rule to accept HTTP traffic (TCP) on port 80 from anywhere 10. Click on "Review" and then click on "Launch" after reviewing. 11. If you don't have a key pair, create one and download it. ### Solution using Terraform

```
provider "aws" {
  region = "us-east-1" // Or your desired region
}

resource "aws_instance" "web_server" {
  ami           = "ami-12345678" // Replace with the correct AMI for Amazon Linux 2
  instance_type = "t2.micro"     // Or any instance type with 1 vCPU and 1 GiB memory

  tags = {
    Name = "web-1"
    Type = "web"
  }

  root_block_device {
    volume_size           = 8 // Or any desired size
    delete_on_termination = true
  }

  provisioner "remote-exec" {
    inline = [
      "sudo yum update -y",
      "sudo yum install -y httpd",
      "sudo systemctl start httpd",
      "sudo bash -c 'echo \"I made it! This is awesome!\" > /var/www/html/index.html'",
      "sudo systemctl enable httpd"
    ]

    connection {
      type        = "ssh"
      user        = "ec2-user"
      private_key = file("~/.ssh/your_private_key.pem") // Replace with the path to your private key
      host        = self.public_ip
    }
  }

  vpc_security_group_ids = [aws_security_group.web_sg.id]
}

resource "aws_security_group" "web_sg" {
  name        = "web_sg"
  description = "Security group for web server"

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

================================================ FILE: topics/aws/exercises/mysql_db/exercise.md ================================================ ## AWS Databases - MySQL DB ### Objectives 1. Create a MySQL database with the following properties * Instance type: db.t2.micro * gp2 storage * Storage Auto scaling should be enabled and threshold should be set to 500 GiB * Public access should be enabled * Port should be set to 3306 * DB name: 'db' * Backup retention: 10 days 2.
Create a read replica for the database you've created ================================================ FILE: topics/aws/exercises/mysql_db/solution.md ================================================ ## AWS Databases - MySQL DB ### Objectives 1. Create a MySQL database with the following properties * Instance type: db.t2.micro * gp2 storage * Storage Auto scaling should be enabled and threshold should be set to 500 GiB * Public access should be enabled * Port should be set to 3306 * DB name: 'db' * Backup retention: 10 days 2. Create a read replica for the database you've created ### Solution #### Console 1. Go to RDS service 2. Click on "Databases" in the left side menu and click on the "Create database" button 3. Choose "Standard create" 4. Choose "MySQL" and the recommended version 5. Choose "Production" template 6. Specify DB instance identifier 7. Specify credentials (master username and password) 8. Choose DB instance type: Burstable classes, db.t2.micro 9. Choose "gp2" as storage 10. Enable storage autoscaling: maximum storage threshold of 500 GiB 11. Choose "Do not create a standby instance" 12. Choose a default VPC and subnet 13. Check "Yes" for public access 14. Choose "No preference" for AZ 15. Database port should be 3306 16. For authentication, choose "Password and IAM database authentication" 17. Set initial database name as "db" 18. Increase backup retention period to 10 days 19. Click on "Create database" button 1. Go to the database under "Databases" in the left side menu 2. Click on "Actions" -> Create read replica 3. Click on "Create read replica" ================================================ FILE: topics/aws/exercises/network_load_balancer/exercise.md ================================================ ## AWS ELB - Network Load Balancer ### Requirements Two running EC2 instances ### Objectives 1. Create a network load balancer 1. healthy threshold: 3 2. unhealthy threshold: 3 3. interval: 10 seconds 4.
Listener should be using TCP protocol on port 80 ================================================ FILE: topics/aws/exercises/network_load_balancer/solution.md ================================================ ## AWS ELB - Network Load Balancer ### Requirements Two running EC2 instances ### Objectives 1. Create a network load balancer 1. healthy threshold: 3 2. unhealthy threshold: 3 3. interval: 10 seconds 4. Listener should be using TCP protocol on port 80 ### Solution #### Console 1. Go to EC2 service 2. Click in the left side menu on "Load balancers" under "Load balancing" 3. Click on "Create load balancer" 4. Choose "Network Load Balancer" 5. Insert a name for the LB 6. Choose AZs where you want the LB to operate 7. Choose a security group 8. Under "Listeners and routing" click on "Create target group" and choose "Instances" 1. Provide a name for the target group 2. Set healthy threshold to 3 3. Set unhealthy threshold to 3 4. Set interval to 10 seconds 5. Set protocol to TCP and port to 80 6. Click on "Next" and choose two instances you have 7. Click on "Create target group" 9. Refresh target groups and choose the one you've just created 10. Click on "Create load balancer" and wait for it to be provisioned ================================================ FILE: topics/aws/exercises/new_vpc/exercise.md ================================================ # My First VPC ## Objectives 1. Create a new VPC 1. It should have a CIDR that supports using at least 60,000 hosts 2. 
It should be named "exercise-vpc" ## Solution Click [here](solution.md) to view the solution ================================================ FILE: topics/aws/exercises/new_vpc/main.tf ================================================ ================================================ FILE: topics/aws/exercises/new_vpc/pulumi/__main__.py ================================================ import pulumi import pulumi_awsx as awsx vpc = awsx.ec2.Vpc("exercise-vpc", cidr_block="10.0.0.0/16") pulumi.export("vpc_id", vpc.vpc_id) pulumi.export("publicSubnetIds", vpc.public_subnet_ids) pulumi.export("privateSubnetIds", vpc.private_subnet_ids) # Run 'pulumi up' to create it ================================================ FILE: topics/aws/exercises/new_vpc/solution.md ================================================ # My First VPC ## Objectives 1. Create a new VPC 1. It should have a CIDR that supports using at least 60,000 hosts 2. It should be named "exercise-vpc" ## Solution ### Console 1. Under "Virtual Private Cloud" click on "Your VPCs" 2. Click on "Create VPC" 3. Insert a name - "exercise-vpc" 4. Insert IPv4 CIDR block: 10.0.0.0/16 5. Keep "Tenancy" at Default 6. 
Click on "Create VPC" ### Terraform Click [here](terraform/main.tf) to view the solution ### Pulumi - Python Click [here](pulumi/__main__.py) to view the solution ### Verify Solution To verify you've created the VPC, you can run: `aws ec2 describe-vpcs --filters Name=tag:Name,Values=exercise-vpc` ================================================ FILE: topics/aws/exercises/new_vpc/terraform/main.tf ================================================ resource "aws_vpc" "exercise-vpc" { cidr_block = "10.0.0.0/16" tags = { Name = "exercise-vpc" } } output "vpc-id" { value = aws_vpc.exercise-vpc.id } ================================================ FILE: topics/aws/exercises/no_application/exercise.md ================================================ ## No Application :'( ### Objectives Explain what might be possible reasons for the following issues: 1. Getting "time out" when trying to reach an application running on an EC2 instance 2. Getting "connection refused" error ================================================ FILE: topics/aws/exercises/no_application/solution.md ================================================ ## No Application :'( ### Objectives Explain what might be possible reasons for the following issues: 1. Getting "time out" when trying to reach an application running on an EC2 instance 2. Getting "connection refused" error ### Solution 1. 'Time out' can be due to one of the following: * Security group doesn't allow access * No host (yes, I know. Not the first thing to check and yet...) * Operating system firewall blocking traffic 2.
'Connection refused' can happen due to one of the following: * Application didn't launch properly or has some issue (doesn't listen on the designated port) * Firewall replied with a reject instead of dropping the packets ================================================ FILE: topics/aws/exercises/password_policy_and_mfa/exercise.md ================================================ ## AWS IAM - Password Policy & MFA Note: DON'T perform this exercise unless you understand what you are doing and what the outcome of applying these changes to your account is ### Objectives 1. Create a password policy with the following settings: 1. Minimum of 8 characters 2. At least one number 3. Prevent password reuse 2. Then enable MFA for the account. ================================================ FILE: topics/aws/exercises/password_policy_and_mfa/solution.md ================================================ ## AWS IAM - Password Policy & MFA Note: DON'T perform this exercise unless you understand what you are doing and what the outcome of applying these changes to your account is ### Objectives 1. Create a password policy with the following settings: 1. Minimum of 8 characters 2. At least one number 3. Prevent password reuse 2. Then enable MFA for the account. ### Solution Password Policy: 1. Go to IAM service in AWS 2. Click on "Account settings" under "Access management" 3. Click on "Change password policy" 1. Check "Enforce minimum password length" and set it to 8 characters 1. Check "Require at least one number" 1. Check "Prevent password reuse" 4. Click on "Save changes" MFA: 1. Click on the account name 2. Click on "My Security Credentials" 3. Expand "Multi-factor authentication (MFA)" and click on "Activate MFA" 4. Choose one of the devices 5. Follow the instructions to set it up and click on "Assign MFA"
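The password-policy half of the console walkthrough above can also be expressed in code. Below is a minimal, hypothetical boto3-style sketch; in line with the warning at the top of this exercise, the actual API call is left commented out so your account policy is not changed by accident.

```python
# Hypothetical sketch: the same password policy applied via boto3 instead of
# the console. The call itself is commented out on purpose.

def password_policy_params(min_length: int = 8, reuse_prevention: int = 1) -> dict:
    """Build kwargs for iam.update_account_password_policy."""
    if min_length < 6:
        # IAM rejects minimum password lengths below 6
        raise ValueError("IAM requires a minimum password length of at least 6")
    return {
        "MinimumPasswordLength": min_length,
        "RequireNumbers": True,
        "PasswordReusePrevention": reuse_prevention,  # remember this many old passwords
    }

params = password_policy_params()
# import boto3
# boto3.client("iam").update_account_password_policy(**params)
```

As the note below says, MFA remains a console-only step here; this sketch covers only the policy part.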
### Solution using Terraform: ``` resource "aws_iam_account_password_policy" "strict" { minimum_password_length = 8 require_numbers = true allow_users_to_change_password = true password_reuse_prevention = 1 } ``` **Note:** You cannot add MFA through terraform, you have to do it in the GUI. ================================================ FILE: topics/aws/exercises/placement_groups/exercise.md ================================================ ## AWS EC2 - Placement Groups ### Objectives A. Create a placement group. It should be one with a low latency network. Make sure to launch an instance as part of this placement group. B. Create another placement group. This time high availability is a priority ================================================ FILE: topics/aws/exercises/placement_groups/solution.md ================================================ ## AWS EC2 - Placement Groups ### Objectives A. Create a placement group. It should be one with a low latency network. Make sure to launch an instance as part of this placement group. B. Create another placement group. This time high availability is a priority ### Solution A. 1. Go to EC2 service 2. Click on "Placement Groups" under "Network & Security" 3. Click on "Create placement group" 4. Give it a name and choose the "Cluster" placement strategy because the requirement is low latency network 5. Click on "Create group" 6. Go to "Instances" and click on "Launch an instance". Choose any properties you would like, just make sure to check "Add instance to placement group" and choose the placement group you've created B. 1. Go to EC2 service 2. Click on "Placement Groups" under "Network & Security" 3. Click on "Create placement group" 4. Give it a name and choose the "Spread" placement strategy because the requirement is high availability as top priority 5. 
Click on "Create group" ================================================ FILE: topics/aws/exercises/register_domain/exercise.md ================================================ ## AWS Route 53 - Register Domain ### Objectives Note: registering a domain costs money. Don't do this exercise, unless you understand that you are going to register a domain and it's going to cost you money. 1. Register your own custom domain using AWS Route 53 2. What is the type of your domain? 3. How many records does your domain have? ================================================ FILE: topics/aws/exercises/register_domain/solution.md ================================================ ## AWS Route 53 - Register Domain ### Objectives Note: registering a domain costs money. Don't do this exercise, unless you understand that you are going to register a domain and it's going to cost you money. 1. Register your own custom domain using AWS Route 53 2. What is the type of your domain? 3. How many records does your domain have? ### Solution 1. Go to Route 53 service page 2. Click in the menu on "Registered Domains" under "Domains" 3. Click on "Register Domain" 4. Insert your domain 5. Check if it's available. If it is, add it to the cart Note: registering a domain costs money. Don't click on "Continue", unless you understand that you are going to register a domain and it's going to cost you money. 6. Click on "Continue" and fill in your contact information 7. Choose whether you want it to be renewed automatically in the future. Accept the terms and click on "Complete Order" 8. Go to hosted zones and you should see your newly registered domain there 1. The domain type is "Public" 1. The domain has 2 DNS records: NS and SOA ================================================ FILE: topics/aws/exercises/route_53_failover/exercise.md ================================================ ## AWS Route 53 - Failover ### Requirements A running EC2 web instance with a health check defined for it in Route 53 ### Objectives 1.
Create a failover record that will fail over to another record if a health check isn't passing 1. Make sure TTL is 30 2. Associate the failover record with the health check you have ================================================ FILE: topics/aws/exercises/route_53_failover/solution.md ================================================ ## AWS Route 53 - Failover ### Requirements A running EC2 web instance with a health check defined for it in Route 53 ### Objectives 1. Create a failover record that will fail over to another record if a health check isn't passing 1. Make sure TTL is 30 2. Associate the failover record with the health check you have ### Solution #### Console 1. Go to Route 53 service 2. Click on "Hosted Zones" in the left-side menu 3. Click on your hosted zone 4. Click on "Create record" 5. Insert "failover" in record name and set record type to A 6. Insert the IP of your instance 7. Set the routing policy to failover 8. Set TTL to 30 9. Associate with a health check 10. Add another record with the same properties as the previous one 11. Click on "Create records" 12. Go to your EC2 instance and edit its security group to remove the HTTP rules 13. Use your web app and if you print the hostname of your instance you will notice that a failover was performed and a different EC2 instance is used ================================================ FILE: topics/aws/exercises/s3/new_bucket/exercise.md ================================================ # Create buckets ## Objectives 1. Create the following buckets: 1. Private bucket 1. eu-west-2 region 2. Upload a single file to the bucket. Any file. 2. Public bucket 1. eu-west-1 region 2.
Versioning should be enabled ## Solution Click [here](solution.md) to view the solution ================================================ FILE: topics/aws/exercises/s3/new_bucket/pulumi/__main__.py ================================================ import pulumi_aws as aws # Private Bucket private_bucket = aws.s3.Bucket("my-first-private-bucket", acl="private", tags={ "Environment": "Exercise", "Name": "My First Private Bucket"}, region="eu-west-2" ) # Bucket Object aws.s3.BucketObject("bucketObject", key="some_object_key", bucket=private_bucket.id, content="object content") # Public Bucket aws.s3.Bucket("my-first-public-bucket", acl="public-read", tags={ "Environment": "Exercise", "Name": "My First Public Bucket"}, region="eu-west-1", versioning=aws.s3.BucketVersioningArgs(enabled=True)) ================================================ FILE: topics/aws/exercises/s3/new_bucket/solution.md ================================================ # Create buckets ## Objectives 1. Create the following buckets: 1. Private bucket 1. eu-west-2 region 2. Upload a single file to the bucket. Any file. 2. Public bucket 1. eu-west-1 region 2. Versioning should be enabled ## Solution ### Console For the first bucket: 1. Go to S3 service in the AWS console. If not in buckets page, click on "buckets" in the left side menu 2. Click on "Create bucket" 3. Give a globally unique name for your bucket 4. Choose the region "eu-west-2" 5. Click on "Create bucket" 6. Click on the bucket name 7. Under "objects" click on "Upload" -> "Add files" -> Choose file to upload -> Click on "Upload" For the second bucket: 1. Go to S3 service in the AWS console. If not in buckets page, click on "buckets" in the left side menu 2. Click on "Create bucket" 3. Give a globally unique name for your bucket 4. Choose the region "eu-west-1" 5. Make sure to uncheck the box for "Private bucket" to make it public 6. Make sure to check the enable box for "Bucket Versioning" 7. 
Click on "Create bucket" ### Terraform Click [here](terraform/main.tf) to view the solution ### Pulumi - Python Click [here](pulumi/__main__.py) to view the solution ================================================ FILE: topics/aws/exercises/s3/new_bucket/terraform/main.tf ================================================ resource "aws_s3_bucket" "private_bucket" { bucket = "my-first-private-bucket" region = "eu-west-2" acl = "private" tags = { Name = "My First Private Bucket" Environment = "Exercise" } } resource "aws_s3_bucket_acl" "private_bucket_acl" { bucket = aws_s3_bucket.private_bucket.id acl = "private" } resource "aws_s3_bucket" "public_bucket" { bucket = "my-first-public-bucket" region = "eu-west-1" tags = { Name = "My First Public Bucket" Environment = "Exercise" } versioning { enabled = true } } resource "aws_s3_bucket_acl" "public_bucket_acl" { bucket = aws_s3_bucket.public_bucket.id acl = "public-read" } resource "aws_s3_bucket_object" "bucket_object" { bucket = "my-first-private-bucket" key = "some_object_key" content = "object content" } ================================================ FILE: topics/aws/exercises/sample_cdk/exercise.md ================================================ ### Set up a CDK Project Initialize a CDK project and set up files required to build a CDK project. ## Solution Click [here](solution.md) to view the solution. ================================================ FILE: topics/aws/exercises/sample_cdk/solution.md ================================================ ### Set up a CDK Project - Solution ### Exercise Initialize a CDK project and set up files required to build a CDK project. ### Solution #### Initialize a CDK project 1. Install CDK on your machine by running `npm install -g aws-cdk`. 2. Create a new directory named `sample` for your project and run `cdk init app --language typescript` to initialize a CDK project. You can choose language as csharp, fsharp, go, java, javascript, python or typescript. 3. 
You should see the following files created in your directory: 1. `cdk.json`, `tsconfig.json`, `package.json` - These are configuration files that are used to define some global settings for your CDK project. 2. `bin/sample.ts` - This is the entry point for your CDK project. This file is used to define the stack that you want to create. 3. `lib/sample-stack.ts` - This is the main file that will contain the code for your CDK project. #### Create a Sample lambda function 1. In the `lib/sample-stack.ts` file, add the following code to create a lambda function:

```typescript
import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import { Construct } from 'constructs';

export class SampleStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);
    const hello = new lambda.Function(this, 'SampleLambda', {
      runtime: lambda.Runtime.NODEJS_14_X,
      code: lambda.Code.fromInline('exports.handler = async () => "hello world";'),
      handler: 'index.handler'
    });
  }
}
```

This will create a sample lambda function that returns "hello world" when invoked. #### Bootstrap the CDK project Before you deploy your project, you need to bootstrap it. This will create a CloudFormation stack that will be used to deploy your project. You can bootstrap your project by running `cdk bootstrap`. Learn more about bootstrapping [here](https://docs.aws.amazon.com/cdk/latest/guide/bootstrapping.html). ##### Deploy the Project 1. Run `npm install` to install all the dependencies for your project whenever you make changes. 2. Run `cdk synth` to synthesize the CloudFormation template for your project. You will see a new file called `cdk.out/CDKToolkit.template.json` that contains the CloudFormation template for your project. 3. Run `cdk diff` to see the changes that will be made to your AWS account. You will see a new stack called `SampleStack` that will create a lambda function and all the changes associated with it. 4.
Run `cdk deploy` to deploy your project. You should see a new stack called `SampleStack` created in your AWS account under CloudFormation.
5. Go to the Lambda console and you will see a new Lambda function called `SampleLambda` created in your account.

================================================
FILE: topics/aws/exercises/security_groups/exercise.md
================================================
## AWS EC2 - Security Groups

### Requirements

For this exercise you'll need:

1. EC2 instance with a web application
2. Security group inbound rules that allow HTTP traffic

### Objectives

1. List the security groups you have in your account, in the region you are using
2. Remove the HTTP inbound traffic rule
3. Can you still access the application? What do you see/get?
4. Add back the rule
5. Can you access the application now?

## Solution

Click [here to view the solution](solution.md)

================================================
FILE: topics/aws/exercises/security_groups/solution.md
================================================
## AWS EC2 - Security Groups

### Requirements

For this exercise you'll need:

1. EC2 instance with a web application
2. Security group inbound rules that allow HTTP traffic

### Objectives

1. List the security groups you have in your account, in the region you are using
2. Remove the HTTP inbound traffic rule
3. Can you still access the application? What do you see/get?
4. Add back the rule
5. Can you access the application now?

### Solution

#### Console

1. Go to the EC2 service -> click on "Security Groups" under "Network & Security". You should see at least one security group. One of them is called "default"
2. Click on the security group with HTTP rules and click on "Edit inbound rules". Remove the HTTP related rules and click on "Save rules"
3. No. There is a timeout because we removed the rule allowing HTTP traffic.
4. Click on the security group -> edit inbound rules and add the following rule:
   * Type: HTTP
   * Port range: 80
   * Source: Anywhere -> 0.0.0.0/0
5.
Yes

#### CLI

1. `aws ec2 describe-security-groups` -> by default, in a new account, there is one security group called "default"
2. Remove the rule:

```
aws ec2 revoke-security-group-ingress \
    --group-name someHTTPSecurityGroup \
    --protocol tcp \
    --port 80 \
    --cidr 0.0.0.0/0
```

3. No. There is a timeout because we removed the rule allowing HTTP traffic.
4. Add back the rule we removed:

```
aws ec2 authorize-security-group-ingress \
    --group-name someHTTPSecurityGroup \
    --protocol tcp \
    --port 80 \
    --cidr 0.0.0.0/0
```

5. Yes

================================================
FILE: topics/aws/exercises/snapshots/exercise.md
================================================
## AWS EC2 - EBS Snapshots

### Requirements

EBS Volume

### Objectives

A. Create a snapshot of an EBS volume
B. Verify the snapshot was created
C. Move the data to another region
D. Create a volume out of it in a different AZ

## Solution

Click [here to view the solution](solution.md)

================================================
FILE: topics/aws/exercises/snapshots/solution.md
================================================
## AWS EC2 - EBS Snapshots

### Requirements

EBS Volume

### Objectives

A. Create a snapshot of an EBS volume
B. Verify the snapshot was created
C. Move the data to another region
D. Create a volume out of it in a different AZ

### Solution

A.

1. Go to the EC2 service
2. Click on "Volumes" under "Elastic Block Store"
3. Right click on the chosen volume -> Create snapshot
4. Insert a description and click on "Create Snapshot"

B.

1. Click on "Snapshots" under "Elastic Block Store"
2. You should see the snapshot you've created

C.

1. Select the snapshot and click on Actions -> Copy
2. Select the region to which the snapshot will be copied

D.

1. Select the snapshot and click on Actions -> Create volume
2. Choose a different AZ
3.
Click on "Create Volume"

================================================
FILE: topics/aws/exercises/subnets/exercise.md
================================================
## AWS VPC - Subnets

### Requirements

1. Single newly created VPC
2. Region with more than two availability zones

### Objectives

1. Create a subnet in your newly created VPC
   1. CIDR: 10.0.0.0/24
   2. Name: NewSubnet1
2. Create additional subnet
   1. CIDR: 10.0.1.0/24
   2. Name: NewSubnet2
   3. Different AZ compared to previous subnet
3. Create additional subnet
   1. CIDR: 10.0.2.0/24
   2. Name: NewSubnet3
   3. Different AZ compared to previous subnets

## Solution

Click [here to view the solution](solution.md)

================================================
FILE: topics/aws/exercises/subnets/pulumi/__main__.py
================================================
import pulumi
import pulumi_aws as aws

# ID of your newly created VPC, e.g. set via: pulumi config set vpcId <vpc-id>
config = pulumi.Config()
vpc_id = config.require("vpcId")

available_zones = aws.get_availability_zones(state="available")

aws.ec2.Subnet("NewSubnet1",
               vpc_id=vpc_id,
               cidr_block="10.0.0.0/24",
               availability_zone=available_zones.names[0],
               tags={"Name": "NewSubnet1"})

aws.ec2.Subnet("NewSubnet2",
               vpc_id=vpc_id,
               cidr_block="10.0.1.0/24",
               availability_zone=available_zones.names[1],
               tags={"Name": "NewSubnet2"})

aws.ec2.Subnet("NewSubnet3",
               vpc_id=vpc_id,
               cidr_block="10.0.2.0/24",
               availability_zone=available_zones.names[2],
               tags={"Name": "NewSubnet3"})

# Run "pulumi up"

================================================
FILE: topics/aws/exercises/subnets/solution.md
================================================
# AWS VPC - Subnets

## Requirements

1. Single newly created VPC
2. Region with more than two availability zones

## Objectives

1. Create a subnet in your newly created VPC
   1. CIDR: 10.0.0.0/24
   2. Name: NewSubnet1
2. Create additional subnet
   1. CIDR: 10.0.1.0/24
   2. Name: NewSubnet2
   3. Different AZ compared to previous subnet
3. Create additional subnet
   1. CIDR: 10.0.2.0/24
   2. Name: NewSubnet3
   3.
Different AZ compared to previous subnets

## Solution

### Console

1. Click on "Subnets" under "Virtual Private Cloud"
2. Make sure you filter by your newly created VPC (to not see the subnets of all the other VPCs). You can do this in the left side menu
3. Click on "Create subnet"
4. Choose your newly created VPC
5. Set the subnet name to "NewSubnet1"
6. Choose an AZ
7. Set CIDR to 10.0.0.0/24
8. Click on "Add new subnet"
9. Set the subnet name to "NewSubnet2"
10. Choose a different AZ
11. Set CIDR to 10.0.1.0/24
12. Click on "Add new subnet"
13. Set the subnet name to "NewSubnet3"
14. Choose a different AZ
15. Set CIDR to 10.0.2.0/24

### Terraform

Click [here](terraform/main.tf) to view the solution

### Pulumi - Python

Click [here](pulumi/__main__.py) to view the solution

================================================
FILE: topics/aws/exercises/subnets/terraform/main.tf
================================================
# Variables

variable "vpc_id" {
  type = string
}

# Availability zones

data "aws_availability_zones" "all" {
  state = "available"
}

# AWS Subnets

resource "aws_subnet" "NewSubnet1" {
  cidr_block        = "10.0.0.0/24"
  vpc_id            = var.vpc_id
  availability_zone = data.aws_availability_zones.all.names[0]
  tags = {
    Purpose = "exercise"
    Name    = "NewSubnet1"
  }
}

resource "aws_subnet" "NewSubnet2" {
  cidr_block        = "10.0.1.0/24"
  vpc_id            = var.vpc_id
  availability_zone = data.aws_availability_zones.all.names[1]
  tags = {
    Purpose = "exercise"
    Name    = "NewSubnet2"
  }
}

resource "aws_subnet" "NewSubnet3" {
  cidr_block        = "10.0.2.0/24"
  vpc_id            = var.vpc_id
  availability_zone = data.aws_availability_zones.all.names[2]
  tags = {
    Purpose = "exercise"
    Name    = "NewSubnet3"
  }
}

# Outputs

output "NewSubnet1-id" {
  value = aws_subnet.NewSubnet1.id
}

output "NewSubnet2-id" {
  value = aws_subnet.NewSubnet2.id
}

output "NewSubnet3-id" {
  value = aws_subnet.NewSubnet3.id
}

================================================
FILE: topics/aws/exercises/url_function/exercise.md
================================================
## URL Function

Create a basic AWS Lambda function that will be triggered when you
enter a URL in the browser

## Solution

Click [here to view the solution](solution.md)

================================================
FILE: topics/aws/exercises/url_function/solution.md
================================================
## URL Function

Create a basic AWS Lambda function that will be triggered when you enter a URL in the browser

### Solution

#### Define a function

1. Go to the Lambda console and click on `Create function`
   1. Give the function a name like `urlFunction`
   2. Select the `Python3` runtime
   3. To handle the function's permissions, we can attach an IAM role to our function, either by setting an existing role or by creating a new one. I selected "Create a new role from AWS policy templates"
   4. In "Policy Templates" select "Simple Microservice Permissions"
2. Next, you should see a text editor where you will insert code similar to the following

#### Function's code

```
import json

def lambda_handler(event, context):
    firstName = event['name']
    return 'Hello ' + firstName
```

3. Click on "Create Function"

#### Define a test

1. Now let's test the function. Click on "Test"
2. Select "Create new test event"
3. Set the "Event name" to whatever you'd like. For example "TestEvent"
4. Provide keys to test

```
{
    "name": "Spyro"
}
```

5. Click on "Create"

#### Test the function

1. Choose the test event you've created (`TestEvent`)
2. Click on the `Test` button
3. You should see something similar to `Execution result: succeeded`
4. If you go to AWS CloudWatch, you should see a related log stream

#### Define a trigger

We'll define a trigger in order to run the function when the URL is entered in the browser

1. Go to the "API Gateway console" and click on "New API Option"
2. Insert the API name and description, and click on "Create"
3. Click on Action -> Create Resource
4. Insert the resource name and path (e.g. the path can be /hello) and click on "Create Resource"
5. Select the resource we've created and click on "Create Method"
6.
For "Integration type" choose "Lambda Function" and insert the name of the Lambda function we previously created. Make sure to also use the same region
7. Confirm the settings and any required permissions
8. Now click again on the resource and modify "Body Mapping Templates" so the template includes this:

```
{
    "name": "$input.params('name')"
}
```

9. Finally, save and click on Actions -> Deploy API

#### Running the function

1. In the API Gateway console, in the stages menu, select the API we've created and click on the GET option
2. You'll see an invoke URL you can click on. You might have to modify it to include the input so it looks similar to this: `.../hello?name=mario`
3. You should see `Hello mario` in your browser

================================================
FILE: topics/aws/exercises/web_app_lambda_dynamodb/exercise.md
================================================
# Web App with DB

## Objectives

Implement the following architecture:

## Solution

Click [here](solution.md) to view the solution

================================================
FILE: topics/aws/exercises/web_app_lambda_dynamodb/terraform/main.tf
================================================
provider "aws" {
  region = "us-west-1"
}

resource "aws_dynamodb_table" "users" {
  name         = "users"
  hash_key     = "id"
  billing_mode = "PAY_PER_REQUEST" # on-demand capacity (assumed setting)

  attribute {
    name = "id"
    type = "S"
  }

  attribute {
    name = "login"
    type = "S"
  }

  # Index name and projection type are assumed values; the source left them blank
  global_secondary_index {
    name            = "loginIndex"
    hash_key        = "login"
    projection_type = "ALL"
  }
}

================================================
FILE: topics/azure/README.md
================================================
# Azure

- [Azure](#azure)
  - [Questions](#questions)
    - [Azure 101](#azure-101)
    - [Azure Resource Manager](#azure-resource-manager)
    - [Compute](#compute)
    - [Network](#network)
    - [Storage](#storage)
    - [Security](#security)

## Questions

### Azure 101
What is Azure Portal?
[Microsoft Docs](https://docs.microsoft.com/en-us/learn/modules/intro-to-azure-fundamentals/what-is-microsoft-azure): "The Azure portal is a web-based, unified console that provides an alternative to command-line tools. With the Azure portal, you can manage your Azure subscription by using a graphical user interface."
What is Azure Marketplace?
[Microsoft Docs](https://docs.microsoft.com/en-us/learn/modules/intro-to-azure-fundamentals/what-is-microsoft-azure): "Azure marketplace helps connect users with Microsoft partners, independent software vendors, and startups that are offering their solutions and services, which are optimized to run on Azure."
Explain availability sets and availability zones
An availability set is a logical grouping of VMs that allows Azure to understand how your application is built, in order to provide redundancy and availability. It is recommended to create two or more VMs within an availability set to provide for a highly available application and to meet the 99.95% Azure SLA. An availability zone is a physically separate location within an Azure region, with independent power, cooling and networking; spreading VMs across availability zones protects an application from datacenter-level failures.
What is Azure Policy?
[Microsoft Learn](https://learn.microsoft.com/en-us/azure/governance/policy/overview): "Azure Policy helps to enforce organizational standards and to assess compliance at-scale. Through its compliance dashboard, it provides an aggregated view to evaluate the overall state of the environment, with the ability to drill down to the per-resource, per-policy granularity. It also helps to bring your resources to compliance through bulk remediation for existing resources and automatic remediation for new resources."
Explain Azure managed disks
### Azure Resource Manager
Explain what's Azure Resource Manager
From [Azure docs](https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/overview): "Azure Resource Manager is the deployment and management service for Azure. It provides a management layer that enables you to create, update, and delete resources in your Azure account. You use management features, like access control, locks, and tags, to secure and organize your resources after deployment."
What are the ARM template's sections?
[Microsoft Learn](https://learn.microsoft.com/en-us/azure/azure-resource-manager/templates/overview): The template has the following sections:
* Parameters - Provide values during deployment that allow the same template to be used with different environments.
* Variables - Define values that are reused in your templates. They can be constructed from parameter values.
* User-defined functions - Create customized functions that simplify your template.
* Resources - Specify the resources to deploy.
* Outputs - Return values from the deployed resources.
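As a minimal, hypothetical illustration of these sections (the storage account parameter, API version and SKU here are arbitrary examples, not part of any particular deployment), an ARM template skeleton could look like:

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "storageName": { "type": "string" }
  },
  "variables": {
    "location": "[resourceGroup().location]"
  },
  "functions": [],
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts",
      "apiVersion": "2022-09-01",
      "name": "[parameters('storageName')]",
      "location": "[variables('location')]",
      "sku": { "name": "Standard_LRS" },
      "kind": "StorageV2"
    }
  ],
  "outputs": {
    "storageId": {
      "type": "string",
      "value": "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageName'))]"
    }
  }
}
```

The same template can then be deployed to different environments by passing different parameter values.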
What's an Azure Resource Group?
From [Azure docs](https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/manage-resource-groups-portal): "A resource group is a container that holds related resources for an Azure solution. The resource group can include all the resources for the solution, or only those resources that you want to manage as a group."
### Compute
What Azure compute services are you familiar with?
* Azure Virtual Machines
* Azure Batch
* Azure Service Fabric
* Azure Container Instances
* Azure Virtual Machine Scale Sets
What "Azure Virtual Machines" service is used for?
Azure VMs support both Windows and Linux operating systems. They can be used for hosting web servers, applications, backups and databases, and they can also serve as a jump server or as an Azure self-hosted agent for building and deploying apps.
What "Azure Virtual Machine Scale Sets" service is used for?
Scaling Linux or Windows virtual machines; it lets you create and manage a group of load balanced VMs. The number of VM instances can automatically increase or decrease in response to demand or a defined schedule.
What "Azure Functions" service is used for?
Azure Functions is the serverless compute service of Azure.
What "Durable Azure Function" are?
[Microsoft Learn](https://docs.microsoft.com/en-us/learn/modules/intro-to-azure-fundamentals/what-is-microsoft-azure): Durable Functions is an extension of Azure Functions that lets you write stateful functions in a serverless compute environment.
What "Azure Container Instances" service is used for?
Running containerized applications (without the need to provision virtual machines).
What "Azure Batch" service is used for?
Running parallel and high-performance computing applications
What "Azure Service Fabric" service is used for?
What "Azure Kubernetes" service is used for?
### Network
What Azure network services are you familiar with?
Explain VNet peering
VNet peering enables connecting virtual networks. This means that you can route traffic between resources of the connected VNets privately through IPv4 addresses. Connecting VNets within the same region is known as regional VNet Peering, however connecting VNets across Azure regions is known as global VNet Peering.
What's an Azure region?
An Azure region is a set of datacenters deployed within a latency-defined perimeter and connected through a dedicated regional low-latency network.
What is the N-tier architecture?
N-tier architecture divides an application into logical layers and physical tiers. Each layer has a specific responsibility. Tiers are physically separated, running on separate machines. An N-tier application can have a closed layer architecture or an open layer architecture. N-tier architectures are typically implemented as infrastructure-as-a-service (IaaS) applications, with each tier running on a separate set of VMs.
### Storage
What Azure storage services are you familiar with?
What storage options does Azure support?
### Security
What is the Azure Security Center? What are some of its features?
It's a monitoring service that provides threat protection across all of the services in Azure. More specifically, it:
* Provides security recommendations based on your usage
* Continuously monitors the security settings of all your services
* Analyzes and identifies potential inbound attacks
* Detects and blocks malware using machine learning
What is Azure Active Directory?
Azure AD is a cloud-based identity service. You can use it as a standalone service or integrate it with an existing Active Directory service you are already running.
What is Azure Advanced Threat Protection?
What components are part of Azure ATP?
Where are logs stored in Azure Monitor?
Explain Azure Site Recovery
Explain what Azure Advisor does
Which protocols are available for configuring a health probe?
Explain Azure Active
What is a subscription? What types of subscriptions are there?
Explain what is a blob storage service
================================================
FILE: topics/chaos_engineering/README.md
================================================
# Chaos Engineering

- [Chaos Engineering](#chaos-engineering)
  - [Chaos Engineering Questions](#chaos-engineering-questions)
    - [Basics](#basics)

## Chaos Engineering Questions

### Basics
What is Chaos Engineering?
[Wikipedia](https://en.wikipedia.org/wiki/Chaos_engineering): "Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system's capability to withstand turbulent conditions in production." [TechTarget](https://www.techtarget.com/searchitoperations/definition/chaos-engineering): "Chaos engineering is the process of testing a distributed computing system to ensure that it can withstand unexpected disruptions."
What's a typical Chaos Engineering workflow?
According to [Gremlin](https://www.gremlin.com) there are three steps:
1. Plan an experiment: design and choose a scenario in which your system should fail to operate properly
2. Execute the smallest possible experiment to test your theory
3. If nothing goes wrong, scale the experiment and make the blast radius bigger. If your system breaks, you now better understand why and can start dealing with it

The process then repeats itself, either with the same scenario or a new one.
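As a toy sketch of this workflow (every name and number below is made up for illustration; a real chaos experiment injects failures into live infrastructure, not an in-process simulation), one could model a flaky dependency, verify that a retry mechanism keeps the steady state, and then widen the blast radius:

```python
import random

def flaky_service(failure_rate, rng):
    """Hypothetical dependency: fails with the injected probability."""
    if rng.random() < failure_rate:
        raise ConnectionError("injected failure")
    return "ok"

def call_with_retries(failure_rate, rng, retries=3):
    """The resilience mechanism under test: retry, then fall back."""
    for _ in range(retries):
        try:
            return flaky_service(failure_rate, rng)
        except ConnectionError:
            continue
    return "fallback"

def run_experiment(failure_rate, requests=1000, seed=42):
    """One experiment: success rate of the system under injected failures."""
    rng = random.Random(seed)
    ok = sum(1 for _ in range(requests)
             if call_with_retries(failure_rate, rng) == "ok")
    return ok / requests

# 1. establish the steady state, 2. smallest experiment, 3. bigger blast radius
print(run_experiment(failure_rate=0.0))   # steady state: every request succeeds
print(run_experiment(failure_rate=0.1))   # small blast radius
print(run_experiment(failure_rate=0.5))   # larger blast radius
```

If the success rate collapses at the larger blast radius, that is the signal to stop and understand why before scaling further.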
Cite a few tools used to operate Chaos exercises
- AWS Fault Injection Simulator: inject failures in AWS resources
- Azure Chaos Studio: inject failures in Azure resources
- Chaos Monkey: one of the most famous tools to orchestrate Chaos on diverse Cloud providers
- Litmus: a framework for Kubernetes
- Chaos Mesh: for Cloud Kubernetes platforms

See an extensive list [here](https://github.com/dastergon/awesome-chaos-engineering)
================================================
FILE: topics/cicd/README.md
================================================
## CI/CD

### CI/CD Exercises

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Set up a CI pipeline | CI | [Exercise](ci_for_open_source_project.md) | | |
| Deploy to Kubernetes | Deployment | [Exercise](deploy_to_kubernetes.md) | [Solution](solutions/deploy_to_kubernetes/README.md) | |
| Jenkins - Remove Jobs | Jenkins Scripts | [Exercise](remove_jobs.md) | [Solution](solutions/remove_jobs_solution.groovy) | |
| Jenkins - Remove Builds | Jenkins Scripts | [Exercise](remove_builds.md) | [Solution](solutions/remove_builds_solution.groovy) | |

### CI/CD Self Assessment
What is Continuous Integration?
A development practice where developers integrate code into a shared repository frequently. The pace can range from a few changes a week up to several changes an hour at larger scales. Each piece of code (change/patch) is verified to make sure that the change is safe to merge. Today, it's a common practice to test the change using an automated build that makes sure the code can be integrated. It can be one build which runs several tests at different levels (unit, functional, etc.) or several separate builds, all or some of which have to pass in order for the change to be merged into the repository.
What is Continuous Deployment?
A development strategy used by developers to release software automatically into production where any code commit must pass through an automated testing phase. Only when this is successful is the release considered production worthy. This eliminates any human interaction and should be implemented only after production-ready pipelines have been set with real-time monitoring and reporting of deployed assets. If any issues are detected in production it should be easy to rollback to previous working state. For more info please read [here](https://www.atlassian.com/continuous-delivery/continuous-deployment)
Can you describe an example of a CI (and/or CD) process starting the moment a developer submitted a change/PR to a repository?
There are many answers to such a question, as CI processes vary depending on the technologies used and the type of project the change was submitted to. Such processes can include one or more of the following stages:

* Compile
* Build
* Install
* Configure
* Update
* Test

An example of one possible answer: a developer submitted a pull request to a project. The PR (pull request) triggered two jobs (or one combined job): one job for running lint tests on the change, and a second job for building a package which includes the submitted change and running multiple API/scenario tests using that package. Once all tests pass and the change is approved by a maintainer/core, it's merged/pushed to the repository. If some of the tests fail, the change is not allowed to be merged/pushed to the repository.

A completely different answer, or CI process, can describe how a developer pushes code to a repository, which triggers a workflow to build a container image and push it to the registry. Once the image is in the registry, the new changes are applied to the k8s cluster.
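The first flavor of process above can be sketched as a hypothetical GitHub Actions workflow (the `make` targets are placeholders for whatever lint/build/test commands a given project uses):

```yaml
# Hypothetical workflow, e.g. .github/workflows/pr.yml
name: PR checks

on:
  pull_request:

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make lint

  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make build
      - run: make test
```

Branch protection rules would then require both jobs to pass before the PR can be merged.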
What is Continuous Delivery?
A development strategy used to frequently deliver code to QA and Ops for testing. This entails having a staging area with production-like features, where changes can only be accepted for production after a manual review. Because of this human involvement there is usually a time lag between release and review, making it slower and more error-prone compared to continuous deployment. For more info please read [here](https://www.atlassian.com/continuous-delivery/continuous-deployment)
What is difference between Continuous Delivery and Continuous Deployment?
Both encapsulate the same process of deploying the changes which were compiled and/or tested in the CI pipelines.
The difference between the two is that Continuous Delivery isn't a fully automated process, as opposed to Continuous Deployment, where every change that passes the tests in the process is eventually deployed to production. In continuous delivery, someone either approves the deployment or the deployment is based on constraints and conditions (like a time constraint of deploying every week/month/...)
What CI/CD best practices are you familiar with? Or what do you consider as CI/CD best practice?
* Commit and test often
* The testing/staging environment should be a clone of the production environment
* Clean up your environments (e.g. your CI/CD pipelines may create a lot of resources; they should also take care of cleaning up everything they create)
* The CI/CD pipelines should provide the same results when executed locally or remotely
* Treat CI/CD as another application in your organization, not as glue code
* On-demand environments instead of pre-allocated resources for CI/CD purposes
* Stages/steps/tasks of pipelines should be shared between applications or microservices (don't re-invent common tasks like "cloning a project")
You are given a pipeline and a pool with 3 workers: virtual machine, baremetal and a container. How will you decide on which one of them to run the pipeline?
The decision on which type of worker (virtual machine, bare-metal, or container) to use for running a pipeline depends on several factors, including the nature of the pipeline, the requirements of the software being built, the available resources, and the specific goals and constraints of the development and deployment process. Here are some considerations that can help in making the decision:

1. Pipeline requirements
2. Resource availability
3. Scalability and flexibility
4. Deployment and isolation requirements
5. Security considerations
6. Development and operational workflows
7. Cost considerations

Based on these considerations, the appropriate choice of worker is determined by weighing the pros and cons of each option and aligning with the specific requirements, resources, and goals of the development and deployment process. It may also be useful to consult relevant stakeholders, such as developers, operations, and infrastructure teams, to gather input and make an informed decision.
Where do you store CI/CD pipelines? Why?
There are multiple approaches as to where to store the CI/CD pipeline definitions:

1. App Repository - store them in the same repository as the application they are building or testing (perhaps the most popular approach)
2. Central Repository - store all of the organization's/project's CI/CD pipelines in one separate repository (perhaps the best approach when multiple teams test the same set of projects and would otherwise end up with many duplicated pipelines)
3. CI repo for every app repo - you separate CI-related code from app code, but you don't put everything in one place (perhaps the worst option due to the maintenance overhead)
4. The platform where the CI/CD pipelines are executed (e.g. a Kubernetes cluster in the case of Tekton/OpenShift Pipelines)
How do you perform plan capacity for your CI/CD resources? (e.g. servers, storage, etc.)
Capacity planning for CI/CD resources involves estimating the resources required to support the CI/CD pipeline and ensuring that the infrastructure has enough capacity to meet the demands of the pipeline. Here are some steps to perform capacity planning for CI/CD resources:

1. Analyze workload
2. Monitor current usage
3. Identify resource bottlenecks
4. Forecast future demand
5. Plan for growth
6. Consider scalability and elasticity
7. Evaluate cost and budget
8. Continuously monitor and adjust

By following these steps, you can effectively plan the capacity for your CI/CD resources, ensuring that your pipeline has sufficient resources to operate efficiently and meet the demands of your development process.
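The estimation step can be sketched as a back-of-the-envelope calculation (the formula, the peak factor, and every number below are illustrative assumptions to be replaced with measured data, not a standard method):

```python
import math

def required_workers(builds_per_day, avg_build_minutes, peak_factor=2.0,
                     busy_hours_per_day=8, executors_per_worker=2):
    """Rough estimate of how many CI workers a given build load needs.

    peak_factor pads the average load so that peak hours don't starve
    the build queue; executors_per_worker is how many builds one worker
    can run in parallel.
    """
    busy_minutes = builds_per_day * avg_build_minutes * peak_factor
    minutes_per_worker = busy_hours_per_day * 60 * executors_per_worker
    # Round up: even a tiny load needs at least one worker
    return max(1, math.ceil(busy_minutes / minutes_per_worker))

# e.g. 200 builds/day averaging 12 minutes each
print(required_workers(200, 12))  # → 5
```

The point of such a model is less the exact number and more making the assumptions (load, concurrency, peak behavior) explicit so they can be revisited as usage grows.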
How would you structure/implement CD for an application which depends on several other applications?
Implementing Continuous Deployment (CD) for an application that depends on several other applications requires careful planning and coordination to ensure smooth and efficient deployment of changes across the entire ecosystem. Here are some general steps to structure/implement CD for an application with dependencies:

1. Define the deployment pipeline
2. Automate the deployment process
3. Version control and dependency management
4. Continuous integration and testing
5. Rolling deployments
6. Monitor and manage dependencies
7. Testing across the ecosystem
8. Rollback and recovery strategies
9. Security and compliance
10. Documentation and communication

By following best practices such as automation, version control, testing, monitoring, rollback strategies, and effective communication, you can ensure a smooth and successful CD process for your application ecosystem.
How do you measure your CI/CD quality? Are there any metrics or KPIs you are using for measuring the quality?
Measuring the quality of CI/CD processes is crucial to identify areas for improvement, ensure efficient and reliable software delivery, and achieve continuous improvement. Here are some commonly used metrics and KPIs (Key Performance Indicators) to measure CI/CD quality:

1. Build Success Rate: the percentage of successful builds compared to the total number of builds. A high build success rate indicates that the majority of builds succeed and the CI/CD pipeline is stable.
2. Build and Deployment Time: the time it takes to build and deploy changes, from code commit to production. Faster build and deployment times indicate shorter feedback loops and faster time to market.
3. Deployment Frequency: the frequency of deployments to production within a given time period. Higher deployment frequency indicates faster release cycles and more frequent updates to production.
4. Mean Time to Detect (MTTD): the average time it takes to detect issues or defects in the CI/CD pipeline or production environment. Lower MTTD indicates faster detection and resolution of issues, leading to higher quality and more reliable deployments.
5. Mean Time to Recover (MTTR): the average time it takes to recover from issues or incidents in the CI/CD pipeline or production environment. Lower MTTR indicates faster recovery and reduced downtime, leading to higher availability and reliability.
6. Feedback Loop Time: the time it takes to receive feedback on code changes, including code reviews, test results, and other feedback mechanisms. Faster feedback loop times enable quicker iterations and faster improvements in the CI/CD process.
7. Customer Satisfaction: the satisfaction of end-users or customers with the quality and reliability of the deployed software.
Higher customer satisfaction indicates that the CI/CD process is delivering high-quality software that meets customer expectations.

These are just some examples of metrics and KPIs that can be used to measure the quality of CI/CD processes. It's important to choose metrics that align with the goals and objectives of your organization, and to regularly track and analyze them to continuously improve the CI/CD process and ensure high-quality software delivery.
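Several of these metrics are straightforward to compute once you collect build results and incident durations; a small sketch (the sample data below is made up for illustration):

```python
from datetime import timedelta

def build_success_rate(results):
    """Fraction of successful builds (results: list of booleans)."""
    return sum(results) / len(results)

def mean_minutes(durations):
    """Mean duration in minutes; usable for both MTTD and MTTR."""
    total = sum(durations, timedelta())
    return total.total_seconds() / 60 / len(durations)

# Made-up sample data: the last 10 builds and 3 production incidents
results = [True, True, False, True, True, True, True, False, True, True]
incidents = [timedelta(minutes=12), timedelta(minutes=45), timedelta(minutes=33)]

print(f"Build success rate: {build_success_rate(results):.0%}")  # → 80%
print(f"MTTR: {mean_minutes(incidents):.0f} minutes")            # → 30 minutes
```

In practice the raw data would come from the CI server's API or an incident tracker rather than hard-coded lists.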
#### CI/CD - Jenkins
What is Jenkins? What have you used it for?
Jenkins is an open source automation tool written in Java, with plugins built for Continuous Integration purposes. Jenkins is used to build and test software projects continuously, making it easier for developers to integrate changes into the project and for users to obtain a fresh build. It also allows you to continuously deliver your software by integrating with a large number of testing and deployment technologies. Jenkins integrates development life-cycle processes of all kinds, including build, document, test, package, stage, deploy, static analysis and much more.
What are the advantages of Jenkins over its competitors? Can you compare it to one of the following systems?

* Travis
* Bamboo
* Teamcity
* CircleCI
Jenkins has several advantages over its competitors, including Travis, Bamboo, TeamCity, and CircleCI. Here are some of the key advantages:

1. Open source and free
2. Customizable and flexible
3. Wide range of integrations and plugins
4. Active and supportive community

When comparing Jenkins to its competitors, there are some key differences in terms of features and capabilities:

* Travis: a cloud-based CI/CD platform known for its ease of use and fast setup. However, it has fewer customization options and integrations compared to Jenkins.
* Bamboo: a CI/CD tool from Atlassian, the makers of Jira and Confluence. It provides a range of features for building, testing, and deploying software, but it can be more expensive and complex to set up compared to Jenkins.
* TeamCity: a CI/CD tool from JetBrains, the makers of IntelliJ IDEA. It provides a range of features for building, testing, and deploying software, but it can be more complex and resource-intensive compared to Jenkins.
* CircleCI: a cloud-based CI/CD platform known for its fast build times and easy integration with GitHub. However, it can be more expensive compared to Jenkins, especially for larger projects.
What are the limitations or disadvantages of Jenkins?
This might be considered an opinionated answer:

* Old-fashioned dashboards with few options to customize them
* Containers readiness (this has improved with Jenkins X)
* By itself, it doesn't have many features. On the other hand, there are many plugins created by the community to expand its abilities
* Managing Jenkins and its pipelines as code can be one hell of a nightmare
Explain the following:

- Job
- Build
- Plugin
- Node or Worker
- Executor
- Job: an automation definition = what and where to execute once the user clicks on "build" (or the job is triggered in some other way)
- Build: a running instance of a job. You can have one or more builds at any given point of time (unless limited by configuration)
- Plugin: an extension to Jenkins that adds functionality not included in the core, such as SCM integrations, notifications, or UI changes
- Node or Worker: the machine/instance on which the build is running. When a build starts, it "acquires" a worker out of a pool to run on it
- Executor: a property of the worker defining how many builds can run on that worker in parallel. An executor value of 3 means that up to 3 builds can run on that worker at any point in time (not necessarily of the same job - any builds)
What plugins have you used in Jenkins?
Jenkins has a vast library of plugins, and the most commonly used plugins depend on the specific needs and requirements of each organization. However, here are some of the most popular and widely used plugins in Jenkins:

* Pipeline: allows users to create and manage complex, multi-stage pipelines using a simple and easy-to-use scripting language. It provides a powerful and flexible way to automate the entire software delivery process, from code commit to deployment.
* Git: provides integration with Git, one of the most popular version control systems used today. It allows users to pull code from Git repositories, trigger builds based on code changes, and push code changes back to Git.
* Docker: provides integration with Docker, a popular platform for building, shipping, and running distributed applications. It allows users to build and run Docker containers as part of their build process, enabling easy and repeatable deployment of applications.
* JUnit: provides integration with JUnit, a popular unit testing framework for Java applications. It allows users to run JUnit tests as part of their build process and generates reports and statistics on test results.
* Cobertura: provides code coverage reporting for Java applications. It allows users to measure the code coverage of their tests and generate reports on which parts of the code are covered by tests.
* Email Extension: provides advanced email notification capabilities for Jenkins. It allows users to customize the content and format of email notifications, including attachments, and send notifications to specific users or groups based on build results.
* Artifactory: provides integration with Artifactory, a popular artifact repository for storing and managing binaries and dependencies. It allows users to publish and retrieve artifacts from Artifactory as part of their build process.
* SonarQube: provides integration with SonarQube, a popular code quality analysis tool. It allows users to run code quality checks and generate reports on code quality metrics such as code complexity, code duplication, and code coverage.
Have you used Jenkins for CI or CD processes? Can you describe them?
Let's assume we have a web application built using Node.js, and we want to automate its build and deployment process using Jenkins. Here is how we can set up a simple CI/CD pipeline using Jenkins:

1. Install Jenkins: we can install Jenkins on a dedicated server or on a cloud platform such as AWS or Google Cloud.
2. Install necessary plugins: depending on the specific requirements of the project, we may need to install plugins such as NodeJS, Git, Docker, and any other plugins required by the project.
3. Create a new job: in Jenkins, a job is a defined set of instructions for automating a particular task. We can create a new job and configure it to build our Node.js application.
4. Configure the job: we can configure the job to pull the latest code from the Git repository, install any necessary dependencies using Node.js, run unit tests, and build the application using a build script.
5. Set up a deployment environment: we can set up a separate environment for deploying the application, such as a staging or production environment, and use Docker to create a container image of the application and deploy it there.
6. Set up continuous deployment: we can configure the job to automatically deploy the application to the deployment environment if the build and tests pass.
7. Monitor and troubleshoot: we can monitor the pipeline for errors or failures and troubleshoot any issues that arise.

This is just a simple example of a CI/CD pipeline using Jenkins, and the specific implementation details may vary depending on the requirements of the project.
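A sketch of a Jenkinsfile for the Node.js example above might look like the following (the repository URL, image name, and ports are placeholders, not a real project):

```groovy
pipeline {
    agent any
    stages {
        stage('Checkout') {
            steps {
                // placeholder repository
                git url: 'https://github.com/example/myapp.git', branch: 'main'
            }
        }
        stage('Install & Test') {
            steps {
                sh 'npm ci'      // install dependencies reproducibly
                sh 'npm test'    // run unit tests
            }
        }
        stage('Build image') {
            steps {
                sh "docker build -t example/myapp:${env.BUILD_ID} ."
            }
        }
        stage('Deploy') {
            when { branch 'main' }   // deploy only from the main branch
            steps {
                sh "docker run -d -p 8080:3000 example/myapp:${env.BUILD_ID}"
            }
        }
    }
}
```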
What type of jobs are there? Which types have you used?
In Jenkins, there are various types of jobs, including:

1. Freestyle job: the most common type of job in Jenkins, which allows users to define custom build steps and configure various options, including build triggers, SCM polling, and post-build actions.
2. Pipeline job: a newer feature in Jenkins that allows users to define a pipeline of stages that are executed in a specific order. The pipeline can be defined using a Jenkinsfile, which provides a script-like syntax for defining the pipeline stages, steps, and conditions.
3. Multi-configuration job: allows users to execute the same job with multiple configurations, such as different operating systems, browsers, or devices. Jenkins will execute the job for each configuration specified, providing a matrix of results.
4. Maven job: specifically designed for building Java applications using the Maven build tool. Jenkins will execute the Maven build process, including compiling, testing, and packaging the application.
5. Parameterized job: allows users to define parameters that can be passed into the build process at runtime. Parameters can be used to customize the build process, such as specifying the version number or target environment.
How did you report build results to users? What ways are there to report the results?
You can report via: * Emails * Messaging apps * Dashboards Each has its own disadvantages and advantages. Emails for example, if sent too often, can be eventually disregarded or ignored.
You need to run unit tests every time a change is submitted to a given project. Describe in detail what your pipeline would look like and what would be executed in each stage
The pipeline will have multiple stages:

* Clone the project
* Install test dependencies (for example, if I need the tox package to run the tests, I will install it in this stage)
* Run the unit tests
* (Optional) report results (for example, an email to the users)
* Archive the relevant logs/files
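These stages map naturally to a declarative Jenkinsfile. A minimal sketch (the repository URL and the use of tox are assumptions taken from the example above):

```groovy
pipeline {
    agent any
    stages {
        stage('Clone') {
            steps {
                git url: 'https://github.com/example/project.git', branch: 'main'  // placeholder repo
            }
        }
        stage('Install test dependencies') {
            steps {
                sh 'pip install tox'
            }
        }
        stage('Unit tests') {
            steps {
                sh 'tox'
            }
        }
    }
    post {
        always {
            // archive relevant logs/files even when the tests fail
            archiveArtifacts artifacts: '**/*.log', allowEmptyArchive: true
        }
    }
}
```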
How to secure Jenkins?
[Jenkins documentation](https://www.jenkins.io/doc/book/security/securing-jenkins/) provides some basic intro for securing your Jenkins server.
Describe how you add new nodes (agents) to Jenkins
You can describe the UI way to add new nodes, but it's better to explain how to do it in a way that scales: for example with a script, or by using a dynamic source of nodes such as one of the existing cloud plugins.
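As an example of the scripted approach, a static agent can be added from the Jenkins script console (the node name, remote directory, and labels below are illustrative):

```groovy
import hudson.slaves.DumbSlave
import hudson.slaves.JNLPLauncher
import jenkins.model.Jenkins

// Illustrative values: adjust name, remote root and labels to your environment
def agent = new DumbSlave(
    'worker-01',              // node name
    '/home/jenkins/agent',    // remote root directory on the agent
    new JNLPLauncher(true)    // inbound (JNLP) agent launcher
)
agent.numExecutors = 2
agent.labelString = 'linux docker'
Jenkins.instance.addNode(agent)
```

For real scale, cloud plugins (EC2, Kubernetes, and similar) that provision agents dynamically on demand are usually a better fit than static agents.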
How to acquire multiple nodes for one specific build?
To acquire multiple nodes for a specific build in Jenkins, you can use the "parallel" feature in the pipeline script. It allows you to run multiple stages in parallel, and each stage can run on a different node. Here is an example pipeline script that demonstrates how to acquire multiple nodes for a specific build:

```groovy
pipeline {
    agent any
    stages {
        stage('Build') {
            parallel {
                stage('Node 1') {
                    agent { label 'node1' }
                    steps {
                        // Run build commands on Node 1
                    }
                }
                stage('Node 2') {
                    agent { label 'node2' }
                    steps {
                        // Run build commands on Node 2
                    }
                }
                stage('Node 3') {
                    agent { label 'node3' }
                    steps {
                        // Run build commands on Node 3
                    }
                }
            }
        }
        stage('Deploy') {
            agent any
            steps {
                // Deploy the built artifacts
            }
        }
    }
}
```

In this example, the "Build" stage has three parallel stages, each running on a different node labeled "node1", "node2", and "node3". The "Deploy" stage runs after the build is complete and runs on any available node. To use this pipeline script, you will need to have the three nodes (node1, node2, and node3) configured in Jenkins, and to ensure that the necessary build commands and dependencies are available on each node.
Whenever a build fails, you would like to notify the team owning the job regarding the failure and provide failure reason. How would you do that?
In Jenkins, you can use the "Email Extension" plugin to notify a team when a build fails. Here are the steps to set up email notifications for failed builds:

1. Install the "Email Extension" plugin if it's not already installed in Jenkins.
2. Go to the Jenkins job configuration page and click on "Configure".
3. Scroll down to the "Post-build Actions" section and click on "Add post-build action".
4. Select "Editable Email Notification" from the list of options.
5. Fill out the required fields, such as the recipient email addresses, subject line, and email content. You can use tokens such as ${BUILD_URL} and ${BUILD_LOG} to include build-specific information in the email content.
6. In the "Advanced Settings" section, add a trigger so the email is sent only when the build fails (for example, the "Failure - Any" trigger).
7. Click "Save" to save the job configuration.

With this setup, Jenkins will send an email notification to the specified recipients whenever a build fails, providing them with the failure reason and any other relevant information.
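With the Email Extension plugin's pipeline step, the same behavior can be expressed directly in a Jenkinsfile's `post` section. A minimal sketch (the recipient address is a placeholder):

```groovy
post {
    failure {
        emailext(
            to: 'team-x@example.com',                           // placeholder recipient
            subject: "FAILED: ${env.JOB_NAME} #${env.BUILD_NUMBER}",
            body: "Build failed: ${env.BUILD_URL}",
            attachLog: true                                     // attach the build log with the failure reason
        )
    }
}
```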
There are four teams in your organization. How would you prioritize the builds of each team, so that, for example, the jobs of team X always run before the jobs of team Y?
In Jenkins, you can prioritize the builds of each team by using the "Priority Sorter" plugin. Here are the steps to set up build prioritization:

1. Install the "Priority Sorter" plugin if it's not already installed in Jenkins.
2. Go to "Manage Jenkins" -> "Job Priorities" and define how many priority levels you want.
3. Create a job group per team (job groups can match jobs based on a view, a folder, or a naming pattern) and assign each group a priority. Note that in this plugin a lower number means a higher priority.
4. Give the group of Team X a higher priority (lower number) than the group of Team Y.

With this setup, when builds are waiting in the queue, the queue items with the higher priority are started first, so the jobs of Team X will run before the jobs of Team Y whenever they compete for executors.
If you are managing a dozen jobs, you can probably use the Jenkins UI. But how do you manage the creation and deletion of hundreds of jobs every week/month?
Managing the creation and deletion of hundreds of jobs every week/month in Jenkins can be a daunting task if done manually through the UI. Here are some approaches to manage large numbers of jobs efficiently:

1. Use job templates
2. Use Job DSL
3. Use the Jenkins REST API
4. Use a configuration management tool
5. Use a Jenkins job management tool
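One common pattern is a "seed job" with the Job DSL plugin: a single script generates all the jobs, so adding or removing a job is just a code change. A small sketch (the repository names and URL pattern are made up for illustration):

```groovy
// Job DSL seed script: generates one test job per repository
['billing', 'inventory', 'frontend'].each { repo ->
    job("${repo}-unit-tests") {
        scm {
            git("https://github.com/example/${repo}.git", 'main')
        }
        triggers {
            scm('H/15 * * * *')   // poll SCM roughly every 15 minutes
        }
        steps {
            shell('make test')    // placeholder test command
        }
    }
}
```

With the seed job's "Action for removed jobs" option set to "Delete", jobs removed from the script are deleted automatically on the next seed run.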
What are some of Jenkins limitations?
* Testing cross-dependencies (changes from multiple projects together)
* Starting builds from any stage (although CloudBees implemented something called checkpoints)
What is the difference between a scripted pipeline and a declarative pipeline? Which type are you using?
Jenkins supports two types of pipelines: scripted and declarative. Both are defined in a `Jenkinsfile` and both are based on Groovy.

Scripted pipelines (`node { ... }`) are essentially free-form Groovy code and provide a high degree of flexibility and control over the build process. They allow developers to write custom logic for complex scenarios, but can become complex and hard to maintain.

Declarative pipelines (`pipeline { ... }`) are a newer feature and provide a simpler, more structured and opinionated way to define pipelines, making it easier to get started and reducing the risk of errors.

Some key differences between the two types of pipelines are:

1. Syntax: scripted pipelines are plain Groovy, while declarative pipelines use a restricted, structured Groovy-based DSL.
2. Structure: declarative pipelines have a fixed format with specific sections (agent, stages, steps, post), while scripted pipelines provide more flexibility in defining build stages and steps.
3. Error handling: declarative pipelines provide built-in conditions and actions (for example the `post` section with `success`/`failure`/`always`), while scripted pipelines require manual error handling with try/catch.
4. Ease of use: declarative pipelines are easier to use for beginners and provide a simpler syntax, while scripted pipelines require more expertise in Groovy and can be more complex.
5. Maintenance: declarative pipelines are easier to maintain and can be modified with less effort compared to scripted pipelines, which can be more difficult to modify and extend over time.

I am familiar with both types of pipelines, but generally prefer declarative pipelines for their ease of use and simplicity.
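The difference is easiest to see with two minimal definitions of the same build, shown side by side (each would normally live in its own Jenkinsfile):

```groovy
// Declarative: fixed structure with predefined sections
pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                sh 'make'
            }
        }
    }
    post {
        failure {
            echo 'Build failed'   // built-in error handling hook
        }
    }
}
```

```groovy
// Scripted: plain Groovy, flow control is up to you
node {
    stage('Build') {
        try {
            sh 'make'
        } catch (err) {
            echo "Build failed: ${err}"
            throw err
        }
    }
}
```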
How would you implement an option to start a build from a certain stage instead of from the beginning?
To implement an option to start a build from a certain stage in a Jenkins declarative pipeline, we can combine a build parameter with `when` directives that skip the earlier stages. Here are the steps:

1. Add a parameter to the pipeline that selects the starting stage:

```groovy
parameters {
    choice(name: 'START_STAGE', choices: ['Build', 'Test', 'Deploy'], description: 'The name of the stage to start the build from')
}
```

2. Use the `when` directive so each stage runs only if the selected starting stage is at or before it:

```groovy
pipeline {
    agent any
    parameters {
        choice(name: 'START_STAGE', choices: ['Build', 'Test', 'Deploy'], description: 'The name of the stage to start the build from')
    }
    stages {
        stage('Build') {
            when { expression { params.START_STAGE == 'Build' } }
            steps {
                // Build steps go here
            }
        }
        stage('Test') {
            when { expression { params.START_STAGE in ['Build', 'Test'] } }
            steps {
                // Test steps go here
            }
        }
        stage('Deploy') {
            steps {
                // Deploy steps go here
            }
        }
    }
}
```

3. Trigger the pipeline with "Build with Parameters" and set `START_STAGE` as needed. For example, setting it to `Test` will skip the `Build` stage and start directly from the `Test` stage.

Note that declarative pipelines also offer the built-in "Restart from Stage" feature, which allows restarting a completed run from any top-level stage.
Do you have experience with developing a Jenkins plugin? Can you describe this experience?
Developing a Jenkins plugin requires knowledge of Java and familiarity with the Jenkins API. The process typically involves setting up a development environment, creating a new plugin project, defining the plugin's extension points, and implementing the desired functionality in Java. Once the plugin is developed, it can be packaged and deployed to Jenkins. The Jenkins plugin ecosystem is extensive, and there are many resources available to assist with plugin development, including documentation, forums, and online communities. Additionally, Jenkins provides tooling to help, such as Maven archetypes for generating a plugin skeleton and the maven-hpi-plugin for building, testing, and running plugins locally.
Have you written Jenkins scripts? If yes, what for and how they work?
#### CI/CD - GitHub Actions
What is a Workflow in GitHub Actions?
A YAML file that defines the automation actions and instructions to execute upon a specific event.
The file is placed in the repository itself. A Workflow can be anything - running tests, compiling code, building packages, ...
What is a Runner in GitHub Actions?
A workflow has to be executed somewhere. The environment where the workflow is executed is called Runner.
A Runner can be an on-premise host or GitHub-hosted.
What is a Job in GitHub Actions?
A job is a series of steps which are executed on the same runner/environment.
A workflow must include at least one job.
What is an Action in GitHub Actions?
An action is the smallest unit in a workflow. It includes the commands to execute as part of the job.
In a GitHub Actions workflow, what is the 'on' attribute/directive used for?
Specify upon which events the workflow will be triggered.
For example, you might configure the workflow to trigger every time a change is pushed to the repository.
True or False? In Github Actions, jobs are executed in parallel by default
True
How to create dependencies between jobs so one job runs after another?
Using the `needs` attribute/directive.

```
jobs:
  job1:
  job2:
    needs: job1
```

In the above example, `job1` must complete successfully before `job2` runs.
How to add a Workflow to a repository?
CLI:

1. Create the directory `.github/workflows` in the repository
2. Add a YAML file describing the workflow

UI:

1. In the repository page, click on "Actions"
2. Choose a workflow and click on "Set up this workflow"
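A minimal workflow file might look like this (the `make test` command is a placeholder for whatever the project actually uses to run its tests):

```yaml
# .github/workflows/ci.yml
name: CI
on: [push, pull_request]     # trigger on pushes and pull requests
jobs:
  test:
    runs-on: ubuntu-latest   # GitHub-hosted runner
    steps:
      - uses: actions/checkout@v4
      - run: make test       # placeholder test command
```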
#### Zuul
In Zuul, what are the check pipelines?
`check` pipelines are triggered when a patch is uploaded to a code review system (e.g. Gerrit).
In Zuul, what are the gate pipelines?
`gate` pipelines are triggered when a code reviewer approves the change in a code review system (e.g. Gerrit).
True or False? gate pipelines run after the check pipelines
True. `check` pipelines run when the change is uploaded, while `gate` pipelines run when the change is approved by a reviewer.
================================================
FILE: topics/cicd/ci_for_open_source_project.md
================================================

## CI for Open Source Project

1. Choose an open source project from GitHub and fork it
2. Create a CI pipeline/workflow for the project you forked
3. The CI pipeline/workflow will include anything that is relevant to the project you forked. For example:
   * If it's a Python project, you will run PEP8
   * If the project has a unit tests directory, you will run these unit tests as part of the CI
4. In a separate file, describe what is running as part of the CI and why you chose to include it. You can also describe any thoughts, dilemmas, or challenges you had

### Bonus

Containerize the app of the project you forked using any container engine you would like (e.g. Docker, Podman). Once you successfully ran the application in a container, submit the Dockerfile to the original project (but be prepared that the maintainer might not need/want it).

### Suggestions for Projects

The following is a list of projects without CI (at least at the moment):

Note: I wrote a script to find these (except the first project on the list, of course) based on some parameters, in case you wonder why these projects specifically are listed.

* [This one](https://github.com/bregman-arie/devops-exercises) - We don't have CI! help! :)
* [image retrieval platform](https://github.com/skx6/image_retrieval_platform)
* [FollowSpot](https://github.com/jenbrissman/FollowSpot)
* [Pyrin](https://github.com/mononobi/pyrin)
* [food-detection-yolov5](https://github.com/lannguyen0910/food-detection-yolov5)
* [Lifely](https://github.com/sagnik1511/Lifely)

================================================
FILE: topics/cicd/deploy_to_kubernetes.md
================================================

## Deploy to Kubernetes

* Write a pipeline that will deploy a "hello world" web app to Kubernetes
* The CI/CD system (where the pipeline resides) and the Kubernetes cluster should be on separate systems
* The web app should be accessible remotely and only with HTTPS

================================================
FILE: topics/cicd/remove_builds.md
================================================

### Jenkins - Remove Builds

#### Objective

Learn how to write a Jenkins script that interacts with builds by removing builds older than X days.

#### Instructions

1. Pick up (or create) a job which has builds older than X days
2. Write a script to remove only the builds that are older than X days

#### Hints

X can be anything. For example, remove builds that are older than 3 days. Just make sure that you don't simply remove all the builds (since that's different from the objective).
================================================
FILE: topics/cicd/remove_jobs.md
================================================

### Jenkins - Remove Jobs

#### Objective

Learn how to write a Jenkins script to remove Jenkins jobs

#### Instructions

1. Create three jobs called: test-job, test2-job and prod-job
2. Write a script to remove all the jobs that include the string "test"

================================================
FILE: topics/cicd/solutions/deploy_to_kubernetes/Jenkinsfile
================================================

pipeline {
    agent any
    stages {
        stage('Checkout Source') {
            steps {
                git url: 'https://github.com//.git',
                    // credentialsId: 'creds_github',
                    branch: 'master'
            }
        }
        stage('Build image') {
            steps {
                script {
                    myapp = docker.build("/helloworld:${env.BUILD_ID}")
                }
            }
        }
        stage('Push image') {
            steps {
                script {
                    docker.withRegistry('https://registry.hub.docker.com', 'dockerhub') {
                        myapp.push("latest")
                        myapp.push("${env.BUILD_ID}")
                    }
                }
            }
        }
        stage('Deploy App') {
            steps {
                script {
                    sh 'ansible-playbook deploy.yml'
                }
            }
        }
    }
}

================================================
FILE: topics/cicd/solutions/deploy_to_kubernetes/README.md
================================================

## Deploy to Kubernetes

Note: this exercise can be solved in various ways. The solution described here is just one possible way.

1. Install Jenkins on one system (follow the standard Jenkins installation procedure)
2. Deploy Kubernetes on a remote host (minikube can be an easy way to achieve it)
3. Create a simple web app or [page](html)
4. Create Kubernetes [resources](helloworld.yml) - Deployment, Service and Ingress (for HTTPS access)
5. Create an [Ansible inventory](inventory) and insert the address of the Kubernetes cluster
6. Write an [Ansible playbook](deploy.yml) to deploy the Kubernetes resources and also generate a self-signed SSL certificate
7. Create a [pipeline](Jenkinsfile)
8. Run the pipeline :)
9.
Try to access the web app remotely ================================================ FILE: topics/cicd/solutions/deploy_to_kubernetes/deploy.yml ================================================ - name: Apply Kubernetes YAMLs hosts: kubernetes tasks: - name: Ensure SSL related directories exist file: path: "{{ item }}" state: directory loop: - "/etc/ssl/crt" - "/etc/ssl/csr" - "/etc/ssl/private" - name: Generate an OpenSSL private key. openssl_privatekey: path: /etc/ssl/private/privkey.pem - name: generate openssl certificate signing requests openssl_csr: path: /etc/ssl/csr/hello-world.app.csr privatekey_path: /etc/ssl/private/privkey.pem common_name: hello-world.app - name: Generate a Self Signed OpenSSL certificate openssl_certificate: path: /etc/ssl/crt/hello-world.app.crt privatekey_path: /etc/ssl/private/privkey.pem csr_path: /etc/ssl/csr/hello-world.app.csr provider: selfsigned - name: Create k8s secret command: "kubectl create secret tls tls-secret --cert=/etc/ssl/crt/hello-world.app.crt --key=/etc/ssl/private/privkey.pem" register: result failed_when: - result.rc == 2 - name: Deploy web app k8s: state: present definition: "{{ lookup('file', './helloworld.yml') }}" kubeconfig: '/home/abregman/.kube/config' namespace: 'default' wait: true ================================================ FILE: topics/cicd/solutions/deploy_to_kubernetes/helloworld.yml ================================================ --- apiVersion: apps/v1 kind: Deployment metadata: name: hello-blue-whale spec: replicas: 3 selector: matchLabels: app: hello-world-app version: blue template: metadata: name: hello-blue-whale-pod labels: app: hello-world-app version: blue spec: containers: - name: hello-whale-container image: abregman2/helloworld:latest imagePullPolicy: Always ports: - containerPort: 80 - containerPort: 443 --- apiVersion: v1 kind: Service metadata: name: hello-world labels: app: hello-world-app spec: ports: - port: 80 targetPort: 80 protocol: TCP name: http selector: app: 
hello-world-app --- apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: example-ingress annotations: cert-manager.io/cluster-issuer: selfsigned-issuer nginx.ingress.kubernetes.io/rewrite-target: / kubernetes.io/ingress.class: nginx spec: tls: - hosts: - hello-world.app secretName: shhh rules: - host: hello-world.app http: paths: - path: / pathType: Prefix backend: service: name: hello-world port: number: 80 ================================================ FILE: topics/cicd/solutions/deploy_to_kubernetes/html/css/normalize.css ================================================ /*! normalize.css v3.0.2 | MIT License | git.io/normalize */ /** * 1. Set default font family to sans-serif. * 2. Prevent iOS text size adjust after orientation change, without disabling * user zoom. */ html { font-family: sans-serif; /* 1 */ -ms-text-size-adjust: 100%; /* 2 */ -webkit-text-size-adjust: 100%; /* 2 */ } /** * Remove default margin. */ body { margin: 0; } /* HTML5 display definitions ========================================================================== */ /** * Correct `block` display not defined for any HTML5 element in IE 8/9. * Correct `block` display not defined for `details` or `summary` in IE 10/11 * and Firefox. * Correct `block` display not defined for `main` in IE 11. */ article, aside, details, figcaption, figure, footer, header, hgroup, main, menu, nav, section, summary { display: block; } /** * 1. Correct `inline-block` display not defined in IE 8/9. * 2. Normalize vertical alignment of `progress` in Chrome, Firefox, and Opera. */ audio, canvas, progress, video { display: inline-block; /* 1 */ vertical-align: baseline; /* 2 */ } /** * Prevent modern browsers from displaying `audio` without controls. * Remove excess height in iOS 5 devices. */ audio:not([controls]) { display: none; height: 0; } /** * Address `[hidden]` styling not present in IE 8/9/10. * Hide the `template` element in IE 8/9/11, Safari, and Firefox < 22. 
*/ [hidden], template { display: none; } /* Links ========================================================================== */ /** * Remove the gray background color from active links in IE 10. */ a { background-color: transparent; } /** * Improve readability when focused and also mouse hovered in all browsers. */ a:active, a:hover { outline: 0; } /* Text-level semantics ========================================================================== */ /** * Address styling not present in IE 8/9/10/11, Safari, and Chrome. */ abbr[title] { border-bottom: 1px dotted; } /** * Address style set to `bolder` in Firefox 4+, Safari, and Chrome. */ b, strong { font-weight: bold; } /** * Address styling not present in Safari and Chrome. */ dfn { font-style: italic; } /** * Address variable `h1` font-size and margin within `section` and `article` * contexts in Firefox 4+, Safari, and Chrome. */ h1 { font-size: 2em; margin: 0.67em 0; } /** * Address styling not present in IE 8/9. */ mark { background: #ff0; color: #000; } /** * Address inconsistent and variable font size in all browsers. */ small { font-size: 80%; } /** * Prevent `sub` and `sup` affecting `line-height` in all browsers. */ sub, sup { font-size: 75%; line-height: 0; position: relative; vertical-align: baseline; } sup { top: -0.5em; } sub { bottom: -0.25em; } /* Embedded content ========================================================================== */ /** * Remove border when inside `a` element in IE 8/9/10. */ img { border: 0; } /** * Correct overflow not hidden in IE 9/10/11. */ svg:not(:root) { overflow: hidden; } /* Grouping content ========================================================================== */ /** * Address margin not present in IE 8/9 and Safari. */ figure { margin: 1em 40px; } /** * Address differences between Firefox and other browsers. */ hr { -moz-box-sizing: content-box; box-sizing: content-box; height: 0; } /** * Contain overflow in all browsers. 
*/ pre { overflow: auto; } /** * Address odd `em`-unit font size rendering in all browsers. */ code, kbd, pre, samp { font-family: monospace, monospace; font-size: 1em; } /* Forms ========================================================================== */ /** * Known limitation: by default, Chrome and Safari on OS X allow very limited * styling of `select`, unless a `border` property is set. */ /** * 1. Correct color not being inherited. * Known issue: affects color of disabled elements. * 2. Correct font properties not being inherited. * 3. Address margins set differently in Firefox 4+, Safari, and Chrome. */ button, input, optgroup, select, textarea { color: inherit; /* 1 */ font: inherit; /* 2 */ margin: 0; /* 3 */ } /** * Address `overflow` set to `hidden` in IE 8/9/10/11. */ button { overflow: visible; } /** * Address inconsistent `text-transform` inheritance for `button` and `select`. * All other form control elements do not inherit `text-transform` values. * Correct `button` style inheritance in Firefox, IE 8/9/10/11, and Opera. * Correct `select` style inheritance in Firefox. */ button, select { text-transform: none; } /** * 1. Avoid the WebKit bug in Android 4.0.* where (2) destroys native `audio` * and `video` controls. * 2. Correct inability to style clickable `input` types in iOS. * 3. Improve usability and consistency of cursor style between image-type * `input` and others. */ button, html input[type="button"], /* 1 */ input[type="reset"], input[type="submit"] { -webkit-appearance: button; /* 2 */ cursor: pointer; /* 3 */ } /** * Re-set default cursor for disabled elements. */ button[disabled], html input[disabled] { cursor: default; } /** * Remove inner padding and border in Firefox 4+. */ button::-moz-focus-inner, input::-moz-focus-inner { border: 0; padding: 0; } /** * Address Firefox 4+ setting `line-height` on `input` using `!important` in * the UA stylesheet. 
*/ input { line-height: normal; } /** * It's recommended that you don't attempt to style these elements. * Firefox's implementation doesn't respect box-sizing, padding, or width. * * 1. Address box sizing set to `content-box` in IE 8/9/10. * 2. Remove excess padding in IE 8/9/10. */ input[type="checkbox"], input[type="radio"] { box-sizing: border-box; /* 1 */ padding: 0; /* 2 */ } /** * Fix the cursor style for Chrome's increment/decrement buttons. For certain * `font-size` values of the `input`, it causes the cursor style of the * decrement button to change from `default` to `text`. */ input[type="number"]::-webkit-inner-spin-button, input[type="number"]::-webkit-outer-spin-button { height: auto; } /** * 1. Address `appearance` set to `searchfield` in Safari and Chrome. * 2. Address `box-sizing` set to `border-box` in Safari and Chrome * (include `-moz` to future-proof). */ input[type="search"] { -webkit-appearance: textfield; /* 1 */ -moz-box-sizing: content-box; -webkit-box-sizing: content-box; /* 2 */ box-sizing: content-box; } /** * Remove inner padding and search cancel button in Safari and Chrome on OS X. * Safari (but not Chrome) clips the cancel button when the search input has * padding (and `textfield` appearance). */ input[type="search"]::-webkit-search-cancel-button, input[type="search"]::-webkit-search-decoration { -webkit-appearance: none; } /** * Define consistent border, margin, and padding. */ fieldset { border: 1px solid #c0c0c0; margin: 0 2px; padding: 0.35em 0.625em 0.75em; } /** * 1. Correct `color` not being inherited in IE 8/9/10/11. * 2. Remove padding so people aren't caught out if they zero out fieldsets. */ legend { border: 0; /* 1 */ padding: 0; /* 2 */ } /** * Remove default vertical scrollbar in IE 8/9/10/11. */ textarea { overflow: auto; } /** * Don't inherit the `font-weight` (applied by a rule above). * NOTE: the default cannot safely be changed in Chrome and Safari on OS X. 
*/ optgroup { font-weight: bold; } /* Tables ========================================================================== */ /** * Remove most spacing between table cells. */ table { border-collapse: collapse; border-spacing: 0; } td, th { padding: 0; } ================================================ FILE: topics/cicd/solutions/deploy_to_kubernetes/html/css/skeleton.css ================================================ /* * Skeleton V2.0.4 * Copyright 2014, Dave Gamache * www.getskeleton.com * Free to use under the MIT license. * http://www.opensource.org/licenses/mit-license.php * 12/29/2014 */ /* Table of contents –––––––––––––––––––––––––––––––––––––––––––––––––– - Grid - Base Styles - Typography - Links - Buttons - Forms - Lists - Code - Tables - Spacing - Utilities - Clearing - Media Queries */ /* Grid –––––––––––––––––––––––––––––––––––––––––––––––––– */ .container { position: relative; width: 100%; max-width: 960px; margin: 0 auto; padding: 0 20px; box-sizing: border-box; } .column, .columns { width: 100%; float: left; box-sizing: border-box; } /* For devices larger than 400px */ @media (min-width: 400px) { .container { width: 85%; padding: 0; } } /* For devices larger than 550px */ @media (min-width: 550px) { .container { width: 80%; } .column, .columns { margin-left: 4%; } .column:first-child, .columns:first-child { margin-left: 0; } .one.column, .one.columns { width: 4.66666666667%; } .two.columns { width: 13.3333333333%; } .three.columns { width: 22%; } .four.columns { width: 30.6666666667%; } .five.columns { width: 39.3333333333%; } .six.columns { width: 48%; } .seven.columns { width: 56.6666666667%; } .eight.columns { width: 65.3333333333%; } .nine.columns { width: 74.0%; } .ten.columns { width: 82.6666666667%; } .eleven.columns { width: 91.3333333333%; } .twelve.columns { width: 100%; margin-left: 0; } .one-third.column { width: 30.6666666667%; } .two-thirds.column { width: 65.3333333333%; } .one-half.column { width: 48%; } /* Offsets */ 
.offset-by-one.column, .offset-by-one.columns { margin-left: 8.66666666667%; } .offset-by-two.column, .offset-by-two.columns { margin-left: 17.3333333333%; } .offset-by-three.column, .offset-by-three.columns { margin-left: 26%; } .offset-by-four.column, .offset-by-four.columns { margin-left: 34.6666666667%; } .offset-by-five.column, .offset-by-five.columns { margin-left: 43.3333333333%; } .offset-by-six.column, .offset-by-six.columns { margin-left: 52%; } .offset-by-seven.column, .offset-by-seven.columns { margin-left: 60.6666666667%; } .offset-by-eight.column, .offset-by-eight.columns { margin-left: 69.3333333333%; } .offset-by-nine.column, .offset-by-nine.columns { margin-left: 78.0%; } .offset-by-ten.column, .offset-by-ten.columns { margin-left: 86.6666666667%; } .offset-by-eleven.column, .offset-by-eleven.columns { margin-left: 95.3333333333%; } .offset-by-one-third.column, .offset-by-one-third.columns { margin-left: 34.6666666667%; } .offset-by-two-thirds.column, .offset-by-two-thirds.columns { margin-left: 69.3333333333%; } .offset-by-one-half.column, .offset-by-one-half.columns { margin-left: 52%; } } /* Base Styles –––––––––––––––––––––––––––––––––––––––––––––––––– */ /* NOTE html is set to 62.5% so that all the REM measurements throughout Skeleton are based on 10px sizing. 
So basically 1.5rem = 15px :) */ html { font-size: 62.5%; } body { font-size: 1.5em; /* currently ems cause chrome bug misinterpreting rems on body element */ line-height: 1.6; font-weight: 400; font-family: "Raleway", "HelveticaNeue", "Helvetica Neue", Helvetica, Arial, sans-serif; color: #222; } /* Typography –––––––––––––––––––––––––––––––––––––––––––––––––– */ h1, h2, h3, h4, h5, h6 { margin-top: 0; margin-bottom: 2rem; font-weight: 300; } h1 { font-size: 4.0rem; line-height: 1.2; letter-spacing: -.1rem;} h2 { font-size: 3.6rem; line-height: 1.25; letter-spacing: -.1rem; } h3 { font-size: 3.0rem; line-height: 1.3; letter-spacing: -.1rem; } h4 { font-size: 2.4rem; line-height: 1.35; letter-spacing: -.08rem; } h5 { font-size: 1.8rem; line-height: 1.5; letter-spacing: -.05rem; } h6 { font-size: 1.5rem; line-height: 1.6; letter-spacing: 0; } /* Larger than phablet */ @media (min-width: 550px) { h1 { font-size: 5.0rem; } h2 { font-size: 4.2rem; } h3 { font-size: 3.6rem; } h4 { font-size: 3.0rem; } h5 { font-size: 2.4rem; } h6 { font-size: 1.5rem; } } p { margin-top: 0; } /* Links –––––––––––––––––––––––––––––––––––––––––––––––––– */ a { color: #1EAEDB; } a:hover { color: #0FA0CE; } /* Buttons –––––––––––––––––––––––––––––––––––––––––––––––––– */ .button, button, input[type="submit"], input[type="reset"], input[type="button"] { display: inline-block; height: 38px; padding: 0 30px; color: #555; text-align: center; font-size: 11px; font-weight: 600; line-height: 38px; letter-spacing: .1rem; text-transform: uppercase; text-decoration: none; white-space: nowrap; background-color: transparent; border-radius: 4px; border: 1px solid #bbb; cursor: pointer; box-sizing: border-box; } .button:hover, button:hover, input[type="submit"]:hover, input[type="reset"]:hover, input[type="button"]:hover, .button:focus, button:focus, input[type="submit"]:focus, input[type="reset"]:focus, input[type="button"]:focus { color: #333; border-color: #888; outline: 0; } .button.button-primary, 
button.button-primary, input[type="submit"].button-primary, input[type="reset"].button-primary, input[type="button"].button-primary { color: #FFF; background-color: #33C3F0; border-color: #33C3F0; } .button.button-primary:hover, button.button-primary:hover, input[type="submit"].button-primary:hover, input[type="reset"].button-primary:hover, input[type="button"].button-primary:hover, .button.button-primary:focus, button.button-primary:focus, input[type="submit"].button-primary:focus, input[type="reset"].button-primary:focus, input[type="button"].button-primary:focus { color: #FFF; background-color: #1EAEDB; border-color: #1EAEDB; } /* Forms –––––––––––––––––––––––––––––––––––––––––––––––––– */ input[type="email"], input[type="number"], input[type="search"], input[type="text"], input[type="tel"], input[type="url"], input[type="password"], textarea, select { height: 38px; padding: 6px 10px; /* The 6px vertically centers text on FF, ignored by Webkit */ background-color: #fff; border: 1px solid #D1D1D1; border-radius: 4px; box-shadow: none; box-sizing: border-box; } /* Removes awkward default styles on some inputs for iOS */ input[type="email"], input[type="number"], input[type="search"], input[type="text"], input[type="tel"], input[type="url"], input[type="password"], textarea { -webkit-appearance: none; -moz-appearance: none; appearance: none; } textarea { min-height: 65px; padding-top: 6px; padding-bottom: 6px; } input[type="email"]:focus, input[type="number"]:focus, input[type="search"]:focus, input[type="text"]:focus, input[type="tel"]:focus, input[type="url"]:focus, input[type="password"]:focus, textarea:focus, select:focus { border: 1px solid #33C3F0; outline: 0; } label, legend { display: block; margin-bottom: .5rem; font-weight: 600; } fieldset { padding: 0; border-width: 0; } input[type="checkbox"], input[type="radio"] { display: inline; } label > .label-body { display: inline-block; margin-left: .5rem; font-weight: normal; } /* Lists 
–––––––––––––––––––––––––––––––––––––––––––––––––– */ ul { list-style: circle inside; } ol { list-style: decimal inside; } ol, ul { padding-left: 0; margin-top: 0; } ul ul, ul ol, ol ol, ol ul { margin: 1.5rem 0 1.5rem 3rem; font-size: 90%; } li { margin-bottom: 1rem; } /* Code –––––––––––––––––––––––––––––––––––––––––––––––––– */ code { padding: .2rem .5rem; margin: 0 .2rem; font-size: 90%; white-space: nowrap; background: #F1F1F1; border: 1px solid #E1E1E1; border-radius: 4px; } pre > code { display: block; padding: 1rem 1.5rem; white-space: pre; } /* Tables –––––––––––––––––––––––––––––––––––––––––––––––––– */ th, td { padding: 12px 15px; text-align: left; border-bottom: 1px solid #E1E1E1; } th:first-child, td:first-child { padding-left: 0; } th:last-child, td:last-child { padding-right: 0; } /* Spacing –––––––––––––––––––––––––––––––––––––––––––––––––– */ button, .button { margin-bottom: 1rem; } input, textarea, select, fieldset { margin-bottom: 1.5rem; } pre, blockquote, dl, figure, table, p, ul, ol, form { margin-bottom: 2.5rem; } /* Utilities –––––––––––––––––––––––––––––––––––––––––––––––––– */ .u-full-width { width: 100%; box-sizing: border-box; } .u-max-full-width { max-width: 100%; box-sizing: border-box; } .u-pull-right { float: right; } .u-pull-left { float: left; } /* Misc –––––––––––––––––––––––––––––––––––––––––––––––––– */ hr { margin-top: 3rem; margin-bottom: 3.5rem; border-width: 0; border-top: 1px solid #E1E1E1; } /* Clearing –––––––––––––––––––––––––––––––––––––––––––––––––– */ /* Self Clearing Goodness */ .container:after, .row:after, .u-cf { content: ""; display: table; clear: both; } /* Media Queries –––––––––––––––––––––––––––––––––––––––––––––––––– */ /* Note: The best way to structure the use of media queries is to create the queries near the relevant code. For example, if you wanted to change the styles for buttons on small devices, paste the mobile query code up in the buttons section and style it there. 
*/ /* Larger than mobile */ @media (min-width: 400px) {} /* Larger than phablet (also point when grid becomes active) */ @media (min-width: 550px) {} /* Larger than tablet */ @media (min-width: 750px) {} /* Larger than desktop */ @media (min-width: 1000px) {} /* Larger than Desktop HD */ @media (min-width: 1200px) {} ================================================ FILE: topics/cicd/solutions/deploy_to_kubernetes/html/index.html ================================================ Hello World :)

Hello World :)

================================================ FILE: topics/cicd/solutions/deploy_to_kubernetes/inventory ================================================

[kubernetes]
x.x.x.x

================================================ FILE: topics/cicd/solutions/remove_builds_solution.groovy ================================================

def removeOldBuilds(buildDirectory, days = 14) {
    def wp = new File(buildDirectory)
    def backTime = new Date() - days
    wp.list().each { fileName ->
        def folder = new File("${buildDirectory}/${fileName}")
        if (folder.isDirectory()) {
            def timeStamp = new Date(folder.lastModified())
            if (timeStamp.before(backTime)) {
                // deleteDir() removes the directory recursively;
                // plain File.delete() fails silently on non-empty directories
                folder.deleteDir()
            }
        }
    }
}

================================================ FILE: topics/cicd/solutions/remove_jobs_solution.groovy ================================================

def jobs = Jenkins.instance.items.findAll { job ->
    // match job names containing "test" (without literal quotes in the regex)
    job.name =~ /test/
}
jobs.each { job ->
    println job.name
    //job.delete()
}

================================================ FILE: topics/circleci/README.md ================================================ # Circle CI ## Circle CI Questions ### Circle CI 101
What is Circle CI?
[Circle CI](https://circleci.com): "CircleCI is a continuous integration and continuous delivery platform that can be used to implement DevOps practices."
What are some benefits of Circle CI?
[Circle CI Docs](https://circleci.com/docs/about-circleci): "SSH into any job to debug your build issues. Set up parallelism in your .circleci/config.yml file to run jobs faster. Configure caching with two simple keys to reuse data from previous jobs in your workflow. Configure self-hosted runners for unique platform support. Access Arm resources for the machine executor. Use orbs, reusable packages of configuration, to integrate with third parties. Use pre-built Docker images in a variety of languages. Use the API to retrieve information about jobs and workflows. Use the CLI to access advanced tools locally. Get flaky test detection with test insights."
Explain the following: * Pipeline * Workflow * Jobs * Steps
* Pipeline: the entire CI/CD configuration (`.circleci/config.yml`) * Workflow: orchestrates a set of jobs; primarily used when the configuration defines more than one job * Jobs: one or more steps to execute as part of the CI/CD process * Steps: the actual commands to execute
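For example, a workflow can orchestrate two jobs so that one runs only after the other succeeds (the job names and contents here are illustrative):

```
version: 2.1
jobs:
  build:
    docker:
      - image: cimg/base:stable
    steps:
      - checkout
      - run: echo "building"
  test:
    docker:
      - image: cimg/base:stable
    steps:
      - checkout
      - run: echo "testing"
workflows:
  build-and-test:
    jobs:
      - build
      - test:
          requires:
            - build
```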
What is an Orb?
[Circle CI Docs](https://circleci.com/developer/orbs): "Orbs are shareable packages of CircleCI configuration you can use to simplify your builds." They can either come from the public registry or be defined privately as part of an organization.
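A minimal sketch of consuming an orb from the public registry (the orb name and version here are illustrative):

```
version: 2.1
orbs:
  node: circleci/node@5.0.0
jobs:
  build:
    executor: node/default
    steps:
      - checkout
      - node/install-packages
workflows:
  build:
    jobs:
      - build
```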
### Circle CI Hands-On 101
Where (in what location in the project) are Circle CI pipelines defined?
`.circleci/config.yml`
Explain the following configuration file ``` version: 2.1 jobs: say-hello: docker: - image: cimg/base:stable steps: - checkout - run: name: "Say hello" command: "echo Hello, World!" workflows: say-hello-workflow: jobs: - say-hello ```
This configuration file sets up a single job that checks out the project's code and runs the command `echo Hello, World!`. The job runs in a container using the image `cimg/base:stable`.
================================================ FILE: topics/cloud/README.md ================================================ ## Cloud
What is Cloud Computing? What is a Cloud Provider?
Cloud computing refers to the delivery of on-demand computing services over the internet on a pay-as-you-go basis. In simple words, cloud computing is a service that lets you use computing resources such as servers, storage, networking, databases, and analytics right through your browser, without owning any of the hardware. You can do almost anything you can think of, as long as it doesn't require you to stay physically close to your hardware. Cloud service providers are companies that establish public clouds, manage private clouds, or offer on-demand cloud computing components (also known as cloud computing services) like Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS). Cloud services can reduce business process costs when compared to on-premise IT.
What are the advantages of cloud computing? Mention at least 3 advantages
* Pay as you go: you are paying only for what you are using. No upfront payments and payment stops when resources are no longer used. * Scalable: resources are scaled down or up based on demand * High availability: resources and applications provide seamless experience, even when some services are down * Disaster recovery
True or False? Cloud computing is a consumption-based model (users only pay for resources they use)
True
What types of Cloud Computing services are there?
IaaS - Infrastructure as a Service
PaaS - Platform as a Service
SaaS - Software as a Service
Explain each of the following and give an example: * IAAS * PAAS * SAAS
* IaaS - users have control over a complete operating system and don't need to worry about the physical resources, which are managed by the cloud service provider. * PaaS - the cloud service provider takes care of the operating system and middleware; users only need to focus on their data and application. * SaaS - a cloud-based method of delivering software to users; the software logic runs in the cloud and the service is managed by the cloud service provider.
What types of clouds (or cloud deployments) are there?
* Public - cloud services that share computing resources among multiple customers * Private - cloud services with computing resources limited to a specific customer or organization, managed either by a third party or by the organization itself * Hybrid - a combination of public and private clouds
What are the differences between Cloud Providers and On-Premise solution?
With cloud providers, someone else owns and manages the hardware, hires the relevant infrastructure teams, and pays for the real estate (for both hardware and people), so you can focus on your business. With an on-premise solution, it's quite the opposite: you need to take care of the hardware and the infrastructure teams, and pay for everything yourself, which can be quite expensive. On the other hand, it's tailored to your needs.
What is Serverless Computing?
The main idea behind serverless computing is that you don't need to manage the creation and configuration of servers. All you need to focus on is splitting your app into multiple functions which will be triggered by certain events or actions. It's important to note that: * Serverless computing still uses servers, so saying there are no servers in serverless computing is simply wrong * Serverless computing offers a different payment model: you pay only while your functions are running, rather than for as long as a VM or container is running as in other models
Can we replace any type of computing on servers with serverless?
Is there a difference between managed service to SaaS or is it the same thing?
What is auto scaling?
AWS definition: "AWS Auto Scaling monitors your applications and automatically adjusts capacity to maintain steady, predictable performance at the lowest possible cost" Read more about auto scaling [here](https://aws.amazon.com/autoscaling)
What is the difference between horizontal scaling and vertical scaling?
[AWS Docs](https://wa.aws.amazon.com/wellarchitected/2020-07-02T19-33-23/wat.concept.horizontal-scaling.en.html): A "horizontally scalable" system is one that can increase capacity by adding more computers to the system. This is in contrast to a "vertically scalable" system, which is constrained to running its processes on only one computer; in such systems the only way to increase performance is to add more resources into one computer in the form of faster (or more) CPUs, memory or storage. Horizontally scalable systems are oftentimes able to outperform vertically scalable systems by enabling parallel execution of workloads and distributing those across many different computers.
True or False? Auto Scaling is about adding resources (such as instances) and not about removing resources
False. Auto scaling adjusts capacity, which can also mean removing resources, based on usage and performance.
#### Cloud - Security
How to secure instances in the cloud?
* Instance should have minimal permissions needed. You don't want an instance-level incident to become an account-level incident * Instances should be accessed through load balancers or bastion hosts. In other words, they should be off the internet (in a private subnet behind a NAT). * Using latest OS images with your instances (or at least apply latest patches)
================================================ FILE: topics/cloud_slack_bot.md ================================================ ## Cloud Slack Bot Create a slack bot to manage cloud instances. You can choose whatever cloud provider you want (e.g. Openstack, AWS, GCP, Azure) You should provide: * Instructions on how to use it * Source code of the slack bot * A running slack bot account or a deployment script so we can test it The bot should be able to support: * Creating new instances * Removing existing instances * Starting an instance * Stopping an instance * Displaying the status of an instance * List all available instances The bot should also be able to show help message. ================================================ FILE: topics/containers/README.md ================================================ # Containers - [Containers](#containers) - [Exercises](#exercises) - [Running Containers](#running-containers) - [Images](#images) - [Misc](#misc) - [Questions](#questions) - [Containers 101](#containers-101) - [Commands Commands](#commands-commands) - [Images](#images-1) - [Registry](#registry) - [Tags](#tags) - [Containerfile](#containerfile) - [Storage](#storage) - [Architecture](#architecture) - [Docker Architecture](#docker-architecture) - [Docker Compose](#docker-compose) - [Networking](#networking) - [Docker Networking](#docker-networking) - [Security](#security) - [Docker in Production](#docker-in-production) - [OCI](#oci) - [Scenarios](#scenarios) ## Exercises ### Running Containers |Name|Topic|Objective & Instructions|Solution|Comments| |--------|--------|------|----|----| |Running Containers|Basics|[Exercise](running_containers.md)|[Solution](solutions/running_containers.md) |Containerized Web Server|Applications|[Exercise](containerized_web_server.md)|[Solution](solutions/containerized_web_server.md) |Containerized Database|Applications|[Exercise](containerized_db.md)|[Solution](solutions/containerized_db.md) |Containerized Database with Persistent 
Storage|Applications|[Exercise](containerized_db_persistent_storage.md)|[Solution](solutions/containerized_db_persistent_storage.md) ### Images |Name|Topic|Objective & Instructions|Solution|Comments| |--------|--------|------|----|----| |Working with Images|Image|[Exercise](working_with_images.md)|[Solution](solutions/working_with_images.md) |Sharing Images (without a registry)|Images|[Exercise](sharing_images.md)|[Solution](solutions/sharing_images.md) |Creating images on the fly|Images|[Exercise](commit_image.md)|[Solution](solutions/commit_image.md) |My First Containerfile|Containerfile|[Exercise](write_containerfile_run_container.md)| ### Misc |Name|Topic|Objective & Instructions|Solution|Comments| |--------|--------|------|----|----| |Run, Forest, Run!|Restart Policies|[Exercise](run_forest_run.md)|[Solution](solutions/run_forest_run.md) |Layer by Layer|Image Layers|[Exercise](image_layers.md)|[Solution](solutions/image_layers.md) |Containerize an application | Containerization |[Exercise](containerize_app.md)|[Solution](solutions/containerize_app.md) |Multi-Stage Builds|Multi-Stage Builds|[Exercise](multi_stage_builds.md)|[Solution](solutions/multi_stage_builds.md) ## Questions ### Containers 101
What is a Container?
This can be tricky to answer since there are many ways to create a container: - Docker - systemd-nspawn - LXC Focusing on OCI (Open Container Initiative) based containers, the spec offers the following [definition](https://github.com/opencontainers/runtime-spec/blob/master/glossary.md#container): "An environment for executing processes with configurable isolation and resource limitations. For example, namespaces, resource limits, and mounts are all part of the container environment."
Why are containers needed? What is their goal?
OCI provides a good [explanation](https://github.com/opencontainers/runtime-spec/blob/master/principles.md#the-5-principles-of-standard-containers): "Define a unit of software delivery called a Standard Container. The goal of a Standard Container is to encapsulate a software component and all its dependencies in a format that is self-describing and portable, so that any compliant runtime can run it without extra dependencies, regardless of the underlying machine and the contents of the container."
What is a container image?
* A container image contains the application, its dependencies and the user-space portion of the operating system in which the application is executed (the kernel itself comes from the host).
* It's a collection of read-only layers. These layers are loosely coupled * Each layer is assembled out of one or more files
How are containers different from virtual machines (VMs)?
The primary difference between containers and VMs is that containers let you virtualize multiple workloads on a single operating system, while with VMs the hardware is virtualized to run multiple machines, each with its own guest OS. You can also think of it as containers providing OS-level virtualization while VMs provide hardware virtualization. * Containers don't require an entire guest operating system as VMs do. Containers share the host system's kernel and isolate themselves using kernel features such as namespaces and cgroups * It usually takes a few seconds to start a container, as opposed to VMs which can take minutes (or at least more time than containers), since an entire OS has to boot and initialize, whereas containers share the already-running underlying OS * Virtual machines are considered more secure than containers * VM portability is considered limited when compared to containers
In which scenarios would you use containers and in which you would prefer to use VMs?
You should choose VMs when: * You need to run an application which requires all the resources and functionality of a full OS * You need full isolation and security You should choose containers when: * You need a lightweight solution * You are running multiple versions or instances of a single application
Describe the process of containerizing an application
1. Write a Containerfile/Dockerfile that includes your app (including the commands to run it) and its dependencies 2. Build an image from the Containerfile/Dockerfile you wrote 3. Optionally, push the image to a registry 4. Run a container using the image you've built
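A minimal sketch of these steps for a small Python app (the file names, base image and registry are illustrative):

```
# Containerfile
FROM python:3.11-slim
WORKDIR /app
COPY app.py .
CMD ["python", "app.py"]
```

```
podman build -t my-app .                          # step 2: build the image
podman push my-app registry.example.com/my-app    # step 3 (optional): push to a registry
podman run my-app                                 # step 4: run a container
```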
What are some of the advantages of using containers? You can compare to other options like VMs
* Reusable: a container image can be reused by multiple different users and for different purposes - production vs. staging, development, testing, etc. * Lightweight: containers are fairly lightweight, which means deployments can be done quickly since you don't need to install a full OS (as in VMs, for example) * Isolation: containers are isolated environments; usually changes made to the OS won't affect the containers and vice-versa
### Commands Commands Note: I've used `Podman` in the answers, but other container engines can be used as well (e.g. Docker)
How to run a container?
`podman run ubuntu`
Why is the output of podman container ls empty after running podman container run ubuntu?
Because the container exits immediately after running the ubuntu image. This is completely normal and expected, as containers are designed to run a service or an app and exit when they are done running it. To see the container you can run `podman ps -a` If you want the container to keep running, you can run a command like `sleep 100`, which will run for 100 seconds, or you can attach to the container's terminal with a command similar to: `podman container run -it ubuntu /bin/bash`
How to list all the containers on the local host?
`podman container ls`
How to attach your shell to a terminal of a running container?
`podman container exec -it [container id/name] bash` This can be done in advance while running the container: `podman container run -it [image:tag] /bin/bash`
True or False? You can remove a running container if it isn't running anything
False. You have to stop the container before removing it.
How to stop and remove a container?
`podman container stop CONTAINER_NAME && podman container rm CONTAINER_NAME`
What happens when you run docker container run ubuntu?
1. The Docker client posts the command to the API server running as part of the Docker daemon 2. The Docker daemon checks if a local image exists 1. If it exists, it will use it 2. If it doesn't exist, it will go to the remote registry (Docker Hub by default) and pull the image locally 3. containerd and runc are instructed (by the daemon) to create and start the container
How to run a container in the background?
With the `-d` flag. The container will run in the background and will not be attached to the terminal. `docker container run -d httpd` or `podman container run -d httpd`
If you'll run sleep 100 inside a container, will you see it when listing all the processes of the host on which the container runs? Why?
True or False? If image httpd-service has an entry point for running the httpd service, then the following will run the container and eventually the httpd service: podman run httpd-service ls
False. Running that command will override the entry point so the httpd service won't run and instead podman will run the `ls` command.
True or False? Running podman restart CONTAINER_NAME kills the main process inside the container and runs it again from scratch
False. `podman restart` stops and starts the container again, but it keeps the same container ID and reuses the filesystem and state of the original container, so nothing runs "from scratch".
You would like to run a web server inside a container but be able to access it from localhost. Demonstrate how to do that
``` podman run -d --name apache1 -p 8080:8080 registry.redhat.io/rhel8/httpd-24 curl 127.0.0.1:8080 ```
After running a container, it stopped. podman ps shows nothing. How can you show its details?
`podman ps -a` also shows the details of stopped containers.
How to list all the image tags for a given container image?
`podman search --list-tags IMAGE_NAME`
### Images
Why are container images relatively small?
* Most images don't contain a kernel. Containers share and access the kernel of the host on which they are running * Container images are intended, in most cases, to run a specific application. This means they hold only what the application needs in order to run
You are interested in running a container with a snake game application. How can you search for such an image and check if it exists?
`podman search snake-game`. Surprisingly, there are a couple of matches :) ``` INDEX NAME DESCRIPTION STARS docker.io docker.io/dyego/snake-game 0 docker.io docker.io/ainizetap/snake-game 0 docker.io docker.io/islamifauzi/snake-games 0 docker.io docker.io/harish1551/snake-game 0 docker.io docker.io/spkane/snake-game A console based snake game in a container 0 docker.io docker.io/rahulgadre/snake-game This repository contains all the files to ru... 0 ```
How to list the container images on a certain host?
``` CONTAINER_BINARY=podman $CONTAINER_BINARY images ``` Note: you can also use `$CONTAINER_BINARY image ls`
How to download/pull a container image without actually running a container?
``` CONTAINER_BINARY=podman $CONTAINER_BINARY pull rhel ```
True or False? It's not possible to remove an image if a certain container is using it
True. You should stop and remove the container before trying to remove the image it uses.
True or False? If a tag isn't specified when pulling an image, the 'latest' tag is being used
True
True or False? Using the 'latest' tag when pulling an image means, you are pulling the most recently published image
False. While this might be true in some cases, it's not guaranteed that you'll pull the latest published image when using the 'latest' tag.
For example, some images use an 'edge' tag for the most recently published version.
Where are pulled images stored?
Depends on the container technology being used. For example, in case of Docker, images are stored in `/var/lib/docker/`
Explain container image layers
- The layers of an image are where all the content is stored - code, files, etc.
- Each layer is independent
- Each layer has an ID that is a hash based on its content
- The layers (like the image itself) are immutable, which means a change to one of the layers can be easily identified
True or False? Changing the content of any of the image layers will cause the hash content of the image to change
True. These hashes are content based and since images (and their layers) are immutable, any change will cause the hashes to change.
How to list the layers of an image?
In the case of Docker, you can use `docker image inspect IMAGE_NAME`
True or False? In most cases, container images contain their own kernel
False. They share and access the one used by the host on which they are running.
True or False? A single container image can have multiple tags
True. When listing images, you might be able to see two images with the same ID but different tags.
What is a dangling image?
It's an image without tags attached to it. One way to reach this situation is by building an image with exact same name and tag as another already existing image. It can be still referenced by using its full SHA.
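Dangling images can be listed and cleaned up with the `dangling` filter; a small sketch using podman (docker supports the same flags):

```
# List images that have no tag attached
podman images --filter dangling=true

# Remove all dangling images in one go
podman image prune
```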
How to see changes done to a given image over time?
In the case of Docker, you could use `docker history IMAGE_NAME`
What does `podman commit` do? When would you use it?
Creates a new image from a running container. Users can apply extra changes to be saved in the new image version. Most of the time, the use case for `podman commit` is to capture changes that help debug the container, not to create a new image for regular use: commit adds the overhead of logs and processes that aren't required for running the application in the container, which eventually makes images created by `podman commit` bigger due to the additional data stored in them.
True or False? Multiple images can share layers
True.
One evidence for that can be found in pulling images. Sometimes when you pull an image, you'll see a line similar to the following:
`fa20momervif17: already exists` This is because it recognizes such layer already exists on the host, so there is no need to pull the same layer twice.
What is the digest of an image? What problem does it solve?
Tags are mutable. This means we can have two different images with the same name and the same tag. It can be very confusing to see two images with the same name and tag in your environment. How would you know if they are truly the same or actually different?
This is where digests come in handy. A digest is a content-addressable identifier. Unlike a tag, it isn't mutable. Its value is predictable, and this is how you can tell whether two images are the same content-wise, instead of merely looking at their name and tag.
True or False? A single image can support multiple architectures (Linux x64, Windows x64, ...)
True.
What is a distribution hash in regards to layers?
- Layers are compressed when pushed or pulled
- The distribution hash is the hash of the compressed layer
- The distribution hash is used when pulling or pushing images for verification (making sure no one tampered with the image or its layers)
- It's also used for avoiding ID collisions (a case where two images have exactly the same generated ID)
How do multi-architecture images work? Explain by describing what happens when an image is pulled
1. A client makes a call to the registry to use a specific image (using an image name and optionally a tag)
2. A manifest list is parsed (assuming it exists) to check if the architecture of the client is supported and available as a manifest
3. If it is supported (a manifest for the architecture is available), the relevant manifest is parsed to obtain the IDs of the layers
4. Each layer is then pulled using the IDs obtained in the previous step
How to check which architectures a certain container image supports?
`docker manifest inspect IMAGE_NAME`
How to check what a certain container image will execute once we'll run a container based on that image?
Look for the "Cmd" or "Entrypoint" fields in the output of `docker image inspect IMAGE_NAME`
How to view the instructions that were used to build image?
`docker image history IMAGE_NAME:TAG`
How does `docker image build` work?
1. Docker spins up a temporary container
2. Runs a single instruction in the temporary container
3. Stores the result as a new image layer
4. Removes the temporary container
5. Repeats for every instruction
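The build loop can be observed with a minimal Containerfile/Dockerfile sketch (file names here are illustrative) - each instruction below ends up as its own entry in the image history:

```
FROM alpine:3.19
RUN echo "layer one" > /one.txt
RUN echo "layer two" > /two.txt
```

Running `docker image history` on the resulting image should show one entry per instruction, newest first.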
What is the role of cache in image builds?
When you build an image for the first time, the different layers are cached. So while the first build of the image might take time, any subsequent build of the same image (given that the Containerfile/Dockerfile, and the content used by its instructions, didn't change) will be almost instant thanks to the caching mechanism.

In a bit more detail, it works this way:

1. The first instruction (FROM) will check if the base image already exists on the host before pulling it
2. For the next instruction, the build cache is checked for an existing layer that was built from the same base image with the same instruction
   1. If such a layer is found, the instruction is skipped, the existing layer is linked, and the cache keeps being used
   2. If no matching layer is found, the layer is built and the cache is invalidated

Note: in some cases (like the COPY and ADD instructions) the instruction might stay the same, but if the content of what is being copied changes, the cache is invalidated. This check is done by comparing the checksum of each file being copied.
How to remove an image from the host?
`podman rmi IMAGE`

It will fail if some containers are using the image. You can then use the `--force` flag, but generally it's better to inspect the containers using the image before doing so.

To delete all images: `podman rmi -a`
What ways are there to reduce container images size?
* Reduce the number of instructions - in some cases you may be able to join layers, for example by installing multiple packages with one instruction or by using `&&` to concatenate RUN instructions
* Use smaller images - in some cases you might be using images that contain more than what is needed for your application to run. It's good to review the images you usually use and see whether smaller ones would do
* Clean up after running commands - some commands, like package installation, create metadata or cache that you don't need for running the application. It's important to clean up after such commands to reduce the image size
* For Docker images, you can use multi-stage builds
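A sketch of the "join instructions and clean up" advice for a Debian-based image (the package installed here is just an example):

```
FROM debian:stable-slim
# Install and clean up in the same RUN instruction, so the apt cache
# never persists in an earlier layer of the final image
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*
```

Had the `rm -rf` been a separate RUN instruction, the cache would still be stored in the previous layer and the image would not shrink.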
What are the pros and cons of squashing images?
Pros:
* Smaller image
* Fewer layers (especially if the image has a lot of layers)

Cons:
* No sharing of the image layers
* Push and pull can take more time (because no matching layers are found on the target)
You would like to share an image with another developer, but without using a registry. How would you do it?
```
# On the local host
podman save -o some_image.tar IMAGE
rsync some_image.tar SOME_HOST

# On the remote host
podman load -i some_image.tar
```
True or False? Once a container is stopped and removed, its image removed as well from the host
False. The image will still be available for use by potential containers in the future.
To remove the image as well, run `podman rmi IMAGE`
How to find out which files were added to the container image filesystem?
`podman diff IMAGE_NAME`
True or False? podman diff works only on the container filesystem and not mounted files
True. For mounted files you can use `podman inspect CONTAINER_NAME/ID`
What is the centralized location where images are stored called?
Registry
#### Registry
What is a Registry?
- A registry is a service which stores container images and allows users to pull specified images to run containers.
- There are public registries (everyone can access them) and private ones (accessed only internally in the organization or a specific network)
A registry contains one or more ____ which in turn contain one or more ____
A registry contains one or more repositories which in turn contain one or more images.
How to find out which registry you use by default in your environment?
Depends on the container technology you are using. For example, in the case of Docker, it can be done with `docker info`:

```
> docker info
Registry: https://index.docker.io/v1
```
How to configure registries with the containers engine you are using?
For Podman, registries can be configured in `/etc/containers/registries.conf` this way:

```
[registries.search]
registries = ["quay.io"]
```
How to retrieve the latest ubuntu image?
`podman image pull ubuntu:latest`
How to push an image to a registry?
`podman push IMAGE`

You can also specify a registry: `podman push IMAGE REGISTRY_ADDRESS`
What are some best practices in regards to Container Images?
- Use tags. Using `latest` is quite common (which can mean latest build or latest release); a tag like `3.1` can be used to reference the latest release of that series, e.g. `3.1.6`
- Don't use `commit` for creating new official images, as they include the overhead of logs and processes and usually end up bigger
- For sharing the image, use a registry (either a public or a private one, depending on your needs)
What ways are there for creating new images?
1. Create a Containerfile/Dockerfile and build an image out of it
2. Use `podman commit` on a running container after making changes to it
#### Tags
What are image tags? Why is it recommended to use tags when supporting multiple releases/versions of a project?
Image tags are used to distinguish between multiple versions of the same software or project. Let's say you developed a project called "FluffyUnicorn" and the current release is `1.0`. You are about to release `1.1` but you still want to keep `1.0` as the stable release for anyone who is interested in it. What would you do? If your answer is to create another, separate image, you probably want to rethink that and just create a new image tag for the new release. In addition, it's important to note that container registries support tags, so when pulling an image you can specify a specific tag of that image.
How to tag an image?
`podman tag IMAGE NEW_NAME:NEW_TAG`, for example: `podman tag FluffyUnicorn:latest FluffyUnicorn:1.1`
True or False? Once created, it's impossible to remove a tag for a certain image
False. You can run `podman rmi IMAGE:TAG`.
True or False? Multiple tags can reference the same image
True.
#### Containerfile
What is a Containerfile/Dockerfile?
Different container engines (e.g. Docker, Podman) can build images automatically by reading the instructions from a Containerfile/Dockerfile. A Containerfile/Dockerfile is a text file that contains all the instructions for building an image which containers can use.
What instruction exists in every Containerfile/Dockerfile and what does it do?
In every Containerfile/Dockerfile you can find the `FROM` instruction, which is also the first instruction (at least most of the time - only ARG can come before it).
It specifies the base layer of the image to be used. Every other instruction is a layer on top of that base image.
List five different instructions that are available for use in a Containerfile/Dockerfile
* WORKDIR: sets the working directory inside the image filesystem for all the instructions following it
* EXPOSE: exposes the specified port (it doesn't add a new layer; rather, it is documented as image metadata)
* ENTRYPOINT: specifies the startup command to run when a container is started from the image
* ENV: sets an environment variable to the given value
* USER: sets the user (and optionally the user group) to use while running the image
What are some of the best practices regarding Containerfiles/Dockerfiles that you are following?
* Include only the packages you are going to use. Nothing else.
* Specify a tag in the FROM instruction. Not using a tag means you'll always pull the latest, which changes over time and might result in unexpected results.
* Do not use environment variables to share secrets
* Use images from official repositories
* Keep images small! - you want them to include only what is required for the application to run successfully. Nothing else.
* If you are using the apt package manager, you might want to use `--no-install-recommends` with `apt-get install` to install only main dependencies (instead of suggested and recommended packages)
What is the "build context"?
[Docker docs](https://docs.docker.com/engine/reference/commandline/build): "A build’s context is the set of files located in the specified PATH or URL"
What is the difference between ADD and COPY in Containerfile/Dockerfile?
COPY takes in a source and destination. It lets you copy in a file or directory from the build context into the Docker image itself.
ADD lets you do the same, but it also supports two other sources. You can use a URL instead of a file or directory from the build context. In addition, you can extract a tar file from the source directly into the destination.
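A sketch illustrating the difference (the file names and URL below are hypothetical):

```
# COPY: only files/directories from the build context
COPY app.py /opt/app/app.py

# ADD: can also fetch from a URL...
ADD https://example.com/archive.tar.gz /tmp/archive.tar.gz

# ...and auto-extracts a local tar from the build context into the destination
ADD vendor.tar.gz /opt/app/vendor/
```

A common recommendation is to prefer COPY unless you specifically need ADD's extra behavior, since COPY's semantics are simpler and more predictable.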
What is the difference between CMD and RUN in Containerfile/Dockerfile?
RUN lets you execute commands inside of your Docker image. These commands get executed once at build time and get written into your Docker image as a new layer.

CMD is the command the container executes by default when you launch the built image. A Containerfile/Dockerfile can only have one CMD. You could say that CMD is a Docker run-time operation, meaning it's not something that gets executed at build time. It happens when you run an image. A running image is called a container.
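A short sketch of the distinction (the application and package are illustrative):

```
FROM python:3.12-slim
# RUN executes at build time and is baked into an image layer
RUN pip install --no-cache-dir flask
# CMD executes at run time, when a container is started from the image
CMD ["python", "app.py"]
```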
How to create a new image using a Containerfile/Dockerfile?
The following command is executed from within the directory where the Containerfile/Dockerfile resides:

```
docker image build -t some_app:latest .
podman image build -t some_app:latest .
```
Do you perform any checks or testing on your Containerfiles/Dockerfiles?
One option is to use [hadolint](https://github.com/hadolint/hadolint) project which is a linter based on Containerfile/Dockerfile best practices.
Which instructions in Containerfile/Dockerfile create new layers?
Instructions such as FROM, COPY and RUN create new image layers instead of just adding metadata.
Which instructions in Containerfile/Dockerfile create image metadata and don't create new layers?
Instructions such as ENTRYPOINT, ENV and EXPOSE create image metadata and don't create new layers.
Is it possible to identify which instructions create a new layer from the output of podman image history?
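One hedged way to tell: in the history output, metadata-only instructions typically show a size of 0B, while layer-creating instructions (RUN, COPY, ADD) show a non-zero size (the image name here is illustrative):

```
podman image history some_app:latest
# SIZE column: 0B for ENV/EXPOSE/ENTRYPOINT-style entries,
# non-zero for RUN/COPY/ADD entries
```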
True or False? Each Containerfile instruction runs in an independent container using an image built from every previous layer/entry
True
What's the difference between these two forms:

```
ENTRYPOINT ["cmd", "param0", "param1"]
CMD ["param0"]
```

```
ENTRYPOINT cmd param0 param1
CMD param0
```
The first form is also referred to as "Exec form" and the second as "Shell form".
The Shell form wraps the command in `/bin/sh -c`, hence creating a shell process for it. While using either Exec form or Shell form on its own might be fine, it's mixing them that can lead to unexpected results.
Consider:

```
ENTRYPOINT ["ls"]
CMD /tmp
```

That would result in running `ls /bin/sh -c /tmp`
True or False? A Containerfile/Dockerfile can contain more than one ENTRYPOINT instruction and more than one CMD instruction
True, but for both ENTRYPOINT and CMD only the last instruction takes effect.
What happens when CMD instruction is defined but not an ENTRYPOINT instruction in a Containerfile/Dockerfile?
The ENTRYPOINT from the base image is being used in such case.
In the case of running `podman run -it IMAGE ls`, the `ls` overrides the ____ instruction
CMD
### Storage
Container storage is said to be ephemeral. What does it mean?
It means the contents of the container, and the data generated by it, are gone when the container is removed.
True or False? Applications running on containers, should use the container storage to store persistent data
False. Containers are not built to store persistent data and even if it's possible with some implementations, it might not perform well in case of applications with intensive I/O operations.
You stopped a running container, but it still uses storage in case you ever resume it. How to reclaim the storage of a container?
In order to reclaim the storage of a container, you have to remove it.
How to create a new volume?
```
CONTAINER_BINARY=podman
$CONTAINER_BINARY volume create some_volume
```
How to mount a directory from the host to a container?
```
CONTAINER_BINARY=podman
mkdir /tmp/dir_on_the_host
$CONTAINER_BINARY run -v /tmp/dir_on_the_host:/tmp/dir_on_the_container IMAGE_NAME
```

On some systems you'll also have to adjust security on the host itself:

```
podman unshare chown -R UID:GID /tmp/dir_on_the_host
sudo semanage fcontext -a -t container_file_t '/tmp/dir_on_the_host(/.*)?'
sudo restorecon -Rv /tmp/dir_on_the_host
```
### Architecture
How do containers achieve isolation from the rest of the system?
Through the use of namespaces and cgroups. The Linux kernel has several types of namespaces:
- Process ID namespaces: provide an independent set of process IDs
- Mount namespaces: isolation and control of mount points
- Network namespaces: isolate system networking resources such as the routing table, interfaces, ARP table, etc.
- UTS namespaces: isolate hostname and domain name
- IPC namespaces: isolate interprocess communication
- User namespaces: isolate user and group IDs
- Time namespaces: isolate the system clocks
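On a Linux host you can peek at these namespaces directly through procfs; a small sketch:

```
# Each process exposes the namespaces it belongs to under /proc/<pid>/ns
ls -l /proc/self/ns
# typically shows entries like: cgroup, ipc, mnt, net, pid, user, uts
# (the namespace IDs in the symlink targets will differ per system)
```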
What Linux kernel features do containers use?
* cgroups (Control Groups): used for limiting the amount of resources a certain group of processes (and their children, of course) can use. This way, one group of processes can't consume all host resources, and other groups can run and use part of the resources as well
* namespaces: like cgroups, namespaces isolate system resources so they are available only to processes in the namespace. Differently from cgroups, the focus with namespaces is on resources like mount points, IPC, network, ... and not on memory and CPU as in cgroups
* SELinux: the access control mechanism used to protect processes. Unfortunately, to this date many users don't actually understand SELinux and some turn it off, but nonetheless it's a very important security feature of the Linux kernel, used by containers as well
* Seccomp: similarly to SELinux, it's also a security mechanism, but its focus is on limiting which system calls and file descriptors processes can use
Describe in detail what happens when you run `podman/docker run hello-world`?
1. The Docker/Podman CLI passes your request to the daemon
2. The daemon downloads the image from Docker Hub
3. The daemon creates a new container using the image it downloaded
4. The daemon redirects output from the container to the CLI, which redirects it to the standard output
Describe difference between cgroups and namespaces
cgroups: Control Groups provide a mechanism for aggregating/partitioning sets of tasks, and all their future children, into hierarchical groups with specialized behavior.

namespace: wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource.

In short: cgroups = limit how much you can use; namespaces = limit what you can see (and therefore use).

Cgroups involve resource metering and limiting: memory, CPU, block I/O, network.

Namespaces provide processes with their own view of the system. Multiple namespaces: pid, net, mnt, uts, ipc, user.
Which of the following are Linux features that containers use?
* cspaces
* namegroups
* namespaces
* cgroups
* ELlinux
* SElinux
* namespaces * cgroups * SElinux
True or False? Containers have ephemeral storage layer
True. The ephemeral storage layer is added on top of the base image layer and is exclusive to the running container. This way, containers created from the same base image, don't share the same storage.
### Docker Architecture
Which components/layers compose the Docker technology?
1. Runtime - responsible for starting and stopping containers
2. Daemon - implements the Docker API and takes care of managing images (including builds), authentication, security, networking, etc.
3. Orchestrator
What components are part of the Docker engine?
- Docker daemon - containerd - runc
What is the low-level runtime?
- The low-level runtime is called runc
- It manages every container running on the Docker host
- Its purpose is to interact with the underlying OS to start and stop containers
- It's the reference implementation of the OCI (Open Container Initiative) container-runtime-spec
- It's a small CLI wrapper for libcontainer
What is the high-level runtime?
- The high-level runtime is called containerd
- It was developed by Docker Inc. and at some point donated to CNCF
- It manages the whole lifecycle of a container - start, stop, remove, and pause
- It takes care of setting up network interfaces, volumes, pushing and pulling images, ...
- It manages the lower-level runtime (runc) instances
- It's used both by Docker and Kubernetes as a container runtime
- It sits between the Docker daemon and runc at the OCI layer

Note: when running `ps -ef | grep -i containerd` on a system with Docker installed and running, you should see a containerd process
True or False? The docker daemon (dockerd) performs lower-level tasks compared to containerd
False. The Docker daemon performs higher-level tasks compared to containerd.
It's responsible for managing networks, volumes, images, ...
Describe in detail what happens when you run `docker pull image:tag`?
The Docker CLI passes your request to the Docker daemon. The daemon logs show the process:

```
docker.io/library/busybox:latest resolved to a manifestList object with 9 entries; looking for a unknown/amd64 match
found match for linux/amd64 with media type application/vnd.docker.distribution.manifest.v2+json, digest sha256:400ee2ed939df769d4681023810d2e4fb9479b8401d97003c710d0e20f7c49c6
pulling blob "sha256:61c5ed1cbdf8e801f3b73d906c61261ad916b2532d6756e7c4fbcacb975299fb"
Downloaded 61c5ed1cbdf8 to tempfile /var/lib/docker/tmp/GetImageBlob909736690
Applying tar in /var/lib/docker/overlay2/507df36fe373108f19df4b22a07d10de7800f33c9613acb139827ba2645444f7/diff storage-driver=overlay2
Applied tar sha256:514c3a3e64d4ebf15f482c9e8909d130bcd53bcc452f0225b0a04744de7b8c43 to 507df36fe373108f19df4b22a07d10de7800f33c9613acb139827ba2645444f7, size: 1223534
```
Describe in detail what happens when you run a container
1. The Docker client converts the run command into an API payload
2. It then POSTs the payload to the API endpoint exposed by the Docker daemon
3. When the daemon receives the command to create a new container, it makes a call to containerd via gRPC
4. containerd converts the required image into an OCI bundle and tells runc to use that bundle for creating the container
5. runc interfaces with the OS kernel to pull together the different constructs (namespaces, cgroups, etc.) used for creating the container
6. The container process is started as a child process of runc
7. Once the container starts, runc exits
True or False? Killing the Docker daemon will kill all the running containers
False. While this was true at some point, today the container runtime isn't part of the daemon (it's part of containerd and runc) so stopping or killing the daemon will not affect running containers.
True or False? containerd forks a new instance of runc for every container it creates
True
True or False? Running a dozen containers will result in having a dozen runc processes
False. Once a container is created, the parent runc process exits.
What is shim in regards to Docker?
The shim is the process that becomes the container's parent when the runc process exits. It's responsible for:
- Reporting the exit code back to the Docker daemon
- Making sure the container doesn't terminate if the daemon is restarted. It does so by keeping stdout and stdin open
How would you transfer data from one container into another?
What happens to the data of a container when the container exits?
How do you remove old, non running, containers?
1. To remove one or more containers, use the `docker container rm` command followed by the IDs of the containers you want to remove
2. The `docker system prune` command will remove all stopped containers, all dangling images, and all unused networks
3. `docker rm $(docker ps -a -q)` - this command will delete all stopped containers. The command `docker ps -a -q` returns all existing container IDs and passes them to the `rm` command, which deletes them. Any running containers will not be deleted.
How does the Docker client communicate with the daemon?
Via the local socket at `/var/run/docker.sock`
Explain Docker interlock
What is Docker Repository?
Explain image layers
A Docker image is built up from a series of layers. Each layer represents an instruction in the image's Containerfile/Dockerfile. Each layer except the very last one is read-only. Each layer is only a set of differences from the layer before it. The layers are stacked on top of each other.

When you create a new container, you add a new writable layer on top of the underlying layers. This layer is often called the "container layer". All changes made to the running container, such as writing new files, modifying existing files, and deleting files, are written to this thin writable container layer.

The major difference between a container and an image is the top writable layer. All writes to the container that add new or modify existing data are stored in this writable layer. When the container is deleted, the writable layer is also deleted. The underlying image remains unchanged.

Because each container has its own writable container layer, and all changes are stored in this container layer, multiple containers can share access to the same underlying image and yet have their own data state.
What best practices are you familiar with related to working with containers?
How do you manage persistent storage in Docker?
How can you connect from the inside of your container to the localhost of your host, where the container runs?
How do you copy files from Docker container to the host and vice versa?
### Docker Compose
Explain what Docker Compose is and what it is used for
Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application’s services. Then, with a single command, you create and start all the services from your configuration. For example, you can use it to set up ELK stack where the services are: elasticsearch, logstash and kibana. Each running in its own container.
In general, it's useful for running applications which are composed of several different services. It lets you manage them as one deployed app, instead of multiple separate services.
Describe the process of using Docker Compose

* Define the services you would like to run together in a docker-compose.yml file * Run `docker-compose up` to run the services
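A minimal, hedged `docker-compose.yml` sketch for the ELK example mentioned above (image tags and ports are illustrative, not a production configuration):

```
services:
  elasticsearch:
    image: elasticsearch:8.13.0
    ports:
      - "9200:9200"
  logstash:
    image: logstash:8.13.0
    depends_on:
      - elasticsearch
  kibana:
    image: kibana:8.13.0
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch
```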
Explain Multi-stage builds
Multi-stage builds allow you to produce smaller container images by splitting the build process into multiple stages.

As an example, imagine you have one Containerfile/Dockerfile where you first build the application and then run it. The build process might use packages and libraries you don't really need for running the application later. Moreover, the build process might produce different artifacts, not all of which are needed for running the application.

How do you deal with that? Sure, one option is to add more instructions to remove all the unnecessary stuff, but there are a couple of issues with this approach:
1. You need to know exactly what to remove, and that might not be as straightforward as you think
2. You add new layers which are not really needed

A better solution is to use multi-stage builds, where one stage (the build process) passes the relevant artifacts/outputs to the stage that runs the application.
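A hedged two-stage sketch for a Go application (paths and base images are illustrative): the first stage carries the toolchain, while the final image carries only the compiled binary.

```
# Stage 1: build - has the full Go toolchain
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN go build -o /bin/app .

# Stage 2: runtime - only the artifact is copied over
FROM alpine:3.19
COPY --from=build /bin/app /usr/local/bin/app
ENTRYPOINT ["app"]
```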
True or False? In multi-stage builds, artifacts can be copied between stages
True. This allows us to eventually produce smaller images.
What is `.dockerignore` used for?
By default, Docker uses everything (all the files and directories) in the directory you use as build context.
`.dockerignore` is used for excluding files and directories from the build context
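A small illustrative `.dockerignore` (the entries are examples, not a recommendation for every project):

```
.git
*.log
node_modules/
__pycache__/
secrets.env
```

Excluding such paths keeps the build context small (faster builds) and prevents sensitive or irrelevant files from being copied into the image by a broad COPY instruction.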
### Networking
What container network standards or architectures are you familiar with?
CNM (Container Network Model):
* Requires a distributed key-value store (like etcd, for example) for storing the network configuration
* Used by Docker

CNI (Container Network Interface):
* Network configuration should be in JSON format
### Docker Networking
What network specification is Docker using and what is its implementation called?
Docker is using the CNM (Container Network Model) design specification.
The implementation of CNM specification by Docker is called "libnetwork". It's written in Go.
Explain the following blocks in regards to CNM: * Networks * Endpoints * Sandboxes
* Networks: a software implementation of a switch. They are used for grouping and isolating a collection of endpoints.
* Endpoints: virtual network interfaces. Used for making connections.
* Sandboxes: an isolated network stack (interfaces, routing tables, ports, ...)
True or False? If you would like to connect a container to multiple networks, you need multiple endpoints
True. An endpoint can connect only to a single network.
What are some features of libnetwork?
* Native service discovery
* Ingress-based load balancer
* Network control plane and management plane
### Security
What security best practices are there regarding containers?
* Install only the necessary packages in the container
* Don't run containers as root when possible
* Don't mount the Docker daemon unix socket into any of the containers
* Set volumes and the container's filesystem to read-only
* DO NOT run containers with the `--privileged` flag
A container can cause a kernel panic and bring down the whole host. What preventive actions can you apply to avoid this specific situation?
* Install only the necessary packages in the container
* Set volumes and the container's filesystem to read-only
* DO NOT run containers with the `--privileged` flag
### Docker in Production
What are some best practices you follow in regards to using containers in production?
Images:
* Use images from official repositories
* Include only the packages you are going to use. Nothing else.
* Specify a tag in the FROM instruction. Not using a tag means you'll always pull the latest, which changes over time and might result in unexpected results.
* Do not use environment variables to share secrets
* Keep images small! - you want them to include only what is required for the application to run successfully. Nothing else.

Components:
* Secure the connection between components (e.g. client and server)
True or False? For production environments, it's recommended that the Docker client and server communicate over the network using an HTTP socket
False. Communication between client and server shouldn't be done over plain HTTP since it's insecure. It's better to enforce the daemon to only accept network connections that are secured with TLS.
Basically, the Docker daemon will only accept secured connections with certificates from a trusted CA.
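A hedged sketch of what this looks like for the daemon and the client (certificate paths are illustrative; 2376 is the conventional TLS port):

```
# Daemon side: accept only TLS connections verified against a trusted CA
dockerd --tlsverify \
        --tlscacert=/etc/docker/ca.pem \
        --tlscert=/etc/docker/server-cert.pem \
        --tlskey=/etc/docker/server-key.pem \
        -H tcp://0.0.0.0:2376

# Client side: present a client certificate signed by the same CA
docker --tlsverify \
       --tlscacert=ca.pem --tlscert=cert.pem --tlskey=key.pem \
       -H tcp://daemon-host:2376 info
```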
What forms of self-healing are available for Docker containers?
Restart policies. They allow you to automatically restart containers after certain events.
What restart policies are you familiar with?
* always: always restart the container when it stops, unless it was explicitly stopped (e.g. with `docker container stop`)
* unless-stopped: restart the container unless it was in stopped status
* no: don't restart the container at any point (default policy)
* on-failure: restart the container when it exits due to an error (= an exit code different than zero)
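Restart policies are set per container with the `--restart` flag; a short sketch (the image name is illustrative):

```
# Restart on non-zero exit, at most 3 times
docker run -d --restart on-failure:3 some_image

# Always restart, unless the container is explicitly stopped
docker run -d --restart unless-stopped some_image
```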
Explain Rootless Containers
Rootless containers are containers that run without root privileges. Historically, a user needed root privileges to run containers, while one of the most basic security recommendations is to give users the minimum privileges they need. This had been the situation for a long time, and even today running some container images from docker.io still requires root privileges.
Are there disadvantages in running rootless containers?
Yes, the full list can be found [here](https://github.com/containers/podman/blob/main/rootless.md). Some worth mentioning:
- No binding to ports lower than 1024
- No sharing of container images with CRI-O or other rootful users
- No support for running with home directories on NFS or parallel filesystems
- Some commands don't work (mount, `podman stats`, checkpoint, restore, ...)
Give one example of how rootless containers are safer from a security perspective
In a rootless container, the process inside the user namespace appears to run as root, but it is actually executed with regular user privileges. If an attacker manages to break out of the container to the host, they keep those same unprivileged user rights, so there's not much they can do, as opposed to escaping a container that runs with real root privileges.
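With podman you can observe this mapping yourself (a sketch; the exact ranges depend on your `/etc/subuid` configuration):

```shell
# Inside the container the user appears to be root (prints 0) ...
podman run --rm alpine id -u

# ... but the user namespace maps that UID 0 to your regular,
# unprivileged UID on the host:
podman unshare cat /proc/self/uid_map
```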
When running a container, usually a virtual ethernet device is created. To do so, root privileges are required. How is it then managed in rootless containers?
Networking in rootless containers is usually managed by slirp4netns. It creates a tap device inside the container's network namespace, which also serves as the default route. The device's file descriptor is passed to the parent process, which runs in the default (host) network namespace that is connected to the internet. This enables both external and internal communication.
When running a container, usually a layered file system is created, but it requires root privileges. How is it then managed in rootless containers?
New storage drivers were created to allow creating layered filesystems within a user namespace, e.g. fuse-overlayfs.
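A way to check which storage driver a rootless setup actually uses (a sketch, assuming podman is installed):

```shell
# Prints the graph driver name, e.g. "overlay" (backed by fuse-overlayfs
# or native rootless overlay, depending on kernel and podman version)
podman info --format '{{.Store.GraphDriverName}}'
```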
### OCI
What is the OCI?
OCI (Open Container Initiative) is an open governance structure established in 2015 to standardize container creation - mostly the image format and the runtime. At that time there were a number of parties involved, the most prominent being Docker.

Specifications published by the OCI:
- [image-spec](https://github.com/opencontainers/image-spec)
- [runtime-spec](https://github.com/opencontainers/runtime-spec)
Which operations must OCI-based containers support?
Create, Kill, Delete, Start and Query State.
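With runc, the OCI reference runtime, these operations map directly to subcommands (a sketch; `mycontainer` and the bundle path are placeholders):

```shell
runc create --bundle /path/to/bundle mycontainer   # create
runc start mycontainer                             # start
runc state mycontainer                             # query state (prints JSON)
runc kill mycontainer KILL                         # kill
runc delete mycontainer                            # delete
```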
### Scenarios
There is a running container that has a certain issue. You would like to share an image of that container with your team members, with certain environment variables set for debugging purposes. How would you do it?
`podman commit` can be a good choice for that. You can create a new image of the running container (with the issue) and share that new image with your team members.
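A sketch of how that could look (container, image and variable names are made up); `podman commit --change` lets you bake the debugging environment variables into the new image:

```shell
# Create an image from the running (broken) container,
# recording an extra env var for debugging
podman commit --change 'ENV DEBUG_MODE=true' broken_app debug/broken_app:latest

# Share it with the team, e.g. via a registry or as an archive
podman save -o broken_app.tar debug/broken_app:latest
```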
What you probably want to avoid:
- Using something like `podman save`/`podman load`, as these apply to an image, not a running container (so you'd share the image, but the issue might not be reproduced when your team members run a container from it)
- Modifying the Containerfile/Dockerfile, as you don't really want to add environment variables meant for debugging to the source from which you usually build images
You and your team work on the same project, but on different versions of it. For each version, the team creates a new, separate image. What would you suggest the team change in such a case?
Use tags. You can distinguish between different releases of a project using image tags. There is no need to create an entirely separate image for each version/release of a project.
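For example (image names here are made up):

```shell
# Build once, then tag the same image per version/release
podman build -t myapp:1.0 .
podman tag myapp:1.0 myapp:latest

# Each release gets its own tag, not its own image lineage
podman tag myapp:1.0 registry.example.com/team/myapp:1.0
```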
================================================
FILE: topics/containers/commit_image.md
================================================
# Create Images on The Fly

## Requirements

Have at least one image locally (run `podman image ls` to confirm).
If you don't have images locally, simply run `podman pull nginx:alpine`.

## Objectives

1. Run a container using a web server image (e.g. httpd, nginx, ...)
   - Bind the container's port 80 to local port 80
   - Run it in detached mode
   - Name should be nginx_container
2. Verify the web server runs and is accessible
3. Create an HTML file with the following content and copy it to the container, to a path where it will be served as the index file

```
It's a me

Mario

```
4. Create an image out of the running container and call it "nginx_mario"
5. Tag the image with the "mario" tag
6. Remove the original container (nginx_container) and verify it was removed
7. Create a new container out of the image you've created (the same way as the original container)
8. Run `curl 127.0.0.1:80`. What do you see?
9. Run `podman diff` on the new image. Explain the output

## Solution

Click [here to view the solution](solutions/commit_image.md)

================================================
FILE: topics/containers/containerized_db.md
================================================
## Containerized DB

1. Run a container with a database of any type you prefer (MySQL, PostgreSQL, Mongo, etc.)
2. Verify the container is running
3. Access the container and create a new table (or collection, depending on which DB type you chose) for students
4. Insert a row (or document) of a student
5. Verify the row/document was added

Click [here for the solution](solutions/containerized_db.md)

================================================
FILE: topics/containers/containerized_db_persistent_storage.md
================================================
# Containerized DB with Persistent Storage

1. Run a container with a database of any type you prefer (MySQL, PostgreSQL, Mongo, etc.)
   1. Use a mount point on the host for the database instead of using the container storage for that
   2. Explain why using the host storage instead of the container one might be a better choice
2. Verify the container is running

================================================
FILE: topics/containers/containerized_web_server.md
================================================
# Containerized Web Server

1. Run a containerized web server in the background and bind its port (8080) to a local port
2. Verify the port (8080) is bound
3. Reach the webserver from your local host
4.
Now run the same web application but bind it to the local port 8080

Click [here for the solution](solutions/containerized_web_server.md)

================================================
FILE: topics/containers/image_layers.md
================================================
## Layer by Layer

### Objective

Learn about image layers

### Requirements

Make sure Docker is installed on your system and the service is started

```
# Fedora/RHEL/CentOS
rpm -qa | grep docker
systemctl status docker
```

### Instructions

1. Write a Dockerfile. Any Dockerfile! :) (just make sure it's a valid one)
2. Build an image using the Dockerfile you've written
3. Which of the instructions you've used created new layers, and which only added image metadata?
4. What ways are there to confirm your answer to the last question?
5. Can you reduce the size of the image you've created?

================================================
FILE: topics/containers/multi_stage_builds.md
================================================
## Multi-Stage Builds

### Objective

Learn about multi-stage builds

### Instructions

1. Without actually building an image or running any container, use the following Dockerfile and convert it to use multi-stage:

```
FROM nginx

RUN apt-get update \
    && apt-get install -y curl python build-essential \
    && apt-get install -y nodejs \
    && apt-get clean -y

RUN mkdir -p /my_app
ADD ./config/nginx/docker.conf /etc/nginx/nginx.conf
ADD ./config/nginx/k8s.conf /etc/nginx/nginx.conf.k8s
ADD app/ /my_cool_app
WORKDIR /my_cool_app

RUN npm install -g ember-cli
RUN npm install -g bower
RUN apt-get update && apt-get install -y git \
    && npm install \
    && bower install

RUN ember build --environment=prod
CMD [ "/root/nginx-app.sh", "nginx", "-g", "daemon off;" ]
```

2. What are the benefits of using multi-stage builds?

================================================
FILE: topics/containers/run_forest_run.md
================================================
## Run, Forest, Run!
### Objective

Learn what restart policies do and how to use them

### Requirements

Make sure Docker is installed on your system and the service is started

```
# Fedora/RHEL/CentOS
rpm -qa | grep docker
systemctl status docker
```

### Instructions

1. Run a container with the following properties:
   * image: alpine
   * name: forest
   * restart policy: always
   * command to execute: sleep 15
2. Run `docker container ls` - Is the container running? What about after 15 seconds, is it still running? why?
3. How then can we stop the container from running?
4. Remove the container you've created
5. Run the same container again but this time with `sleep 600` and verify it runs
6. Restart the Docker service. Is the container still running? why?
8. Update the policy to `unless-stopped`
9. Stop the container
10. Restart the Docker service. Is the container running? why?

================================================
FILE: topics/containers/running_containers.md
================================================
## Running Containers

### Objective

Learn how to run, stop and remove containers

### Requirements

Make sure Podman or Docker (or any other containers engine) is installed on your system

### Instructions

1. Run a container using the latest nginx image
2. List the containers to make sure the container is running
3. Run another container but this time use ubuntu latest and attach to the terminal of the container
4. List again the containers. How many containers are running?
5. Stop the containers
6. Remove the containers

================================================
FILE: topics/containers/sharing_images.md
================================================
# Sharing Images

## Requirements

Have at least one image locally (run `podman image ls` to confirm).
If you don't have images locally, simply run `podman pull httpd`.

## Objectives

1. Choose an image and create an archive out of it
2. Check the archive size. Is it different than the image size? If yes, what's the difference? If not, why?
3. Copy the generated archive to a remote host
4. Load the image
5. Verify it was loaded and exists on the remote host

## Solution

Click [here to view the solution](solutions/sharing_images.md)

================================================
FILE: topics/containers/solutions/commit_image.md
================================================
# Create Images on The Fly

## Requirements

Have at least one image locally (run `podman image ls` to confirm).
If you don't have images locally, simply run `podman pull nginx:alpine`.

## Objectives

1. Run a container using a web server image (e.g. httpd, nginx, ...)
   - Bind the container's port 80 to local port 80
   - Run it in detached mode
   - Name should be nginx_container
2. Verify the web server runs and is accessible
3. Create an HTML file with the following content and copy it to the container, to a path where it will be served as the index file

```
It's a me

Mario

```
4. Create an image out of the running container and call it "nginx_mario"
5. Tag the image with the "mario" tag
6. Remove the original container (nginx_container) and verify it was removed
7. Create a new container out of the image you've created (the same way as the original container)
8. Run `curl 127.0.0.1:80`. What do you see?
9. Run `podman diff` on the new image. Explain the output

## Solution

```
# Run the container
podman run --name nginx_container -d -p 80:80 nginx:alpine

# Verify the web server is running
curl 127.0.0.1:80
# ... Welcome to nginx! ...

# Create the index.html file
cat << EOT > index.html
It's a me

Mario

EOT

# Copy index.html to the container
podman cp index.html nginx_container:/usr/share/nginx/html/index.html

# Create a new image out of the running container
podman commit nginx_container nginx_mario

# Tag the image
podman image ls
# localhost/nginx_mario  latest  dc7ed2343521  52 seconds ago  25 MB
podman tag dc7ed2343521 nginx_mario:mario

# Remove the container
podman stop nginx_container
podman rm nginx_container
podman ps -a
# no container 'nginx_container'

# Create a container out of the image
podman run -d -p 80:80 nginx_mario

# Check the container created from the new image
curl 127.0.0.1:80
# It's a me

Mario

# Run diff
podman diff nginx_mario
C /etc
C /etc/nginx/conf.d
C /etc/nginx/conf.d/default.conf
A /run/nginx.pid
C /usr/share/nginx/html
C /usr/share/nginx/html/index.html
C /var/cache/nginx
C /var
C /var/cache
A /var/cache/nginx/client_temp
A /var/cache/nginx/fastcgi_temp
A /var/cache/nginx/proxy_temp
A /var/cache/nginx/scgi_temp
A /var/cache/nginx/uwsgi_temp

# We've set a new index.html, which explains why it's marked as changed (C)
# We also created the image while the web server was running, which explains
# all the files created (A) under /var
```

================================================
FILE: topics/containers/solutions/containerized_db.md
================================================
# Containerized DB

1. Run a container with a database of any type you prefer (MySQL, PostgreSQL, Mongo, etc.)
2. Verify the container is running
3. Access the container and create a new table (or collection, depending on which DB type you chose) for students
4. Insert a row (or document) of a student
5. Verify the row/document was added

## Solution

```
# Run the container
podman run --name mysql -e MYSQL_USER=mario -e MYSQL_PASSWORD=tooManyMushrooms -e MYSQL_DATABASE=university -e MYSQL_ROOT_PASSWORD=MushroomsPizza -d mysql

# Verify it's running
podman ps

# Add a student row to the database
podman exec -it mysql /bin/bash
mysql -u root -p
use university;
CREATE TABLE Students (id int NOT NULL, name varchar(255) DEFAULT NULL, PRIMARY KEY (id));
insert into Students (id, name) values (1,'Luigi');
select * from Students;
```

================================================
FILE: topics/containers/solutions/containerized_db_persistent_storage.md
================================================
# Containerized DB with Persistent Storage

1. Run a container with a database of any type you prefer (MySQL, PostgreSQL, Mongo, etc.)
   1. Use a mount point on the host for the database instead of using the container storage for that
   2.
Explain why using the host storage instead of the container one might be a better choice
2. Verify the container is running

## Solution

```
# Create the directory for the DB on the host
mkdir -pv ~/local/mysql
sudo semanage fcontext -a -t container_file_t '/home/USERNAME/local/mysql(/.*)?'
sudo restorecon -R /home/USERNAME/local/mysql

# Run the container (note: the -v option must come before the image name)
podman run --name mysql \
  -e MYSQL_USER=mario -e MYSQL_PASSWORD=tooManyMushrooms \
  -e MYSQL_DATABASE=university -e MYSQL_ROOT_PASSWORD=MushroomsPizza \
  -v /home/USERNAME/local/mysql:/var/lib/mysql \
  -d mysql

# Verify it's running
podman ps
```

It's better to use the host storage because if the container ever gets removed (or its storage reclaimed), the DB data is still available.

================================================
FILE: topics/containers/solutions/containerized_web_server.md
================================================
# Containerized Web Server

1. Run a containerized web server in the background and bind its port (8080) to a local port
2. Verify the port (8080) is bound
3. Reach the webserver from your local host
4.
Now run the same web application but bind it to the local port 8080

## Solution

```
$ podman run -d -p 8080 httpd   # run the container and bind container port 8080 to a random local port
$ podman port -l 8080           # show which local port container port 8080 is bound to
0.0.0.0:41203
$ curl http://0.0.0.0:41203     # use the port from the output of the previous command
... Test Page for the HTTP Server on Red Hat Enterprise Linux ...
$ podman run -d -p 8080:8080 httpd
```

================================================
FILE: topics/containers/solutions/image_layers.md
================================================
## Layer by Layer

### Objective

Learn about image layers

### Requirements

Make sure Docker is installed on your system and the service is started

```
# Fedora/RHEL/CentOS
rpm -qa | grep docker
systemctl status docker
```

### Instructions

1. Write a Dockerfile. Any Dockerfile! :) (just make sure it's a valid one)

```
FROM ubuntu
EXPOSE 212
ENV foo=bar
WORKDIR /tmp
RUN dd if=/dev/zero of=some_file bs=1024 count=0 seek=1024
RUN dd if=/dev/zero of=some_file bs=1024 count=0 seek=1024
RUN dd if=/dev/zero of=some_file bs=1024 count=0 seek=1024
```

2. Build an image using the Dockerfile you've written

`docker image build -t super_cool_app:latest .`

3. Which of the instructions you've used created new layers, and which only added image metadata?

```
FROM, RUN -> new layer
EXPOSE, ENV, WORKDIR -> metadata
```

4. What ways are there to confirm your answer to the last question?

You can run `docker image history super_cool_app`. It will show you each instruction and its size. Usually instructions that create new layers have a non-zero size, but this is not something you can rely on by itself, since some RUN commands can have a size of zero in `docker image history` output (e.g. `ls -l`).
You can also use `docker image inspect super_cool_app` and check whether, in the output under "RootFS", the number of layers matches the number of instructions that should create new layers.

5. Can you reduce the size of the image you've created?

Yes. For example, combine all the RUN instructions into a single RUN instruction this way:

`RUN dd if=/dev/zero of=some_file bs=1024 count=0 seek=1024 && dd if=/dev/zero of=some_file bs=1024 count=0 seek=1024 && dd if=/dev/zero of=some_file bs=1024 count=0 seek=1024`

The change in size might not be dramatic in this case, but in some cases it will make a big impact on the image size.

================================================
FILE: topics/containers/solutions/multi_stage_builds.md
================================================
## Multi-Stage Builds

### Objective

Learn about multi-stage builds

### Instructions

1. Without actually building an image or running any container, use the following Dockerfile and convert it to use multi-stage:

```
FROM nginx

RUN apt-get update \
    && apt-get install -y curl python build-essential \
    && apt-get install -y nodejs \
    && apt-get clean -y

RUN mkdir -p /my_app
ADD ./config/nginx/docker.conf /etc/nginx/nginx.conf
ADD ./config/nginx/k8s.conf /etc/nginx/nginx.conf.k8s
ADD app/ /my_cool_app
WORKDIR /my_cool_app

RUN npm install -g ember-cli
RUN npm install -g bower
RUN apt-get update && apt-get install -y git \
    && npm install \
    && bower install

RUN ember build --environment=prod
CMD [ "/root/nginx-app.sh", "nginx", "-g", "daemon off;" ]
```

2. What are the benefits of using multi-stage builds?

### Solution

1.
One possible solution (the emphasis is on passing the app from the first stage):

```
FROM node:6

RUN mkdir -p /my_cool_app
RUN npm install -g ember-cli
RUN npm install -g bower
WORKDIR /my_cool_app
RUN npm install
ADD app/ /my_cool_app
RUN bower install
RUN ember build --environment=prod

FROM nginx

RUN mkdir -p /my_cool_app
ADD ./config/nginx/docker.conf /etc/nginx/nginx.conf
ADD ./config/nginx/k8s.conf /etc/nginx/nginx.conf.k8s

# Copy build artifacts from the first stage
COPY --from=0 /my_cool_app/dist /my_cool_app/dist
WORKDIR /my_cool_app

CMD [ "/root/nginx-app.sh", "nginx", "-g", "daemon off;" ]
```

2. Multi-stage builds allow you to produce smaller container images by splitting the build process into multiple stages, as we did above. The app image doesn't contain anything related to the build process except the actual app.

================================================
FILE: topics/containers/solutions/run_forest_run.md
================================================
## Run, Forest, Run!

### Objective

Learn what restart policies do and how to use them

### Requirements

Make sure Docker is installed on your system and the service is started

```
# Fedora/RHEL/CentOS
rpm -qa | grep docker
systemctl status docker
```

### Instructions

1. Run a container with the following properties:
   * image: alpine
   * name: forest
   * restart policy: always
   * command to execute: sleep 15

`docker run --restart always --name forest alpine sleep 15`

2. Run `docker container ls` - Is the container running? What about after 15 seconds, is it still running? why?

It keeps running even after it finishes executing `sleep 15` because the restart policy is "always". This means that Docker will keep restarting the **same** container even after it exits.

3. How then can we stop the container from running?

The restart policy doesn't apply when the container is stopped with the command `docker container stop`

4.
Remove the container you've created

```
docker container stop forest
docker container rm forest
```

5. Run the same container again but this time with `sleep 600` and verify it runs

```
docker run --restart always --name forest alpine sleep 600
docker container ls
```

6. Restart the Docker service. Is the container still running? why?

```
sudo systemctl restart docker
```

Yes, it's still running due to the restart policy `always`, which means Docker will always bring the container back up after it exits or after the daemon restarts (unless it was stopped with the stop command).

8. Update the policy to `unless-stopped`

`docker update --restart unless-stopped forest`

9. Stop the container

`docker container stop forest`

10. Restart the Docker service. Is the container running? why?

```
sudo systemctl restart docker
```

No, the container is not running. This is because we changed the policy to `unless-stopped`, which keeps the container running unless it was put in a stopped state. Since we stopped the container before the restart, Docker didn't bring it back up after the restart.

================================================
FILE: topics/containers/solutions/running_containers.md
================================================
## Running Containers

### Objective

Learn how to run, stop and remove containers

### Requirements

Make sure Podman or Docker (or any other containers engine) is installed on your system

### Instructions

1. Run a container using the latest nginx image - `podman container run nginx:latest`
2. List the containers to make sure the container is running - `podman container ls`
3. Run another container but this time use ubuntu latest and attach to the terminal of the container - `podman container run -it ubuntu:latest /bin/bash`
4. List again the containers. How many containers are running? - `podman container ls` -> 2
5.
Stop the containers - WARNING: the following will stop all the containers on the host: `podman stop $(podman container ls -q)`, or for each container: `podman stop [container id/name]`
6. Remove the containers - WARNING: the following will remove other containers as well if such are running: `podman rm $(podman container ls -q -a)`, or for each container: `podman rm [container id/name]`

================================================
FILE: topics/containers/solutions/sharing_images.md
================================================
# Sharing Images

## Requirements

Have at least one image locally (run `podman image ls` to confirm).
If you don't have images locally, simply run `podman pull httpd`.

## Objectives

1. Choose an image and create an archive out of it
2. Check the archive size. Is it different than the image size? If yes, what's the difference? If not, why?
3. Copy the generated archive to a remote host
4. Load the image
5. Verify it was loaded and exists on the remote host

## Solution

```
# Save the image as an archive
podman save -o httpd.tar httpd

# Check archive and image sizes
du -sh httpd.tar      # output: 143MB
podman image ls | grep httpd   # output: 149MB
# The archive is somewhat smaller than the image itself (6MB difference)

# Copy the archive to a remote host
rsync -azc httpd.tar USER@REMOTE_HOST_FQDN:/tmp/

# Load the image
podman load -i /tmp/httpd.tar

# Verify it exists on the system after loading
podman image ls
```

================================================
FILE: topics/containers/solutions/working_with_images.md
================================================
## Working with Images - Solution

### Objective

Learn how to work with container images

### Requirements

Make sure Podman or Docker (or any other containers engine) is installed on your system

### Instructions

1. List the container images in your environment - `podman image ls`
2. Pull the latest ubuntu image - `podman image pull ubuntu:latest`
3. Run a container with the image you just pulled - `podman container run -it ubuntu:latest /bin/bash`
4. Remove the image. Did it work? - No. There is a running container which is using the image we're trying to remove
5. Do whatever is needed in order to remove the image - `podman rm ; podman image rm ubuntu`

================================================
FILE: topics/containers/working_with_images.md
================================================
## Working with Images

### Objective

Learn how to work with container images

### Requirements

Make sure Podman or Docker (or any other containers engine) is installed on your system

### Instructions

1.
List the container images in your environment
2. Pull the latest ubuntu image
3. Run a container with the image you just pulled
4. Remove the image. Did it work?
5. Do whatever is needed in order to remove the image

================================================
FILE: topics/containers/write_containerfile_run_container.md
================================================
# Write a Containerfile and run a container

## Objectives

1. Create an image:
   * Use centos or ubuntu as the base image
   * Install the apache web server
   * Deploy any web application you want
   * Add https support (using HAProxy as a reverse proxy)
2. Once you've written the Containerfile and created an image, run the container and test the application. Describe how you tested it and provide the output
3. Describe one or more weaknesses of your Containerfile. Is it ready to be used in production?

================================================
FILE: topics/databases/README.md
================================================
# Databases

- [Databases](#databases)
  - [Exercises](#exercises)
  - [Questions](#questions)
    - [SQL](#sql)
    - [Time Series](#time-series)

## Exercises

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Message Board Tables | Relational DB Tables | [Exercise](topics/databases/table_for_message_board_system.md) | [Solution](topics/databases/solutions/table_for_message_board_system.md) |

## Questions
What type of databases are you familiar with?
* Relational (SQL)
* NoSQL
* Time series
### SQL
What is a relational database?
* Data Storage: a system to store data in tables
* SQL: a programming language to manage relational databases
* Data Definition Language (DDL): a standard syntax to create, alter and delete tables
What does it mean when a database is ACID compliant?
ACID stands for Atomicity, Consistency, Isolation, Durability. To be ACID compliant, the database must meet each of the four criteria.

**Atomicity** - When a change occurs to the database, it should either succeed or fail as a whole. For example, if you were to update a table, the update should completely execute. If it only partially executes, the update is considered failed as a whole and will not go through - the DB will revert back to its original state before the update occurred. Atomicity ensures that each transaction is completed as its own standalone "unit" - if any part fails, the whole statement fails.

**Consistency** - Any change made to the database should bring it from one valid state into the next. For example, a change shouldn't corrupt the DB. Consistency is upheld by checks and constraints that are pre-defined in the DB. For example, if you tried to change a value from a string to an int when the column should be of datatype string, a consistent DB would not allow this transaction to go through, and the action would not be executed.

**Isolation** - This ensures that a database will never be seen "mid-update": even as multiple transactions run at the same time, they should leave the DB in the same state as if they had been run sequentially. For example, let's say that 20 other people were making changes to the database at the same time. At the time you executed your query, 15 of the 20 changes had gone through, but 5 were still in progress. You should only see the 15 changes that had completed - you wouldn't see the database mid-update.

**Durability** - Once a change is committed, it will remain committed regardless of what happens (power failure, system crash, etc.). This means that all completed transactions must be recorded in non-volatile storage.

Note that relational (SQL) databases are by nature ACID compliant. Certain NoSQL DBs can be ACID compliant as well, depending on how they operate, but as a general rule of thumb, NoSQL DBs are not considered ACID compliant.
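Atomicity is easy to see in practice with any transactional database. A minimal sketch using the sqlite3 CLI (assuming it is installed locally; the table and values are made up): both updates inside the transaction succeed or fail together, so rolling back leaves the balances untouched.

```shell
db=$(mktemp)
sqlite3 "$db" "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
INSERT INTO accounts VALUES (1, 100), (2, 0);"

# A transfer as a single atomic unit; ROLLBACK undoes both updates
sqlite3 "$db" "BEGIN;
UPDATE accounts SET balance = balance - 30 WHERE id = 1;
UPDATE accounts SET balance = balance + 30 WHERE id = 2;
ROLLBACK;"

# Balances are unchanged: 100 and 0
sqlite3 "$db" "SELECT balance FROM accounts ORDER BY id;"
rm -f "$db"
```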
What is sharding?
Sharding is horizontal partitioning: a large table is split into smaller chunks (shards) that are spread across multiple database servers. It's good for distributing load and scaling a database horizontally, beyond the capacity of a single server.
You find out your database became a bottleneck and users experience issues accessing data. How can you deal with such situation?
Not much information is provided as to why it became a bottleneck or what the current architecture is, so one general approach could be to reduce the load on the database by moving frequently accessed data to an in-memory structure (a cache).
What is a connection pool?
A connection pool is a cache of database connections. It's used to avoid the overhead of establishing a new connection for every query sent to the database.
What is a connection leak?
A connection leak is a situation where a database connection isn't closed after being created and is no longer needed.
What is Table Lock?
Your database performs more slowly than usual. More specifically, your queries are taking a lot of time. What would you do?
* Query for running queries and cancel the irrelevant ones
* Check for connection leaks (query for running connections and include their IP)
* Check for table locks and kill irrelevant locking sessions
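On PostgreSQL, for instance, these checks could look like the following (a sketch; it requires appropriate privileges, and the pid is a placeholder):

```shell
# Currently running queries, with client address and runtime
psql -c "SELECT pid, client_addr, now() - query_start AS runtime, query
         FROM pg_stat_activity WHERE state = 'active';"

# Cancel one of them by pid
psql -c "SELECT pg_cancel_backend(12345);"

# Lock waits: which sessions are blocked on what
psql -c "SELECT pid, relation::regclass, mode, granted
         FROM pg_locks WHERE NOT granted;"
```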
What is a Data Warehouse?
"A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of organisation's decision-making process"
Explain what is a time-series database
What is OLTP (Online transaction processing)?
What is OLAP (Online Analytical Processing)?
What is an index in a database?
A database index is a data structure that improves the speed of operations in a table. Indexes can be created using one or more columns, providing the basis for both rapid random lookups and efficient ordering of access to records.
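As a sketch of the effect, using the sqlite3 CLI (assuming it is installed; the table and index names are made up): `EXPLAIN QUERY PLAN` shows the lookup switching from a full table scan to an index search once the index exists.

```shell
db=$(mktemp)
sqlite3 "$db" "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);"

# Without an index: the plan reports a full scan of the table
sqlite3 "$db" "EXPLAIN QUERY PLAN SELECT * FROM students WHERE name = 'Mario';"

sqlite3 "$db" "CREATE INDEX idx_students_name ON students(name);"

# With the index: the plan reports a search using idx_students_name
sqlite3 "$db" "EXPLAIN QUERY PLAN SELECT * FROM students WHERE name = 'Mario';"
rm -f "$db"
```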
What data types are there in relational databases?
Explain Normalization
Data that is used multiple times in a database should be stored once and referenced with a foreign key.
This has the clear maintenance benefit that you need to change a value in only a single place for it to change everywhere.
Explain Primary Key and Foreign Key
Primary Key: each row in every table should have a unique identifier that represents the row.

Foreign Key: a reference to another table's primary key. This allows you to join tables together in order to retrieve all the information you need without duplicating data.
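A small sketch with the sqlite3 CLI (assuming it is installed; the schema is made up) showing a primary key, a foreign key referencing it, and a join that avoids duplicating data:

```shell
sqlite3 :memory: "
PRAGMA foreign_keys = ON;
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE grades (
  id INTEGER PRIMARY KEY,
  student_id INTEGER NOT NULL REFERENCES students(id),  -- foreign key
  grade INTEGER
);
INSERT INTO students VALUES (1, 'Mario');
INSERT INTO grades VALUES (1, 1, 90);
-- Join through the foreign key instead of duplicating the student's name:
SELECT s.name, g.grade FROM grades g JOIN students s ON s.id = g.student_id;
"
# Mario|90
```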
What types of data tables have you used?
* Primary data table: the main data you care about
* Details table: includes a foreign key and has a one-to-many relationship
* Lookup values table: can be one table per lookup, or a table containing all the lookups, with a one-to-many relationship
* Multi-reference table
What is ORM? What benefits does it provide with regards to relational database usage?
[Wikipedia](https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping): "is a programming technique for converting data between incompatible type systems using object-oriented programming languages"

In regards to relational databases:
* Database as code
* Database abstraction
* Encapsulates SQL complexity
* Enables a code review process
* Enables usage as a native OOP structure
What is DDL?
[Wikipedia](https://en.wikipedia.org/wiki/Data_definition_language): "In the context of SQL, data definition or data description language (DDL) is a syntax for creating and modifying database objects such as tables, indices, and users."
### Time Series
What is Time Series database?
A database designed specifically for time-series data (data points indexed by timestamp). It comes with multiple optimizations, for example: efficient ingestion of append-only, time-ordered writes; compression of consecutive similar values; time-windowed queries, aggregation and downsampling; and retention policies that automatically expire old data.
================================================
FILE: topics/databases/solutions/table_for_message_board_system.md
================================================

## Database Table for Message Board System

### Instructions

Design a database table for a message board system. It should include the following information:

* Personal details
* Who saw the message and when
* Replies
* Tagged people in the message
* Message categories

Notes:

* No SQL is needed
* You should include: table names, field names, data types and mention the foreign keys used.

### Solution

Note: This is just one possible design
2nd Note: PK = primary key, FK = foreign key

People
* ID int PK
* FirstName varchar(255)
* LastName varchar(255)
* DOB date
* Gender varchar(1)
* Phone varchar(10)

MessageBoards
* ID int PK
* Board text

Messages
* ID int PK
* MessageBoardID int FK (MessageBoards.ID)
* PeopleID int FK (People.ID)
* MsgDate datetime
* Message text
* MessageID int FK (Messages.ID, self-reference for replies)

MessageTags
* ID int PK
* MessageID int FK (Messages.ID)
* PeopleID int FK (People.ID)

================================================
FILE: topics/databases/table_for_message_board_system.md
================================================

## Database Table for Message Board System

### Instructions

Design a database table for a message board system. It should include the following information:

* Personal details
* Who saw the message and when
* Replies
* Tagged people in the message
* Message categories

Notes:

* No SQL is needed
* You should include: table names, field names, data types and mention the foreign keys used.

================================================
FILE: topics/datadog/README.md
================================================

# DataDog

- [DataDog](#datadog)
  - [Questions](#questions)
    - [Basics](#basics)
    - [Datadog Agent](#datadog-agent)
    - [Datadog Integrations](#datadog-integrations)

## Questions

### Basics
Describe at least three use cases for using something like Datadog. Can be as specific as you would like
* Monitor instance/server downtime
* Detect anomalies and send an alert when they happen
* Track service request or response latency
What ways are there to collect or send data to Datadog?
* Datadog agent installed on the device or location which you would like to monitor
* Using the Datadog API
* Built-in integrations
What is a host in regards to Datadog?
Any physical or virtual instance that is monitored with Datadog. A few examples:

- Cloud instance, virtual machine
- Bare metal node
- Platform or service specific nodes like a Kubernetes node

Basically, any device or location that has the Datadog agent installed and running on it.
What is a Datadog agent?
A piece of software that runs on a host. Its purpose is to collect data from the host (metrics, logs, etc.) and send it to Datadog.
What are Datadog tags?
Datadog tags are used to mark different information with unique properties. For example, you might want to tag some data with "environment: production" while tagging information from staging or dev environment with "environment: staging".
## Datadog Agent
What are the components of a Datadog agent?
* Collector: its role is to collect data from the host on which it's installed. The default collection interval, as of today, is every 15 seconds.
* Forwarder: responsible for sending the data to Datadog over HTTPS
## Datadog Integrations
What can you tell about Datadog integrations?
- Datadog has many supported integrations with different services, platforms, etc.
- Each integration includes information on how to apply it, how to use it and what configuration options it supports
When opening some of the integration windows/pages, there is a section called "Monitors". What can be found there?
There you can usually find anomaly types that Datadog suggests monitoring and tracking.
================================================ FILE: topics/devops/README.md ================================================ # DevOps ## Questions ### General
What is DevOps?
The definition of DevOps from selected companies:

**Amazon**: "DevOps is the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity: evolving and improving products at a faster pace than organizations using traditional software development and infrastructure management processes. This speed enables organizations to better serve their customers and compete more effectively in the market."

**Microsoft**: "DevOps is the union of people, process, and products to enable continuous delivery of value to our end users. The contraction of “Dev” and “Ops” refers to replacing siloed Development and Operations to create multidisciplinary teams that now work together with shared and efficient practices and tools. Essential DevOps practices include agile planning, continuous integration, continuous delivery, and monitoring of applications."

**Red Hat**: "DevOps describes approaches to speeding up the processes by which an idea (like a new software feature, a request for enhancement, or a bug fix) goes from development to deployment in a production environment where it can provide value to the user. These approaches require that development teams and operations teams communicate frequently and approach their work with empathy for their teammates. Scalability and flexible provisioning are also necessary. With DevOps, those that need power the most, get it—through self service and automation. Developers, usually coding in a standard development environment, work closely with IT operations to speed software builds, tests, and releases—without sacrificing reliability."

**Google**: "...The organizational and cultural movement that aims to increase software delivery velocity, improve service reliability, and build shared ownership among software stakeholders"
What are the benefits of DevOps? What can it help us to achieve?
* Collaboration
* Improved delivery
* Security
* Speed
* Scale
* Reliability
What are the anti-patterns of DevOps?
A couple of examples:

* One person is in charge of specific tasks. For example, there is only one person who is allowed to merge everyone else's code into the repository
* Treating production differently from the development environment. For example, not implementing security in the development environment
* Not allowing someone to push to production on Friday ;)
How would you describe a successful DevOps engineer or a team?
The answer can focus on:

* Collaboration
* Communication
* Setting up and improving workflows and processes (related to testing, delivery, ...)
* Dealing with issues

Things to think about:

* What should DevOps teams or engineers NOT focus on or do?
* Do DevOps teams or engineers have to be innovative or practice innovation as part of their role?
One of your team members suggests to set a goal of "deploying at least 20 times a day" in regards to CD. What is your take on that?
A couple of thoughts:

1. Why is it an important goal? Is it affecting the business somehow? Is it one of the KPIs? In other words, does it matter?
2. This might introduce risks such as losing quality in favor of quantity
3. You might want to set a possibly better goal such as "be able to deploy whenever we need to deploy"
### Tooling
What do you take into consideration when choosing a tool/technology?
A few ideas to think about:

* mature/stable vs. cutting edge
* community size
* architecture aspects - agent vs. agentless, master vs. masterless, etc.
* learning curve
Can you describe which tool or platform you chose to use in some of the following areas and how?

* CI/CD
* Provisioning infrastructure
* Configuration Management
* Monitoring & alerting
* Logging
* Code review
* Code coverage
* Issue Tracking
* Containers and Containers Orchestration
* Tests
This is a more practical version of the previous question where you might be asked additional specific questions on the technology you chose

* CI/CD - Jenkins, Circle CI, Travis, Drone, Argo CD, Zuul
* Provisioning infrastructure - Terraform, CloudFormation
* Configuration Management - Ansible, Puppet, Chef
* Monitoring & alerting - Prometheus, Nagios
* Logging - Logstash, Graylog, Fluentd
* Code review - Gerrit, Review Board
* Code coverage - Cobertura, Clover, JaCoCo
* Issue tracking - Jira, Bugzilla
* Containers and Containers Orchestration - Docker, Podman, Kubernetes, Nomad
* Tests - Robot, Serenity, Gauge
A team member of yours, suggests to replace the current CI/CD platform used by the organization with a new one. How would you reply?
Things to think about:

* What do we gain from doing so? Are there new features in the new platform? Does the new platform deal with some of the limitations of the current platform?
* What is this suggestion based on? In other words, did they try out the new platform? Was there extensive technical research?
* What will the switch from one platform to another require from the organization? For example, training users of the platform? How much time does the team have to invest in such a move?
### Version Control
What is Version Control?
* Version control is the system of tracking and managing changes to software code.
* It helps software teams to manage changes to source code over time.
* Version control also helps developers move faster and allows software teams to preserve efficiency and agility as the team scales to include more developers.
What is a commit?
* In Git, a commit is a snapshot of your repo at a specific point in time.
* The git commit command will save all staged changes, along with a brief description from the user, in a “commit” to the local repository.
What is a merge?
* Merging is Git's way of putting a forked history back together again. The git merge command lets you take the independent lines of development created by git branch and integrate them into a single branch.
What is a merge conflict?
* A merge conflict is an event that occurs when Git is unable to automatically resolve differences in code between two commits. When all the changes in the code occur on different lines or in different files, Git will successfully merge commits without your help.
What best practices are you familiar with regarding version control?
* Use a descriptive commit message
* Make each commit a logical unit
* Incorporate others' changes frequently
* Share your changes frequently
* Coordinate with your co-workers
* Don't commit generated files
* Don't commit binary files
Would you prefer a "configuration->deployment" model or "deployment->configuration"? Why?
Both have advantages and disadvantages. With "configuration->deployment" model for example, where you build one image to be used by multiple deployments, there is less chance of deployments being different from one another, so it has a clear advantage of a consistent environment.
Explain mutable vs. immutable infrastructure
In the mutable infrastructure paradigm, changes are applied on top of the existing infrastructure, and over time the infrastructure builds up a history of changes. Ansible, Puppet and Chef are examples of tools which follow the mutable infrastructure paradigm.

In the immutable infrastructure paradigm, every change is actually a new infrastructure. So a change to a server will result in a new server instead of updating it. Terraform is an example of a technology which follows the immutable infrastructure paradigm.
### Software Distribution
Explain "Software Distribution"
Read [this](https://venam.nixers.net/blog/unix/2020/03/29/distro-pkgs.html) fantastic article on the topic. From the article: "Thus, software distribution is about the mechanism and the community that takes the burden and decisions to build an assemblage of coherent software that can be shipped."
Why are there multiple software distributions? What differences can they have?
Different distributions can focus on different things: different environments (server vs. mobile vs. desktop), support for specific hardware, specialization in different domains (security, multimedia, ...), etc. Basically, different aspects of the software and what it supports get different priority in each distribution.
What is a Software Repository?
Wikipedia: "A software repository, or “repo” for short, is a storage location for software packages. Often a table of contents is stored, as well as metadata." Read more [here](https://en.wikipedia.org/wiki/Software_repository)
What ways are there to distribute software? What are the advantages and disadvantages of each method?
* Source - maintain the build script within the version control system so that users can build your app after cloning the repository. Advantage: users can quickly check out different versions of the application. Disadvantage: requires build tools installed on the user's machine.
* Archive - collect all your app files into one archive (e.g. tar) and deliver it to the user. Advantage: the user gets everything needed in one file. Disadvantage: requires repeating the same procedure when updating; not good if there are a lot of dependencies.
* Package - depending on the OS, you can use your OS package format (e.g. in RHEL/Fedora it's RPM) to deliver your software with a way to install, uninstall and update it using the standard packager commands. Advantage: the package manager takes care of installation, uninstallation, updating and dependency management. Disadvantage: requires managing a package repository.
* Images - either VM or container images where your package is included with everything it needs in order to run successfully. Advantage: everything is preinstalled and it has a high degree of environment isolation. Disadvantage: requires knowledge of building and optimizing images.
Are you familiar with "The Cathedral and the Bazaar models"? Explain each of the models
* Cathedral - source code released when software is released * Bazaar - source code is always available publicly (e.g. Linux Kernel)
What is caching? How does it work? Why is it important?
Caching is fast access to frequently used resources which are computationally expensive or IO intensive to obtain and do not change often. There can be several layers of cache, from CPU caches up to distributed cache systems. Common ones are in-memory caching and distributed caching.

Caches are typically data structures that hold some data, such as a hashtable or dictionary, though any data structure can provide caching capabilities (set, sorted set, sorted dictionary, etc.). While caching is used in many applications, it can create subtle bugs if not implemented or used correctly. For example, cache invalidation, expiration and updating are usually quite challenging.
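A minimal sketch of an in-memory cache with per-entry expiration (TTL), one of the invalidation strategies mentioned above; the class and its API are made up for illustration:

```python
import time

class TTLCache:
    """A tiny in-memory cache with per-entry expiration (illustrative sketch)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:  # expired: invalidate lazily on read
            del self._data[key]
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, time.monotonic() + self.ttl)

cache = TTLCache(ttl_seconds=0.1)
cache.set("user:1", {"name": "alice"})
print(cache.get("user:1"))  # {'name': 'alice'}
time.sleep(0.2)
print(cache.get("user:1"))  # None (entry expired)
```

Even this toy version shows why expiration is tricky: a stale value is only evicted when someone reads it, so memory can grow until keys are touched again.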
Explain stateless vs. stateful
Stateless applications don't store any data on the host, which makes them ideal for horizontal scaling and microservices. Stateful applications depend on storage to save state and data; databases, for example, are typically stateful applications.
What is Reliability? How does it fit DevOps?
Reliability, when used in a DevOps context, is the ability of a system to recover from infrastructure failure or disruption. Part of it is also being able to scale based on your organization's or team's demands.
What does "Availability" mean? What means are there to track Availability of a service?
Why isn't 100% availability a target? Why do most companies or teams set it to be 99%.X?
Describe the workflow of setting up some type of web server (Apache, IIS, Tomcat, ...)
How does a web server work?
According to MDN Web Docs, we can understand web servers from two viewpoints: hardware and software.

(i) On the hardware side, a web server is a remote computer that stores a website's component files (HTML, CSS and JavaScript files) and the web server software. A web server connects to the Internet and supports physical data interchange with other devices connected to the web.

(ii) On the software side, a web server includes several parts that control how web users access hosted files. At a minimum, this is an HTTP server. An HTTP server is software that understands URLs (web addresses) and HTTP (the protocol your browser uses to view webpages). An HTTP server can be accessed through the domain names of the websites it stores, and it delivers the content of these hosted websites to the end user's device.

How communication between a web server and a web browser is established: whenever a browser needs a file that is hosted on a web server, the browser requests the page from the web server and the web server responds with that page. This communication happens in the following way:

1. The user enters the domain name in the browser, and the browser then searches for the IP address of the entered name. This can be done in two ways: by searching its cache, or by requesting one or more DNS (Domain Name System) servers.
2. After learning the IP address, the browser requests the file via HTTP and the request reaches the correct (hardware) web server.
3. The (software) HTTP server accepts the request, finds the requested document, and sends it back to the browser, also through HTTP. (If the server doesn't find the requested document, it returns a 404 response instead.)
4. The browser finally gets the webpage and displays it, or displays the error message.
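The software side described above can be demonstrated with Python's built-in `http.server`: a handler that returns the requested document, or a 404 when it doesn't exist (a toy sketch, not a production web server):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/":
            body = b"Hello from a minimal HTTP server"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # Requested document not found -> 404, as described above
            self.send_error(404)

    def log_message(self, *args):  # keep the example quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0 = pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# The "browser" side: request the page over HTTP and read the response
with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/") as resp:
    status, page = resp.status, resp.read().decode()
print(status, page)  # 200 Hello from a minimal HTTP server

server.shutdown()
```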
Explain "Open Source"
Describe the architecture of service/app/project/... you designed and/or implemented
What types of tests are you familiar with?
Styling, unit, functional, API, integration, smoke, scenario, ... You should be able to explain those that you mention.
You need to periodically install a package (unless it already exists) on different operating systems (Ubuntu, RHEL, ...). How would you do it?
There are multiple ways to answer this question (there is no right and wrong here):

* A simple cron job
* A pipeline with a configuration management technology (such as Puppet, Ansible, Chef, etc.)
...
What is Chaos Engineering?
Wikipedia: "Chaos engineering is the discipline of experimenting on a software system in production in order to build confidence in the system's capability to withstand turbulent and unexpected conditions" Read about Chaos Engineering [here](https://en.wikipedia.org/wiki/Chaos_engineering)
What is "infrastructure as code"? What implementation of IAC are you familiar with?
IAC (infrastructure as code) is a declarative approach of defining infrastructure or architecture of a system. Some implementations are ARM templates for Azure and Terraform that can work across multiple cloud providers.
What benefits does infrastructure-as-code have?
- fully automated process of provisioning, modifying and deleting your infrastructure
- version control for your infrastructure which allows you to quickly rollback to previous versions
- validate infrastructure quality and stability with automated tests and code reviews
- makes infrastructure tasks less repetitive
How do you manage build artifacts?
Build artifacts are usually stored in a repository. They can be used in release pipelines for deployment purposes. Usually there is a retention period on the build artifacts.
What Continuous Integration solution are you using/prefer and why?
What deployment strategies are you familiar with or have used?
There are several deployment strategies:

* Rolling
* Blue green deployment
* Canary releases
* Recreate strategy
You joined a team where everyone is developing one project, and the practice is to run tests locally on their workstation and push to the repository if the tests passed. What is the problem with the process as it is now and how would you improve it?
Explain test-driven development (TDD)
Explain agile software development
What do you think about the following sentence?: "Implementing or practicing DevOps leads to more secure software"
Do you know what is a "post-mortem meeting"? What is your opinion on that?
What is a configuration drift? What problems is it causing?
Configuration drift happens when, in an environment of servers with the exact same configuration and software, updates or configuration changes are applied to some servers but not others, so that over time these servers become slightly different from all the rest. This situation might lead to bugs which are hard to identify and reproduce.
How to deal with a configuration drift?
Configuration drift can be avoided with a desired state configuration (DSC) implementation. Desired state configuration can be a declarative file that defines how a system should be. There are tools to enforce the desired state, such as Terraform or Azure DSC. There are incremental and complete enforcement strategies.
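A toy sketch of the desired-state idea: compare each server's actual state against a single declared desired state and report the drift (the configuration keys and host names are invented for illustration):

```python
# Hypothetical desired state declared for the whole fleet
desired = {"nginx_version": "1.24", "max_open_files": 65535, "tls": "enabled"}

# Hypothetical actual state collected from each server
fleet = {
    "web-1": {"nginx_version": "1.24", "max_open_files": 65535, "tls": "enabled"},
    "web-2": {"nginx_version": "1.22", "max_open_files": 65535, "tls": "disabled"},
}

def detect_drift(desired, actual):
    """Return {setting: (actual_value, desired_value)} for every mismatch."""
    return {k: (actual.get(k), v) for k, v in desired.items() if actual.get(k) != v}

for host, state in fleet.items():
    drift = detect_drift(desired, state)
    if drift:
        print(f"{host} drifted: {drift}")
    else:
        print(f"{host} matches desired state")
```

A real DSC tool would go one step further and remediate the drift, not just report it.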
Explain Declarative and Procedural styles. The technologies you are familiar with (or using) are using procedural or declarative style?
Declarative - you write code that specifies the desired end state
Procedural - you describe the steps to get to the desired end state

Declarative tools - Terraform, Puppet, CloudFormation, Ansible
Procedural tools - Chef

To better emphasize the difference, consider creating two virtual instances/servers. In declarative style, you would specify two servers and the tool will figure out how to reach that state. In procedural style, you need to specify the steps to reach the end state of two instances/servers - for example, create a loop and in each iteration of the loop create one instance (running the loop twice, of course).
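The two-server example can be sketched in Python (toy functions, not any real tool's API): the procedural version spells out the steps, while the declarative version reconciles toward a desired count and is therefore idempotent:

```python
def procedural_create(current, count):
    """Procedural: spell out each step - always creates `count` more servers."""
    for _ in range(count):
        current.append(f"server-{len(current) + 1}")
    return current

def declarative_apply(current, desired_count):
    """Declarative: state the end goal; the 'tool' reconciles toward it."""
    while len(current) < desired_count:
        current.append(f"server-{len(current) + 1}")
    while len(current) > desired_count:
        current.pop()
    return current

print(procedural_create([], 2))            # ['server-1', 'server-2']
print(declarative_apply(["server-1"], 2))  # ['server-1', 'server-2']
# Applying the declarative version again changes nothing (idempotent):
print(declarative_apply(["server-1", "server-2"], 2))
```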
Do you have experience with testing cross-projects changes? (aka cross-dependency)
Note: cross-dependency is when you have two or more changes to separate projects and you would like to test them in a mutual build instead of testing each change separately.
Have you contributed to an open source project? Tell me about this experience
What is Distributed Tracing?
### GitOps
What is GitOps?
GitLab: "GitOps is an operational framework that takes DevOps best practices used for application development such as version control, collaboration, compliance, and CI/CD tooling, and applies them to infrastructure automation". Read more [here](https://about.gitlab.com/topics/gitops)
What are some of the advantages of applying GitOps?
* It introduces limited/granular access to infrastructure
* It makes it easier to trace who makes changes to infrastructure
When a repository is referred to as a "GitOps repository", what does it mean?
A repository that doesn't hold the application source code, but the configuration, infra, ... files required to test and deploy the application.
What are some practical implementations or practices of GitOps?
* Store infra files in a version control repository (like Git)
* Apply a review/approval process for changes
Two engineers in your team argue about where to put the configuration and infra related files of a certain application. One of them suggests to put them in the same repo as the application and the other one suggests to put them in their own separate repository. What's your take on that?
One might say we need more details as to what these configuration and infra files look like exactly and how complex the application and its CI/CD pipeline(s) are, but in general, most of the time you will want to put configuration and infra related files in their own separate repository and not in the repository of the application, for multiple reasons:

* Every change submitted to the configuration shouldn't trigger the CI/CD of the application; it should test and apply the modified configuration, not the application itself
* When you mix application code with configuration and infra related files, pipelines and access control become harder to manage - people and automation that should only touch the configuration end up with access to the application code, and vice versa
#### SRE
What are the differences between SRE and DevOps?
Google: "One could view DevOps as a generalization of several core SRE principles to a wider range of organizations, management structures, and personnel." Read more about it [here](https://sre.google/sre-book/introduction)
What is an SRE team responsible for?
Google: "the SRE team is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their services" Read more about it [here](https://sre.google/sre-book/introduction)
What is an error budget?
Atlassian: "An error budget is the maximum amount of time that a technical system can fail without contractual consequences." Read more about it [here](https://www.atlassian.com/incident-management/kpis/error-budget)
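A quick worked example of turning an availability SLO into an error budget (assuming a 30-day period):

```python
def error_budget_minutes(slo_percent, period_days=30):
    """Allowed downtime per period for a given availability SLO."""
    total_minutes = period_days * 24 * 60  # 43200 minutes in 30 days
    return total_minutes * (1 - slo_percent / 100)

for slo in (99.0, 99.9, 99.99):
    print(f"{slo}% SLO -> {error_budget_minutes(slo):.1f} minutes/month of budget")
```

So a 99.9% SLO over 30 days leaves roughly 43 minutes of acceptable downtime; once that budget is spent, the team should prioritize reliability work over new features.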
What do you think about the following statement: "100% is the only right availability target for a system"
Wrong. No system can guarantee 100% availability, as no system is safe from downtime. Many systems and services will fall somewhere between 99% and 100% uptime (or at least this is how most systems and services should be).
What are MTTF (mean time to failure) and MTTR (mean time to repair)? What do these metrics help us to evaluate?
* MTTF (mean time to failure), also known as uptime, can be defined as how long the system runs before it fails.
* MTTR (mean time to repair), on the other hand, is the amount of time it takes to repair a broken system.
* MTBF (mean time between failures) is the amount of time between failures of the system.
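These metrics combine into the classic steady-state availability formula, availability = MTBF / (MTBF + MTTR). A quick worked example (the numbers are made up):

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability from MTBF and MTTR."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# A service that fails every 500 hours on average and takes 2 hours to repair:
print(f"{availability(500, 2):.4%}")  # 99.6016%
```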
What is the role of monitoring in SRE?
Google: "Monitoring is one of the primary means by which service owners keep track of a system’s health and availability" Read more about it [here](https://sre.google/sre-book/introduction)
What are the two main SRE KPIs?
Service Level Indicators (SLI) and Service Level Objectives (SLO).
What is Toil?
Google: "Toil is the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows" Read more about it [here](https://sre.google/sre-book/eliminating-toil/)
What is a postmortem?
A postmortem is a process that should take place following an incident. Its purpose is to identify the root cause of the incident and the actions that should be taken to avoid this kind of incident from happening again.
What is the core value often put forward when talking about postmortem?
Blamelessness. Postmortems need to be blameless, and this value should be reminded at the beginning of every postmortem. This is the best way to ensure that people focus on finding the root cause and are not trying to hide their possible faults.
================================================
FILE: topics/devops/containerize_app.md
================================================

## Containerize an Application

1. Clone an open source project you would like to containerize. A couple of suggestions:

   ```
   https://github.com/bregman-arie/node-hello-world
   https://github.com/bregman-arie/flask-hello-world
   ```

2. Write a Dockerfile you'll use for building an image of the application (you can use any base image you would like)
3. Build an image using the Dockerfile you've just written
4. Verify the image exists
5. [Optional] Push the image you've just built to a registry
6. Run the application
7. Verify the app is running

================================================
FILE: topics/devops/ha_hello_world.md
================================================

## Highly Available "Hello World"

Set up a highly available "Hello World" application with the following instructions:

* Use a containerized load balancer
* Provision two virtual machines (this is where the app will run)
* The page, when visited, should show "Hello World! I'm host X" - X should be the name of the virtual machine

================================================
FILE: topics/devops/solutions/containerize_app.md
================================================

## Containerize an Application

1. Clone an open source project you would like to containerize. A couple of suggestions:

   ```
   https://github.com/bregman-arie/node-hello-world
   https://github.com/bregman-arie/flask-hello-world
   ```

   `git clone https://github.com/bregman-arie/node-hello-world`

2. Write a Dockerfile you'll use for building an image of the application (you can use any base image you would like)

   ```
   FROM alpine
   LABEL maintainer="your name/email"
   RUN apk add --update nodejs npm
   COPY . /src
   WORKDIR /src
   RUN npm install
   EXPOSE 3000
   ENTRYPOINT ["node", "./app.js"]
   ```

3. Build an image using the Dockerfile you've just written

   `docker image build -t web_app:latest .`

4. Verify the image exists

   `docker image ls`

5. [Optional] Push the image you've just built to a registry

   ```
   docker login
   docker image tag web_app:latest /web_app:latest
   # Verify with "docker image ls"
   docker image push /web_app:latest
   ```

6. Run the application

   ```
   docker container run -d -p 80:3000 web_app:latest
   ```

7. Verify the app is running

   ```
   docker container ls
   docker logs
   # In the browser, go to 127.0.0.1:80
   ```

================================================
FILE: topics/devops/solutions/ha_hello_world.md
================================================

## Highly Available "Hello World"

Set up a highly available "Hello World" application with the following instructions:

* Use a containerized load balancer
* Provision two virtual machines (this is where the app will run)
* The page, when visited, should show "Hello World! I'm host X" - X should be the name of the virtual machine

### Solution

1. Provision two VMs

================================================
FILE: topics/dns/README.md
================================================

## DNS
What is DNS? What is it used for?
DNS (Domain Name System) is a protocol used for converting domain names into IP addresses.
Computer networking (at layer 3 of the OSI model) is done with IP addresses, but for humans it's hard to remember IP addresses; it's much easier to remember names. This is why we need something like DNS to convert any domain name we type into an IP address. You can think of DNS as a huge phonebook or database where each name has a corresponding IP.
What is DNS resolution?
The process of translating domain names to IP addresses.
What is a name server?
A server which is responsible for resolving DNS queries.
What is the resolution sequence of: www.site.com
It's resolved in this order: 1) . 2) .com 3) site.com 4) www.site.com
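The same order can be derived mechanically by walking the FQDN labels from the root down, a small illustrative sketch:

```python
def resolution_order(fqdn):
    """Build the lookup order from the root down, as a resolver walks it."""
    labels = fqdn.rstrip(".").split(".")
    # Start at the root ("."), then add one label at a time from the right
    steps = ["."]
    for i in range(len(labels) - 1, -1, -1):
        steps.append(".".join(labels[i:]))
    return steps

print(resolution_order("www.site.com"))
# ['.', 'com', 'site.com', 'www.site.com']
```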
What is a domain name registrar?
[Cloudflare](https://www.cloudflare.com/en-gb/learning/dns/glossary/what-is-a-domain-name-registrar): "A domain name registrar provides domain name registrations to the general public. A common misconception is that registrars sell domain names; these domain names are actually owned by registries and can only be leased by users."
Given the following fqdn, www.blipblop.com, what is the root?
`.` is the root
Given the following fqdn, www.blipblop.com, what is the top level domain?
`.com.` is the top level domain
Given the following fqdn, www.blipblop.com, what is the second level domain?
`blipblop.com.` is the second level domain
Given the following fqdn, www.blipblop.com, what is the domain?
`www.blipblop.com.` is the domain
Describe DNS resolution workflow in high-level
In general, the process is as follows:

* The user types an address in the web browser (some_site.com)
* The operating system gets a request from the browser to translate the address the user entered
* A query is created to check if a local entry for the address exists in the system. In case it doesn't, the request is forwarded to the DNS resolver
* The resolver is a server, usually configured by your ISP when you connect to the internet, that is responsible for resolving your query by contacting other DNS servers
* The resolver contacts the root nameserver (aka `.`)
* The root nameserver either responds with the address you are looking for, or it responds with the address of the relevant Top Level Domain DNS server (if your address ends with org, then the org TLD)
* The resolver then contacts the TLD DNS server. The TLD DNS server might respond with the address you are looking for. If it doesn't have the information, it will provide the address of the SLD DNS server
* The SLD DNS server will reply with the address to the resolver
* The resolver passes this information to the browser, while your OS also stores this information in the cache
* The user can browse the website with happiness and joy :D
##### DNS - Records
What is a DNS record?
A mapping between a domain name and an IP address.
What types of DNS records are there?
* A
* CNAME
* PTR
* MX
* AAAA

A more detailed list can be found [here](https://www.nslookup.io/learning/dns-record-types)
What is an A record?
A (Address): Maps a host name to an IPv4 address. When a computer has multiple adapter cards and IP addresses, it should have multiple address records.
What is an AAAA record?
An AAAA Record performs the same function as an A Record, but for an IPv6 Address.
What is a CNAME record?
CNAME: maps a hostname to another hostname. The target should be a domain name which must have an A or AAAA record. Think of it as an alias record.
What is a PTR record?
While an A record points a domain name to an IP address, a PTR record does the opposite and resolves the IP address to a domain name.
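As a concrete illustration, reverse (PTR) lookups for IPv4 use names under the `in-addr.arpa` zone, built from the IP's octets in reverse order. A minimal sketch (the helper function is ours):

```python
def ptr_name(ipv4):
    """Return the in-addr.arpa name queried for a reverse (PTR) lookup."""
    octets = ipv4.split(".")
    # The octets are reversed so the hierarchy reads most-specific last,
    # matching how DNS delegates the reverse zones.
    return ".".join(reversed(octets)) + ".in-addr.arpa"

print(ptr_name("8.8.4.4"))  # → 4.4.8.8.in-addr.arpa
```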
What is an MX record?
MX (Mail Exchange) Specifies a mail exchange server for the domain, which allows mail to be delivered to the correct mail servers in the domain.
What is a NS record?
NS: specifies the name servers that can respond to DNS queries for the domain
##### DNS - TTL
Explain DNS Records TTL
[varonis.com](https://www.varonis.com/blog/dns-ttl): "DNS TTL (time to live) is a setting that tells the DNS resolver how long to cache a query before requesting a new one. The information gathered is then stored in the cache of the recursive or local resolver for the TTL before it reaches back out to collect new, updated details."
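A resolver cache that honors TTL can be sketched as follows — a simplified model of the caching behavior described above, not a real resolver:

```python
import time

class TTLCache:
    """Minimal DNS-style cache: entries expire once their TTL elapses."""

    def __init__(self):
        self._store = {}  # name -> (address, expiry timestamp)

    def put(self, name, address, ttl_seconds):
        self._store[name] = (address, time.monotonic() + ttl_seconds)

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if time.monotonic() >= expires_at:   # TTL elapsed: must re-query
            del self._store[name]
            return None
        return address

cache = TTLCache()
cache.put("example.com", "93.184.216.34", ttl_seconds=300)
print(cache.get("example.com"))  # → 93.184.216.34
```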
##### DNS - Misc
Is DNS using TCP or UDP?
DNS uses UDP port 53 for resolving queries, both regular and reverse. DNS uses TCP for zone transfers.
True or False? DNS can be used for load balancing
True.
Which techniques can DNS use for load balancing?
There are several techniques that DNS can use for load balancing, including:

* Round-robin DNS
* Weighted round-robin DNS
* Least connections
* GeoDNS
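Round-robin DNS, the simplest of these, can be illustrated in a few lines of Python: the nameserver rotates the order of the returned addresses so successive clients favor different hosts. The class and addresses below are illustrative only:

```python
from collections import deque

class RoundRobinRecords:
    """Rotate a set of A records so successive queries favor different hosts."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        result = list(self._addresses)
        self._addresses.rotate(-1)  # next query starts from the next host
        return result

records = RoundRobinRecords(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(records.answer())  # → ['10.0.0.1', '10.0.0.2', '10.0.0.3']
print(records.answer())  # → ['10.0.0.2', '10.0.0.3', '10.0.0.1']
```

Clients typically pick the first address in the answer, so rotating the list spreads load across all the servers over time.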
What is a DNS zone?
A DNS zone is a logical container that holds all the DNS resource records for a specific domain name.
What types of zones are there?
There are several types, including: * Primary zone: A primary zone is a read/write zone that is stored in a master DNS server. * Secondary zone: A secondary zone is a read-only copy of a primary zone that is stored in a slave DNS server. * Stub zone: A stub zone is a type of zone that contains only the essential information about a domain name. It is used to reduce the amount of DNS traffic and improve the efficiency of the DNS resolution process.
================================================ FILE: topics/eflk.md ================================================
## ELK + Filebeat

Set up the following using any log you would like:

* Run the following: elasticsearch, logstash, kibana and filebeat (each running in its own container)
* Make filebeat transfer a log to logstash for processing
* Once logstash is done, index with elasticsearch
* Finally, make sure the data is available in Kibana

================================================ FILE: topics/flask_container_ci/README.md ================================================
Your mission, should you choose to accept it, involves fixing the app in this directory, containerizing it and setting up a CI for it.

Please read all the instructions carefully. If any of the following steps is not working, it is expected from you to fix them.

## Installation

1. Create a virtual environment with `python3 -m venv challenge_venv`
2. Activate it with `source challenge_venv/bin/activate`
3. Install the requirements in this directory: `pip install -r requirements.txt`

## Run the app

1. Move to the `challenges/flask_container_ci` directory, if you are not already there
2. Run `export FLASK_APP=app/main.py`
3. To run the app execute `flask run`. If it doesn't work, fix it
4. Access `http://127.0.0.1:5000`. You should see the following:

```
{
    "resources_uris": {
        "user": "/users/<username>",
        "users": "/users"
    },
    "current_uri": "/"
}
```

5. You should be able to access any of the resources and get the following data:
   * /users - all users data
   * /users/<username> - data on the specific chosen user
6. When accessing /users, the data returned should not include the id of the user, only its name and description. Also, the data should be ordered by usernames.

## Containers

Using Docker or Podman, containerize the flask app so users can run the following two commands:

```
docker build -t app:latest /path/to/Dockerfile
docker run -d -p 5000:5000 app
```

1. You can use any image base you would like
2. Containerize only what you need for running the application, nothing else.

## CI

Great, now that we have a working app which we can also run in a container, let's set up a CI for it so it won't break again in the future.

In the current directory you have a file called tests.py which includes the tests for the app. What is required from you is:

1. The CI should run the app tests. You are free to choose whatever CI system or service you prefer. Use `python tests.py` for running the tests.
2. There should be some kind of test for the Dockerfile you wrote
3. Add an additional unit test (or another level of tests) for testing the app

### Guidelines

* Except for the app functionality, you can change whatever you want - structure, tooling, libraries, ... If possible add a `notes.md` file which explains the reasons, logic, thoughts and anything else you would like to share
* The CI part should include the source code for the pipeline definition

================================================ FILE: topics/flask_container_ci/app/__init__.py ================================================
#!/usr/bin/env python
# coding=utf-8

================================================ FILE: topics/flask_container_ci/app/config.py ================================================
#!/usr/bin/env python
# coding=utf-8

import os

basedir = os.path.abspath(os.path.dirname(__file__))

SECRET_KEY = 'shhh'
CSRF_ENABLED = True
SQLALCHEMY_DATABASE_URI = 'sqlite:///' + os.path.join(basedir, 'app.db')

================================================ FILE: topics/flask_container_ci/app/main.py ================================================
#!/usr/bin/env python
# coding=utf-8

from flask import Flask
from flask import make_response

import json

from flask_wtf.csrf import CSRFProtect
from werkzeug.exceptions import NotFound

# OpenRefactory Warning: The 'Flask' method creates a Flask app
# without Cross-Site Request Forgery (CSRF) protection.
app = Flask(__name__)
CSRFProtect(app)

with open("./users.json", "r") as f:
    users = json.load(f)


@app.route("/", methods=['GET'])
def index():
    return pretty_json({
        "resources": {
            "users": "/users",
            "user": "/users/<username>",
        },
        "current_uri": "/"
    })


@app.route("/users", methods=['GET'])
def all_users():
    return pretty_json(users)


@app.route("/users/<username>", methods=['GET'])
def user_data(username):
    if username not in users:
        raise NotFound
    return pretty_json(users[username])


@app.route("/users/<username>/something", methods=['GET'])
def user_something(username):
    raise NotImplementedError()


def pretty_json(arg):
    response = make_response(json.dumps(arg, sort_keys=True, indent=4))
    response.headers['Content-type'] = "application/json"
    return response


def create_test_app():
    # OpenRefactory Warning: The 'Flask' method creates a Flask app
    # without Cross-Site Request Forgery (CSRF) protection.
    app = Flask(__name__)
    CSRFProtect(app)
    return app


if __name__ == "__main__":
    app.run(port=5000)

================================================ FILE: topics/flask_container_ci/app/tests.py ================================================
#!/usr/bin/env python
# coding=utf-8

import os
import unittest

from config import basedir
from app import app
from app import db


class TestCase(unittest.TestCase):

    def setUp(self):
        app.config['TESTING'] = True
        app.config['WTF_CSRF_ENABLED'] = False
        app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///' + os.path.join(
            basedir, 'test.db')
        self.app = app.test_client()
        db.create_all()

    def tearDown(self):
        db.session.remove()
        db.drop_all()


if __name__ == '__main__':
    unittest.main()

================================================ FILE: topics/flask_container_ci/requirements.txt ================================================
flask

================================================ FILE: topics/flask_container_ci/tests.py ================================================
#!/usr/bin/env python
# coding=utf-8

import unittest

from app import main


class TestCase(unittest.TestCase):

    def setUp(self):
        self.app = main.app.test_client()

    def test_main_page(self):
        response = self.app.get('/', follow_redirects=True)
        self.assertEqual(response.status_code, 200)

    def test_users_page(self):
        response = self.app.get('/users', follow_redirects=True)
        self.assertEqual(response.status_code, 200)


if __name__ == '__main__':
    unittest.main()

================================================ FILE: topics/flask_container_ci/users.json ================================================
{
    "geralt" : {
        "id": "whitewolf",
        "name": "Geralt of Rivia",
        "description": "Traveling monster slayer for hire"
    },
    "lara_croft" : {
        "id": "m31a3n6sion",
        "name": "Lara Croft",
        "description": "Highly intelligent and athletic English archaeologist"
    },
    "mario" : {
        "id": "smb3igiul",
        "name": "Mario",
        "description": "Italian plumber who really likes mushrooms"
    },
    "gordon_freeman" : {
        "id": "nohalflife3",
        "name": "Gordon Freeman",
        "description": "Physicist with great shooting skills"
    }
}

================================================ FILE: topics/flask_container_ci2/README.md ================================================
Your mission, should you choose to accept it, involves developing an app, containerizing it and setting up a CI for it.

Please read all the instructions carefully. If any of the following steps is not working, it is expected from you to fix them.

## Installation

1. Create a virtual environment with `python3 -m venv challenge_venv`
2. Activate it with `source challenge_venv/bin/activate`
3. Install the requirements in this directory: `pip install -r requirements.txt`

## Run the app

1. Move to the `challenges/flask_container_ci2` directory, if you are not already there
2. Run `export FLASK_APP=app/main.py`
3. To run the app execute `flask run`. If it doesn't work, fix it
4. Access `http://127.0.0.1:5000`. You should see the following:

```
{
    "current_uri": "/",
    "example": "/matrix/'123n456n789'",
    "resources": {
        "column": "/columns/<matrix>/<column_number>",
        "matrix": "/matrix/<matrix>",
        "row": "/rows/<matrix>/<row_number>"
    }
}
```

5. You should be able to access any of the resources and get the following data:
   * /matrix/<matrix> - for example, for /matrix/123n456n789 the user will get: 1 2 3 4 5 6 7 8 9 (each row in a new line)
   * /matrix/<matrix>/<column_number> - for example, for /matrix/123n456n789/2 the user will get: 2 5 8
   * /matrix/<matrix>/<row_number> - for example, for /matrix/123n456n789/1 the user will get: 1 2 3

## Containers

Using Docker or Podman, containerize the flask app so users can run the following two commands:

```
docker build -t app:latest /path/to/Dockerfile
docker run -d -p 5000:5000 app
```

1. You can use any image base you would like
2. Containerize only what you need for running the application, nothing else.

## CI

Great, now that we have a working app which we can also run in a container, let's set up a CI for it so it won't break again in the future.

In the current directory you have a file called tests.py which includes the tests for the app. What is required from you is:

1. Write a CI pipeline that will run the app tests. You are free to choose whatever CI system or service you prefer. Use `python tests.py` for running the tests.
2. There should be some kind of test for the Dockerfile you wrote
3. Add an additional unit test (or any other level of tests) for testing the app

### Guidelines

* Except for the app functionality, you can change whatever you want - structure, tooling, libraries, ... If possible, add a `notes.md` file which explains the reasons, logic, thoughts and anything else you would like to share
* The CI part should include the source code for the pipeline definition

================================================ FILE: topics/flask_container_ci2/app/__init__.py ================================================
#!/usr/bin/env python
# coding=utf-8

================================================ FILE: topics/flask_container_ci2/app/config.py ================================================
#!/usr/bin/env python
# coding=utf-8

import os

basedir = os.path.abspath(os.path.dirname(__file__))

SECRET_KEY = 'shhh'
CSRF_ENABLED = True
SQLALCHEMY_DATABASE_URI = 'sqlite:///' + os.path.join(basedir, 'app.db')

================================================ FILE: topics/flask_container_ci2/app/main.py ================================================
#!/usr/bin/env python
# coding=utf-8

from flask import Flask
from flask import make_response

import json

from flask_wtf.csrf import CSRFProtect

# OpenRefactory Warning: The 'Flask' method creates a Flask app
# without Cross-Site Request Forgery (CSRF) protection.
app = Flask(__name__)
CSRFProtect(app)


@app.route("/", methods=['GET'])
def index():
    return pretty_json({
        "resources": {
            "matrix": "/matrix/<matrix>",
            "column": "/columns/<matrix>/<column_number>",
            "row": "/rows/<matrix>/<row_number>",
        },
        "current_uri": "/",
        "example": "/matrix/'123n456n789'",
    })


@app.route("/matrix/<matrix>", methods=['GET'])
def matrix(matrix):
    # TODO: return matrix, each row in a new line
    pass


@app.route("/matrix/<matrix>/<column_number>", methods=['GET'])
def column(matrix, column_number):
    # TODO: return column based on given column number
    pass


@app.route("/matrix/<matrix>/<row_number>", methods=['GET'])
def row(matrix, row_number):
    # TODO: return row based on given row number
    pass


def pretty_json(arg):
    response = make_response(json.dumps(arg, sort_keys=True, indent=4))
    response.headers['Content-type'] = "application/json"
    return response


if __name__ == "__main__":
    app.run(port=5000)

================================================ FILE: topics/flask_container_ci2/app/tests.py ================================================
#!/usr/bin/env python
# coding=utf-8

import os
import unittest

from config import basedir
from app import app
from app import db


class TestCase(unittest.TestCase):

    def setUp(self):
        app.config['TESTING'] = True
        app.config['WTF_CSRF_ENABLED'] = False
        app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///' + os.path.join(
            basedir, 'test.db')
        self.app = app.test_client()
        db.create_all()

    def tearDown(self):
        db.session.remove()
        db.drop_all()


if __name__ == '__main__':
    unittest.main()

================================================ FILE: topics/flask_container_ci2/requirements.txt ================================================
flask

================================================ FILE: topics/flask_container_ci2/tests.py ================================================
#!/usr/bin/env python
# coding=utf-8

import unittest

from app import main


class TestCase(unittest.TestCase):

    def setUp(self):
        self.app = main.app.test_client()

    def test_main_page(self):
        response = self.app.get('/', follow_redirects=True)
        self.assertEqual(response.status_code, 200)

    def test_matrix(self):
        response = self.app.get('/matrix/123n459,789', follow_redirects=True)
        # Change when the matrix route is fixed and returning the actual matrix
        self.assertEqual(response.status_code, 500)


if __name__ == '__main__':
    unittest.main()

================================================ FILE: topics/gcp/README.md ================================================
# Google Cloud Platform

- [Google Cloud Platform](#google-cloud-platform)
  - [Exercises](#exercises)
    - [Account Setup](#account-setup)
    - [Compute Engine](#compute-engine)
  - [Questions](#questions)
    - [Global Infrastructure](#global-infrastructure)
      - [gcloud](#gcloud)
    - [Resource Hierarchy](#resource-hierarchy)
    - [IAM and Roles](#iam-and-roles)
    - [Labels and Tags](#labels-and-tags)
      - [gcloud](#gcloud-1)
    - [Compute Engine](#compute-engine-1)
      - [gcloud](#gcloud-2)
    - [Other](#other)
    - [Google Kubernetes Engine (GKE)](#google-kubernetes-engine-gke)
    - [Anthos](#anthos)

## Exercises

### Account Setup

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Create a project | Organization | [Exercise](exercises/create_project/exercise.md) | [Solution](exercises/create_project/solution.md) | |
| Assign roles | IAM | [Exercise](exercises/assign_roles/exercise.md) | [Solution](exercises/assign_roles/solution.md) | |

### Compute Engine

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Create an instance | Compute, Labels | [Exercise](exercises/instance_101/exercise.md) | [Solution](exercises/instance_101/solution.md) | |

## Questions

### Global Infrastructure
Explain each of the following * Zone * Region
GCP regions are data centers hosted across different geographical locations worldwide.
Within each region, there are multiple isolated locations known as Zones. Each zone is one or more data-centers with redundant network and connectivity and power supply. Multiple zones ensure high availability in case one of them goes down
True or False? Each GCP region is designed to be completely isolated from the other GCP regions
True.
What considerations should be taken into account when choosing a GCP region for running a new application?
* Services availability: not all services (and all their features) are available in every region
* Reduced latency: deploy the application in a region that is close to its customers
* Compliance: some countries have stricter rules and requirements, such as making sure the data stays within the borders of the country or the region. In that case, only specific regions can be used for running the application
* Pricing: pricing might not be consistent across regions, so the price for the same service may differ between regions
True or False? All GCP services are available in all regions and zones
False. You can see [here](https://cloud.google.com/about/locations) which products/services are available in each region.
#### gcloud
How to list all regions?
`gcloud compute regions list`
### Resource Hierarchy
Explain resources hierarchy in GCP
Organization -> Folder -> Project -> Resources

* Organization - the company
* Folder - usually for departments, teams, products, etc.
* Project - can be different projects, or the same project in different environments (dev, staging, production)
* Resources - the actual GCP services (Compute, App Engine, Storage, etc.)
True or False? In a project, you can have one or more organizations
False. It's quite the opposite: first there is an organization, and under the organization you can have one or more folders with one or more projects.
True or False? A resource has to be associated with at least one project
True. You can't have resources associated with no project.
True or False? Project name has to be globally unique
True for the project ID, which must be globally unique; the project display name does not have to be.
### IAM and Roles
Explain roles and permissions
A role is an encapsulation of a set of permissions. For example, the "owner" role has more than 3000 permissions assigned across the different components and services of GCP.
True or False? Permissive parent policy will always overrule restrictive child policy
True
### Labels and Tags
What are labels?
You can think of labels in GCP as sticky notes that you attach to different GCP resources. That makes it easier, for example, to search for specific resources (like applying a label called "web-app" and then searching for all the resources that are related to "web-app").
Can you provide some examples to labels usage in GCP?
* Location (cost center) * Project (or environment, folder, etc.) * Service type * Service owner * Application type * Application owner
What are network tags and how are they different from labels?
As the name suggests, network tags can be applied only to network resources. While labels don't affect the resources on which they are applied, network tags do affect resources (e.g. firewall access and networking routes)
#### gcloud
List the labels of an instance called "instance-1"
`gcloud compute instances describe instance-1 --format "yaml(labels)"`
Update a label to "app=db" for the instance called "instance-1"
`gcloud compute instances update instance-1 --update-labels app=db`
Remove the label "env" from an instance called "instance-1"
`gcloud compute instances update instance-1 --remove-labels env`
### Compute Engine #### gcloud
Create an instance with the following properties: * name: instance-1 * machine type: e2-micro * labels: app=web, env=dev
`gcloud compute instances create instance-1 --labels app=web,env=dev --machine-type=e2-micro`
### Other
Tell me what you know about GCP networking
A Virtual Private Cloud (VPC) network is a virtual version of a physical network, implemented in Google's internal network. A VPC is a global resource in GCP. Subnetworks (subnets) are regional resources, i.e., subnets are created within regions. VPCs can be created in 2 modes:

1. Auto mode VPC - one subnet in each region is created automatically by GCP when creating the VPC
2. Custom mode VPC - no subnets are created automatically. This type of network gives users complete control over subnet creation.
Explain Cloud Functions
Google Cloud Functions is a serverless execution environment for building and connecting cloud services. With Cloud Functions you write simple, single-purpose functions that are attached to events emitted from your cloud infrastructure and services. Your function is triggered when an event being watched is fired.
What is Cloud Datastore?
Cloud Datastore is a schemaless NoSQL datastore in Google's cloud. Applications can use Datastore to query your data with SQL-like queries that support filtering and sorting. Datastore replicates data across multiple datacenters, which provides a high level of read/write availability.
What network tags are used for?
Network tags allow you to apply firewall rules and routes to a specific instance or set of instances: You make a firewall rule applicable to specific instances by using target tags and source tags.
What are flow logs? Where are they enabled?
VPC Flow Logs records a sample of network flows sent from and received by VM instances, including instances used as Google Kubernetes Engine nodes. These logs can be used for network monitoring, forensics, real-time security analysis, and expense optimization.

To enable Flow Logs:

1. Open VPC Network in the GCP Console
2. Click the name of the subnet
3. Click the EDIT button
4. Set Flow Logs to On
5. Click Save
How do you list buckets?
Two ways to do that:

* `gsutil ls`
* `gcloud alpha storage ls`
What Compute metadata key allows you to run code at startup?
`startup-script`
What does the following command do? `gcloud deployment-manager deployments create`
Deployment Manager creates a new deployment.
What is Cloud Code?
It is a set of tools to help developers write, run and debug GCP kubernetes based applications. It provides built-in support for rapid iteration, debugging and running applications in development and production K8s environments.
### Google Kubernetes Engine (GKE)
What is GKE
* It is the managed kubernetes service on GCP for deploying, managing and scaling containerised applications using Google infrastructure.
### Anthos
What is Anthos
It is a managed application platform for organisations, such as enterprises, that require quick modernisation and certain levels of consistency for their legacy applications in a hybrid or multicloud world. The core ideas can be drawn from this explanation:

* Managed - the customer does not need to worry about the underlying software integrations, they just enable the API
* Application platform - it consists of open source tools like K8s, Knative, Istio and Tekton
* Enterprises - these are usually organisations with complex needs
* Consistency - to have the same policies declaratively initiated to be run anywhere securely, e.g. on-prem, GCP or other clouds (AWS or Azure)

Fun fact: "anthos" is Greek for flower; flowers grow in the ground (earth) but need rain from the clouds to flourish.
List the technical components that make up Anthos
* Infrastructure management - Google Kubernetes Engine (GKE) * Cluster management - GKE, Ingress for Anthos * Service management - Anthos Service Mesh * Policy enforcement - Anthos Config Management, Anthos Enterprise Data Protection, Policy Controller * Application deployment - CI/CD tools like Cloud Build, GitLab * Application development - Cloud Code
What is the primary computing environment for Anthos to easily manage workload deployment?
* Google Kubernetes Engine (GKE)
How does Anthos handle the control plane and node components for GKE?
On GCP, the Kubernetes api-server is the only control plane component exposed to customers, whilst Compute Engine manages the instances in the project.
Which load balancing options are available?
* Network Load Balancing for L4 and HTTP(S) Load Balancing for L7, both managed services that do not require additional configuration
* Ingress for Anthos, which allows deploying a load balancer that serves an application across multiple clusters on GKE
Can you deploy Anthos on AWS?
* Yes, Anthos on AWS is now GA. For more read [here](https://cloud.google.com/anthos/gke/docs/aws)
List and explain the enterprise security capabilities provided by Anthos
* Control plane security - GCP manages and maintains the K8s control plane out of the box. The user can secure the api-server by using master authorized networks and private clusters. These allow the user to disable access on the public IP address by assigning a private IP address to the master.
* Node security - by default, workloads are provisioned on Compute Engine instances that use Google's Container-Optimized OS. This operating system implements a locked-down firewall, limited user accounts with root disabled and a read-only filesystem. There is a further option to enable GKE Sandbox for stronger isolation in multi-tenant deployment scenarios.
* Network security - within a created cluster VPC, Anthos GKE leverages a powerful software-defined network that enables simple Pod-to-Pod communications. Network policies allow locking down ingress and egress connections in a given namespace. Filtering can also be applied to incoming load-balanced traffic for services that require external access, by supplying whitelisted CIDR IP ranges.
* Workload security - workloads run with limited privileges; default Docker AppArmor security policies are applied to all Kubernetes Pods. Workload Identity for Anthos GKE aligns the open source Kubernetes service accounts with GCP service account permissions.
* Audit logging - administrators are given a way to retain, query, process and alert on events of the deployed environments.
How can workloads deployed on Anthos GKE on-prem clusters securely connect to Google Cloud services?
* Google Cloud Virtual Private Network (Cloud VPN) - this is for secure networking * Google Cloud Key Management Service (Cloud KMS) - for key management
What is Island Mode configuration with regards to networking in Anthos GKE deployed on-prem?
* This is when pods can directly talk to each other within a cluster, but cannot be reached from outside the cluster thus forming an "island" within the network that is not connected to the external network.
Explain Anthos Config Management
It is a core component of the Anthos stack which provides platform, service and security operators with a single, unified approach to multi-cluster management that spans both on-premises and cloud environments. It closely follows K8s best practices, favoring declarative approaches over imperative operations, and actively monitors cluster state and applies the desired state as defined in Git. It includes three key components:

1. An importer that reads from a central Git repository
2. A component that synchronises stored configuration data into K8s objects
3. A component that monitors drift between desired and actual cluster configurations, with a capability of reconciliation when the need arises
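The drift-monitoring component described here can be illustrated with a toy diff between the desired state (what Git holds) and the actual cluster state. This is only a conceptual sketch with made-up object names, not how Anthos implements it:

```python
def detect_drift(desired, actual):
    """Compare desired config (from Git) with actual cluster state.

    Returns the changes needed to reconcile the cluster: objects to
    create, to update, and to delete.
    """
    to_create = {k: v for k, v in desired.items() if k not in actual}
    to_update = {k: v for k, v in desired.items()
                 if k in actual and actual[k] != v}
    to_delete = [k for k in actual if k not in desired]
    return {"create": to_create, "update": to_update, "delete": to_delete}

# Hypothetical namespace configs: Git says prod quota is 10 and dev exists;
# the cluster still has the old quota and a namespace Git no longer tracks.
desired = {"ns/prod": {"quota": "10"}, "ns/dev": {"quota": "5"}}
actual = {"ns/prod": {"quota": "8"}, "ns/old": {"quota": "1"}}
print(detect_drift(desired, actual))
```

A real reconciler would then apply these changes and re-run the comparison continuously, which is the declarative loop the answer above describes.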
How does Anthos Config Management help?
It follows common modern software development practices which makes cluster configuration, management and policy changes auditable, revertable, and versionable easily enforcing IT governance and unifying resource management in an organisation.
What is Anthos Service Mesh?
* It is a suite of tools that assist in monitoring and managing deployed services on Anthos of all shapes and sizes whether running in cloud, hybrid or multi-cloud environments. It leverages the APIs and core components from Istio, a highly configurable and open-source service mesh platform.
Describe the two main components of Anthos Service Mesh
1. Data plane - it consists of a set of distributed proxies that mediate all inbound and outbound network traffic between individual services which are configured using a centralised control plane and an open API 2. Control plane - is a fully managed offering outside of Anthos GKE clusters to simplify management overhead and ensure highest possible availability.
What are the components of the managed control plane of Anthos Service Mesh?
1. Traffic Director - it is GCP's fully managed service mesh traffic control plane, responsible for translating Istio API objects into configuration information for the distributed proxies, as well as directing service mesh ingress and egress traffic 2. Managed CA - is a centralised certificate authority responsible for providing SSL certificates to each of the distributed proxies, authentication information and distributing secrets 3. Operations tooling - formerly stackdriver, provides a managed ingestion point for observability and telemetry, specifically monitoring, tracing and logging data generated by each of the proxies. This powers the observability dashboard for operators to visually inspect their services and service dependencies assisting in the implementation of SRE best practices for monitoring SLIs and establishing SLOs.
How does Anthos Service Mesh help?
Tool and technology integration that makes up Anthos service mesh delivers significant operational benefits to Anthos environments, with minimal additional overhead such as follows: * Uniform observability - the data plane reports service to service communication back to the control plane generating a service dependency graph. Traffic inspection by the proxy inserts headers to facilitate distributed tracing, capturing and reporting service logs together with service-level metrics (i.e latency, errors, availability). * Operational agility - fine-grained controls for managing the flow of inter-mesh (north-south) and intra-mesh (east-west) traffic are provided. * Policy-driven security - policies can be enforced consistently across diverse protocols and runtimes as service communications are secured by default.
List possible use cases of traffic controls that can be implemented within Anthos Service Mesh
* Traffic splitting across differing service versions for canary or A/B testing * Circuit breaking to prevent cascading failures * Fault injection to help build resilient and fault-tolerant deployments * HTTP header-based traffic steering between individual services or versions
What is Cloud Run for Anthos?
It is part of the Anthos stack that brings a serverless container experience to Anthos, offering a high-level platform experience on top of K8s clusters. It is built with Knative, an open-source operator for K8s that brings serverless application serving and eventing capabilities.
How does Cloud Run for Anthos simplify operations?
Platform teams in organisations that wish to offer developers additional tools to test, deploy and run applications can use Knative to enhance this experience on Anthos as Cloud Run. Some of the benefits:

* Easy migration from K8s deployments - without Cloud Run, platform engineers have to configure Deployment, Service, and HorizontalPodAutoscaler (HPA) objects along with a load balancer and autoscaling. If the application is already serving traffic, it becomes hard to change configurations or roll back efficiently. With Cloud Run all of this is managed: the Knative service manifest describes the application to be autoscaled and load balanced
* Autoscaling - a sudden traffic spike may cause application containers in K8s to crash due to overload, so efficient automated autoscaling is executed to serve the high volume of traffic
* Networking - it has built-in load balancing capabilities and policies for traffic splitting between multiple versions of an application
* Releases and rollouts - supports the notion of the Knative API's revisions, which describe new versions or different configurations of your application, and canary deployments by splitting traffic
* Monitoring - observing and recording metrics such as latency, error rate and requests per second
List and explain three high-level out of the box autoscaling primitives offered by Cloud Run for Anthos that do not exist in K8s natively
* Rapid, request-based autoscaling - the default autoscaler monitors request metrics, which allows Cloud Run for Anthos to handle spiky traffic patterns smoothly
* Concurrency controls - limits such as max in-flight requests per container are enforced to ensure a container does not become overloaded and crash. More containers are added to handle the spiky traffic, buffering the requests
* Scale to zero - if an application is inactive for a while, Cloud Run for Anthos scales it down to zero to reduce its footprint. Alternatively, one can turn off scale-to-zero to prevent cold starts
List some Cloud Run for Anthos use cases
As it does not support stateful applications or sticky sessions, it is suitable for running stateless applications such as:
* Machine learning model predictions, e.g. TensorFlow Serving containers
* API gateways, API middleware, web front ends and microservices
* Event handlers, ETL
================================================
FILE: topics/gcp/exercises/assign_roles/exercise.md
================================================
# Assign Roles

## Objectives

1. Assign the following roles to a member in your organization
   1. Compute Storage Admin
   2. Compute Network Admin
   3. Compute Security Admin
2. Verify roles were assigned

## Solution

Click [here](solution.md) to view the solution

================================================
FILE: topics/gcp/exercises/assign_roles/main.tf
================================================
locals {
  roles = [
    "roles/compute.storageAdmin",
    "roles/compute.networkAdmin",
    "roles/compute.securityAdmin"
  ]
}

resource "google_service_account" "some_member" {
  account_id   = "${substr(var.env_id, 0, min(length(var.env_id), 10))}-some-member"
  display_name = "${var.env_id} some-member"
}

resource "google_project_iam_member" "storageAdminMaster" {
  for_each = toset(local.roles)
  project  = var.project_id
  role     = each.key
  member   = "serviceAccount:${google_service_account.some_member.email}"
}

================================================
FILE: topics/gcp/exercises/assign_roles/solution.md
================================================
# Assign Roles

## Objectives

1. Assign the following roles to a member in your organization
   1. Compute Storage Admin
   2. Compute Network Admin
   3. Compute Security Admin
2. Verify roles were assigned

## Solution

### Console

1. Go to IAM & Admin
2. Click on IAM and then on the "Add" button
   1. Choose the member account to whom the roles will be added
   2. Under "Select a role", search for the roles specified under "Objectives" and click on "Save"
3. The member should now be able to go to the Compute Engine API and see the resources there.

### Terraform

Click [here](main.tf) to view the Terraform main.tf file

================================================
FILE: topics/gcp/exercises/assign_roles/vars.tf
================================================
variable "project_id" {
  type = string
}

variable "env_id" {
  type = string
}

================================================
FILE: topics/gcp/exercises/assign_roles/versions.tf
================================================
terraform {
  required_version = ">=1.3.0"
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 4.10.0, < 5.0"
    }
  }
}

================================================
FILE: topics/gcp/exercises/create_project/exercise.md
================================================
# Create a Project

## Objectives

1. Create a project with a unique name

## Solution

Click [here](solution.md) to view the solution

================================================
FILE: topics/gcp/exercises/create_project/main.tf
================================================
resource "google_project" "gcp_project" {
  name       = "Some Project"
  project_id = "some-unique-project-id"
  folder_id  = google_folder.some_folder.name
}

resource "google_folder" "some_folder" {
  display_name = "Department 1"
  parent       = "organizations/some-organization"
}

================================================
FILE: topics/gcp/exercises/create_project/solution.md
================================================
# Create a Project

## Objectives

1. Create a project with a unique name

## Solution

### Console

1. Click in the top bar on "New Project" (if you already have a project, click on the project name and then "New Project"), or insert "Create Project" in the search bar
2. Insert a globally unique project name
3. Optionally choose an organization
4. Optionally put it under a specific folder
5. Click on "Create" :)

### Terraform

Click [here](main.tf) to view the solution

================================================
FILE: topics/gcp/exercises/create_project/versions.tf
================================================
terraform {
  required_version = ">=1.3.0"
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 4.10.0, < 5.0"
    }
  }
}

================================================
FILE: topics/gcp/exercises/instance_101/exercise.md
================================================
# Create an Instance

## Objectives

1. Create a VM instance with the following properties
   1. name: instance-1
   2. type: e2-micro
   3. labels:
      1. app: web
      2. env: dev
2. Using the CLI (gcloud) perform the following operations:
   1. Update "app" label to "db"
   2. Remove "env" label

## Solution

Click [here](solution.md) to view the solution

================================================
FILE: topics/gcp/exercises/instance_101/main.tf
================================================
resource "google_compute_network" "vpc_network" {
  name                    = "my-custom-mode-network"
  auto_create_subnetworks = false
  mtu                     = 1460
}

resource "google_compute_subnetwork" "default" {
  name          = "my-custom-subnet"
  ip_cidr_range = "10.0.1.0/24"
  region        = "us-west1"
  network       = google_compute_network.vpc_network.id
}

resource "google_compute_instance" "default" {
  name         = "instance-1"
  machine_type = "e2-micro"
  zone         = "us-west1-a"

  labels = {
    app = "db"
  }

  # boot_disk and network_interface are required by the provider;
  # the image below is an illustrative choice
  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }

  network_interface {
    subnetwork = google_compute_subnetwork.default.id
  }
}

================================================
FILE: topics/gcp/exercises/instance_101/solution.md
================================================
# Create an Instance

## Objectives

1. Create a VM instance with the following properties
   1. name: instance-1
   2. type: e2-micro
   3. labels:
      1. app: web
      2. env: dev
2. Using the CLI (gcloud) perform the following operations:
   1. Update "app" label to "db"
   2. Remove "env" label

## Solution

### Console

1. Go to Compute Engine -> VM instances
2. Click on "Create Instance"
   1. Insert the name "instance-1"
   2. Click on "Add label" and add the following labels:
      1. app: web
      2. env: dev
   3. Choose machine type: e2-micro
3. Click on "Create"
4. Select the created instance and click on "Show info panel"
   1. Click on the "Labels" tab and change the value of the "app" label to "db"
   2. Remove the "env" label

### Shell

```
gcloud config set project <project-id>
gcloud config set compute/region <region>
gcloud config set compute/zone <zone>
gcloud compute instances create instance-1 --labels app=web,env=dev --machine-type=e2-micro
gcloud compute instances update instance-1 --update-labels app=db
gcloud compute instances update instance-1 --remove-labels env
```

### Terraform

Click [here](main.tf) to view the main.tf file

================================================
FILE: topics/gcp/exercises/instance_101/versions.tf
================================================
terraform {
  required_version = ">=1.3.0"
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 4.10.0, < 5.0"
    }
  }
}

================================================
FILE: topics/git/README.md
================================================
# Git

## Exercises

| Name | Topic | Objective & Instructions | Solution | Comments |
| ----------------- | ------ | -------------------------------- | ------------------------------------------- | -------- |
| My first Commit | Commit | [Exercise](commit_01.md) | [Solution](solutions/commit_01_solution.md) | |
| Time to Branch | Branch | [Exercise](branch_01.md) | [Solution](solutions/branch_01_solution.md) | |
| Squashing Commits | Commit | [Exercise](squashing_commits.md) | [Solution](solutions/squashing_commits.md) | |

## Questions

### Git Basics
How do you know if a certain directory is a git repository?
You can check if there is a ".git" directory.
Explain the following: git directory, working directory and staging area
This answer is taken from [git-scm.com](https://git-scm.com/book/en/v1/Getting-Started-Git-Basics#_the_three_states): "The Git directory is where Git stores the meta-data and object database for your project. This is the most important part of Git, and it is what is copied when you clone a repository from another computer. The working directory is a single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on disk for you to use or modify. The staging area is a simple file, generally contained in your Git directory, that stores information about what will go into your next commit. It’s sometimes referred to as the index, but it’s becoming standard to refer to it as the staging area."
What is the difference between git pull and git fetch?
Shortly, `git pull` = `git fetch` + `git merge`. When you run `git pull`, it downloads all the changes from the remote repository and merges them into the corresponding branch in your local repository. `git fetch` only downloads the changes from the remote repository and stores them in remote-tracking branches, without modifying your local branches.
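The relationship can be sketched with a few commands (assuming a remote named `origin` and a branch named `main`):

```shell
# Download new objects and update the remote-tracking branch (origin/main)
# without touching the local branch:
git fetch origin

# Merge the updated remote-tracking branch into the current local branch:
git merge origin/main

# The two steps above are roughly equivalent to the single command:
git pull origin main
```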
How to check if a file is tracked and if not, then track it?
There are different ways to check whether a file is tracked or not: - `git ls-files --error-unmatch <file>` -> exit code of 0 means it's tracked - `git blame <file>` ...
Explain what the file gitignore is used for
The purpose of .gitignore files is to ensure that certain files not tracked by Git remain untracked. To stop tracking a file that is currently tracked, use `git rm --cached <file>`.
How can you see which changes have been made before committing them?
`git diff`
What does git status do?
`git status` helps you to understand the tracking status of files in your repository. Focusing on working directory and staging area - you can learn which changes were made in the working directory, which changes are in the staging area and in general, whether files are being tracked or not.
You've created new files in your repository. How to make sure Git tracks them?
`git add FILES`
### Scenarios
You have files in your repository you don't want Git to ever track. What should you do to avoid ever tracking them?
Add them to the `.gitignore` file. This will make sure these files are never added to the staging area.
A development team in your organization is using a monorepo and it has become quite big, including hundreds of thousands of files. They say many Git operations (like `git status`, for example) take a long time to run. Why does that happen and what can you do to help them?
Many Git operations are related to filesystem state. `git status`, for example, runs one diff to compare the HEAD commit to the index, and another diff to compare the index to the working directory. As part of these diffs, it needs to run quite a lot of `lstat()` system calls. When running on hundreds of thousands of files, this can take seconds, if not minutes. One thing to do about it is to use Git's built-in `fsmonitor` (filesystem monitor). With fsmonitor (which can also integrate with Watchman), Git spawns a daemon that continuously watches for changes in the working directory of your repository and caches them. This way, when you run `git status`, instead of scanning the working directory, Git uses the cached state of your index.

Next, you can try to enable `feature.manyFiles` with `git config feature.manyFiles true`. This does two things:
1. Sets `index.version = 4`, which enables path-prefix compression in the index
2. Sets `core.untrackedCache = true` (by default it is set to `keep`). The untracked cache is quite an important concept: it records the mtime of all the files and directories in the working directory, so that when the time comes to iterate over all the files and directories, Git can skip those whose mtime wasn't updated. Before enabling it, you might want to run `git update-index --test-untracked-cache` to test it out and make sure mtime is operational on your system.

Git also has the built-in `git maintenance` command, which optimizes the Git repository so that commands like `git add` or `git fetch` run faster and the repository takes less disk space. It's recommended to run this command periodically (e.g. each day).

In addition, track only what is used/modified by developers - some repositories may include generated files that are required for the project to run properly (or support certain accessibility options) but aren't actually modified in any way by the developers. In that case, tracking them is futile. To avoid populating those files in the working directory, one can use Git's `sparse checkout` feature. Finally, with certain build systems, you can know exactly which files are relevant based on the component of the project the developer is focusing on. This, together with `sparse checkout`, can lead to a situation where only a small subset of the files is populated in the working directory, making commands like `git add`, `git status`, etc. really quick.
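Putting the tips above together, a tuning session for a large repository might look like this (a sketch; `git config core.fsmonitor true` enables the built-in monitor available in recent Git versions):

```shell
# Verify that mtime-based caching behaves correctly on this filesystem first
git update-index --test-untracked-cache

# Enable the large-repo feature set (index version 4 + untracked cache)
git config feature.manyFiles true

# Enable the built-in filesystem monitor daemon for this repository
git config core.fsmonitor true

# Register the repository for periodic background optimization
git maintenance start
```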
### Branches
What branching strategies (flows) do you know?
- Git flow
- GitHub flow
- Trunk-based development
- GitLab flow

[Explanation](https://www.bmc.com/blogs/devops-branching-strategies/#:~:text=What%20is%20a%20branching%20strategy,used%20in%20the%20development%20process)
True or False? A branch is basically a simple pointer or reference to the head of a certain line of work
True
You have two branches - main and devel. How do you make sure devel is in sync with main?
```
git checkout main
git pull
git checkout devel
git merge main
```
Describe shortly what happens behind the scenes when you run git branch
Git runs `update-ref`, pointing the new branch you would like to create at the SHA-1 of the last commit of the branch you're currently on
When you run git branch how does Git know the SHA-1 of the last commit?
Using the HEAD file: `.git/HEAD`
What does "unstaged" mean in regards to Git?
Changes in the working directory that have not been added to the staging area (index) are referred to as "unstaged". A file that exists in the working directory but neither in HEAD nor in the staging area would be "untracked".
True or False? When you run git checkout some_branch, Git updates .git/HEAD to `ref: refs/heads/some_branch`
True
### Merge
You have two branches - main and devel. How do you merge devel into main?
```
git checkout main
git merge devel
git push origin main
```
How to resolve git merge conflicts?

First, open the files that are in conflict and identify what the conflicts are. Next, based on what is accepted in your company or team, either discuss the conflicts with your colleagues or resolve them by yourself. After resolving the conflicts, add the files with `git add <file>`. Finally, run `git merge --continue` (or `git commit`) to conclude the merge. If the conflict arose during a rebase, run `git rebase --continue` instead.
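A minimal end-to-end conflict walkthrough (file and branch names are illustrative):

```shell
git checkout main
git merge devel        # stops and reports a conflict, e.g. in app.conf

git diff               # inspect the <<<<<<< / ======= / >>>>>>> markers

# ...edit app.conf to its desired final content...

git add app.conf       # mark the conflict as resolved
git merge --continue   # conclude the merge with a merge commit
```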

What merge strategies are you familiar with?
Mentioning two or three should be enough, and it's probably good to mention that 'recursive' is the default one for merging two heads (replaced by 'ort' as the default in recent Git versions):

- recursive
- resolve
- ours
- theirs

This page explains it best: https://git-scm.com/docs/merge-strategies
Explain Git octopus merge
Probably good to mention that:

- It's used for merging more than two branches (and it's the default strategy in such cases)
- It's primarily meant for bundling topic branches together

This is a great article about octopus merge: http://www.freblogg.com/2016/12/git-octopus-merge.html
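For example, bundling several topic branches into `main` in one go (branch names are illustrative). Git picks the octopus strategy automatically when more than two heads are merged, and it refuses to proceed if any of the merges would need manual conflict resolution:

```shell
git checkout main
git merge topic-a topic-b topic-c
```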
What is the difference between git reset and git revert?

`git revert` creates a new commit which undoes the changes of the last commit. `git reset`, depending on the usage, can modify the index or move the branch head to a different commit.
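A short demonstration of both on a throwaway repository:

```shell
# Undo the last commit by ADDING a new commit that reverses it.
# History is preserved, so this is safe on shared branches:
git revert --no-edit HEAD

# Undo the last commit by MOVING the branch pointer back one commit
# and discarding the changes. History is rewritten, so avoid this
# on branches others have already pulled:
git reset --hard HEAD~1
```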

### Rebase
You would like to move the fourth commit to the top. How would you achieve that?
Using `git rebase -i` (interactive rebase), which allows you to reorder commits
In what situations are you using git rebase?
Suppose a team is working on a `feature` branch created from the `main` branch of the repo. At the point where the feature development is done and we wish to integrate the feature branch into the main branch with a clean, linear history (without a merge commit), `git rebase` is helpful.
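A sketch of that flow, assuming `feature` was branched off `main`:

```shell
git checkout feature
git rebase main              # replay feature's commits on top of main's tip

git checkout main
git merge --ff-only feature  # fast-forward: no merge commit, linear history
```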
How do you revert a specific file to previous commit?
```
git checkout HEAD~1 -- /path/of/the/file
```
How to squash the last two commits?
Run `git rebase -i HEAD~2`, then in the editor change `pick` to `squash` on the second listed commit, save, and provide a commit message for the squashed commit.
What is the .git directory? What can you find there?
The .git folder contains all the information necessary for version control: commit objects, the remote repository address, and so on. It also contains a log of your commit history, so you can roll back to previous states. This info was copied from [https://stackoverflow.com/questions/29217859/what-is-the-git-folder](https://stackoverflow.com/questions/29217859/what-is-the-git-folder)
What are some Git anti-patterns? Things that you shouldn't do
- Waiting too long between commits (huge commits are hard to review and revert)
- Removing the .git directory :)
How do you remove a remote branch?
You can delete a remote branch with this syntax: `git push origin :[branch_name]`, or the more explicit `git push origin --delete [branch_name]`
Are you familiar with gitattributes? When would you use it?
gitattributes allow you to define attributes per pathname or path pattern.
You can use it, for example, to control line endings in files. Windows and Unix-based systems use different characters for new lines (`\r\n` and `\n` respectively). Using gitattributes, we can normalize this for anyone working with the repository by putting `* text=auto` in .gitattributes. This way, if you use the Git project on Windows you'll get `\r\n`, and if you use Unix or Linux, you'll get `\n`.
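A small `.gitattributes` example along those lines (the specific patterns are illustrative):

```shell
cat > .gitattributes <<'EOF'
# Auto-detect text files and normalize their line endings to LF in the repo
* text=auto
# Shell scripts must always be checked out with LF, even on Windows
*.sh text eol=lf
# Never modify binary assets
*.png binary
EOF

# Verify which attributes Git resolves for a given path:
git check-attr text eol -- deploy.sh
```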
How do you discard local file changes? (before commit)
`git checkout -- <file>`
How do you discard local commits?
`git reset HEAD~1` removes the last commit. If you would like to also discard the changes, run `git reset --hard HEAD~1`
True or False? To remove a file from git but not from the filesystem, one should use git rm
False. To remove a file from Git but keep it on your filesystem, use `git rm --cached <file>`. A plain `git rm` removes the file from the filesystem as well.
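For example (`secrets.env` is a made-up file name):

```shell
# Remove the file from the index only; the copy on disk stays:
git rm --cached secrets.env

# Ignore it going forward so it isn't accidentally re-added:
echo "secrets.env" >> .gitignore
git add .gitignore

git commit -m "Stop tracking secrets.env"
```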
## References
How do you list the current git references in a given repository?
`find .git/refs/`
## Git Diff
What does git diff do?
git diff can compare between two commits, two files, a tree and the staging area, etc.
Which one is faster? git diff-index HEAD or git diff HEAD
`git diff-index` is faster, but to be fair, it's because it does less: `git diff-index` won't look at the content, only at metadata like timestamps.
Which other Git commands use git diff?
The diff mechanism is used by `git status` to perform comparisons and let the user know the state of the files being tracked
## Git Internal
Describe how `git status` works
Shortly, it runs `git diff` twice: 1. Compare HEAD to the staging area 2. Compare the staging area to the working directory
If git status has to run one diff on all the files in the HEAD commit against the staging area/index, and another on the staging area/index against the working directory, how is it fairly fast?
One reason is the structure of the index, commits, etc.:

- Every file in a commit is stored in a tree object
- The index is a flattened structure of these tree objects
- All files in the index have pre-computed hashes
- The diff operation then compares the hashes

Another reason is caching:

- The index caches information about the working directory
- When Git has the information for a certain file cached, there is no need to look at the working directory file
================================================
FILE: topics/git/branch_01.md
================================================
## Git Branch 01

### Objective

Learn how to work with Git branches

### Instructions

1. Pick a Git repository (or create a new one) with at least one commit
2. Create a new branch called "dev"
3. Modify one of the files in the repository
4. Create a new commit
5. Verify the commit you created is only in the "dev" branch

### After you complete the exercise

Answer the following:

1. Why are branches useful? Give an example of one real-world scenario for using branches

================================================
FILE: topics/git/commit_01.md
================================================
## Git Commit 01

### Objective

Learn how to commit changes in Git repositories

### Instructions

1. Create a new directory
2. Make it a git repository
3. Create a new file called `file` with the content "hello commit"
4. Commit your new file
5. Run a git command to verify your commit was recorded

### After you complete the exercise

Answer the following:

* What are the benefits of commits?
* Is there another way to verify a commit was created?

================================================
FILE: topics/git/solutions/branch_01_solution.md
================================================
## Branch 01 - Solution

```
cd some_repository
echo "master branch" > file1
git add file1
git commit -a -m "added file1"
git checkout -b dev
echo "dev branch" > file2
git add file2
git commit -a -m "added file2"
```

Verify:

```
git log (you should see two commits)
git checkout master
git log (you should see one commit)
```

================================================
FILE: topics/git/solutions/commit_01_solution.md
================================================
## Git Commit 01 - Solution

```
mkdir my_repo && cd my_repo
git init
echo "hello_commit" > file
git add file
git commit -a -m "It's my first commit. Exciting!"
git log
```

================================================
FILE: topics/git/solutions/squashing_commits.md
================================================
## Git - Squashing Commits - Solution

1. In a git repository, create a new file with the content "Mario" and commit the change:

```
echo "Mario" > new_file
git add new_file
git commit -m "New file"
```

2. Make a change to the content of the file you just created so it becomes "Mario & Luigi," then create another commit:

```
echo "Mario & Luigi" > new_file
git commit -a -m "Added Luigi"
```

3. Verify you have two separate commits by running:

```
git log
```

4. Squash the two commits you've created into one commit:

```
git rebase -i HEAD~2
```

You should see something similar to:

```
pick 5412076 New file
pick 4016808 Added Luigi
```

Change `pick` to `squash`:

```
pick 5412076 New file
squash 4016808 Added Luigi
```

Save it and provide a commit message for the squashed commit.

> **Note**: If running `git rebase -i HEAD~2` returns a fatal error (e.g., "invalid upstream 'HEAD~2'"), that usually means your second commit is actually the root commit and there's no valid parent before it. In that case, you can either:
> * Use `git rebase -i --root` to allow rewriting the root commit, **or**
> * Create an initial commit before these two commits so that `HEAD~2` points to valid commits.

### After you complete the exercise

**Answer the following:**

* **What is the reason for squashing commits?** History becomes cleaner and it's easier to track changes without many small commits like "removed a character," for example.
* **Is it possible to squash more than 2 commits?** Yes.

================================================
FILE: topics/git/squashing_commits.md
================================================
## Git - Squashing Commits

### Objective

Learn how to squash commits

### Instructions

1. In a git repository, create a new file with the content "Mario" and create a new commit
2. Make a change to the content of the file you just created so the content is "Mario & Luigi" and create another commit
3. Verify you have two separate commits
4. Squash the latest two commits into one commit

### After you complete the exercise

Answer the following:

* What is the reason for squashing commits?
* Is it possible to squash more than 2 commits?

================================================
FILE: topics/grafana/README.md
================================================
## Grafana
Explain what is Grafana
[Grafana Docs](https://grafana.com/docs/grafana/latest/introduction): "Grafana is a complete observability stack that allows you to monitor and analyze metrics, logs and traces. It allows you to query, visualize, alert on and understand your data no matter where it is stored. Create, explore, and share beautiful dashboards with your team and foster a data driven culture."
What is Grafana Cloud?
[Grafana Cloud](https://grafana.com/products/cloud/) is an edition of Grafana that is offered as a service through the cloud. The observability stack is set up, administered and maintained by Grafana Labs, with both free and paid options. You can also send data from existing data sources, e.g. Prometheus and Loki, and visualise existing time series data.
What is Grafana Enterprise?
[Grafana Enterprise](https://grafana.com/docs/grafana/latest/enterprise/#enterprise-plugins) is a commercial edition of Grafana offered with enterprise features such as _Enterprise datasource_ plugins and built-in collaboration features. The edition includes full-time support and training from the Grafana team.
What is the default HTTP port of Grafana?
[Grafana getting started](https://grafana.com/docs/grafana/latest/getting-started/getting-started/): Grafana runs on port 3000 by default.
Explain how we can enforce HTTPS
[Grafana community](https://grafana.com/docs/grafana/latest/getting-started/getting-started/): Set the protocol to _https_ in the configuration settings; Grafana will then expect clients to send requests using the HTTPS protocol. Any client that uses plain HTTP will receive an SSL/TLS error.
How can we install plugins for Grafana?
[Grafana getting started](https://grafana.com/docs/grafana/latest/plugins/installation/): Navigate to the [Grafana plugins page](https://grafana.com/grafana/plugins/), find the desired plugin and click on it, then click the installation tab. There are two ways to install, depending on where your Grafana server is running:

- Cloud: In the **For** field of the installation tab, select the name of the organization you want to install the plugin on (unless you are only part of one), then click **Install plugin**. Grafana Cloud will automatically install the plugin on your Grafana instance; you may need to log out and back in to see the plugin.
- Local Grafana: You can use the Grafana CLI, which lets you list available plugins and install them:

```
grafana-cli plugins list-remote
grafana-cli plugins install <plugin-id>
```

You can also install a packaged plugin by downloading the asset from the installation tab, then extracting the archive into the plugin directory. The path to the plugin directory can be seen in the configuration file:

```
unzip my-plugin-0.2.0.zip -d YOUR_PLUGIN_DIR/my-plugin
```
Explain what a 'Data source' is
[Grafana Docs](https://grafana.com/docs/grafana/latest/datasources/): A data source is a storage backend that acts as a source of data for Grafana. Some popular data sources are Prometheus, InfluxDB, Loki, AWS cloudwatch.
What is the "Default configuration"?
[Grafana docs](https://grafana.com/docs/grafana/latest/administration/configuration/): The default configuration contains the settings that Grafana uses by default. The location depends on the OS environment; note that $WORKING_DIR refers to the working directory of Grafana.

- Windows: ```$WORKING_DIR/conf/defaults.ini```
- Linux: ```/etc/grafana/grafana.ini```
- macOS: ```/usr/local/etc/grafana/grafana.ini```
Explain how we can add Custom configuration to Grafana
[Grafana docs](https://grafana.com/docs/grafana/latest/administration/configuration/): Custom configuration can be applied either by modifying the custom configuration file or by setting environment variables that override the default configuration. The location varies depending on the OS:

- Windows: There is a file ```sample.ini``` in the same directory as the defaults.ini file; copy sample.ini and name it ```custom.ini```, then uncomment the settings you want to override.
- Linux: Edit the configuration file at ```/etc/grafana/grafana.ini```
- macOS: Add a configuration file named ```custom.ini``` in the conf folder; if you installed Grafana using Homebrew, you can manually edit ```conf/defaults.ini```
- Docker: You can override existing configuration with environment variables, e.g. setting the Grafana instance name: ```export GF_DEFAULT_INSTANCE_NAME=my-instance```
Which external authentication is supported out-of-the-box?
[Grafana docs](https://grafana.com/docs/grafana/latest/auth/overview/): Grafana Auth is the built-in authentication system, with password authentication enabled by default. Out of the box, Grafana also supports external authentication mechanisms such as OAuth providers (e.g. GitHub, GitLab, Google, Azure AD and generic OAuth), LDAP and auth proxy.
How can we import a dashboard to a Grafana instance?
[Grafana getting started](https://grafana.com/docs/grafana/latest/dashboards/export-import/): Grafana dashboards can be imported through the Grafana UI. Click on the + icon in the sidebar and then click Import. You can import a dashboard through the following options:

- Upload a dashboard JSON file, which is exported from the Grafana UI or fetched through the [HTTP API](https://grafana.com/docs/grafana/latest/http_api/dashboard/#create-update-dashboard)
- Paste a Grafana dashboard URL (found at [Grafana Dashboards](https://grafana.com/grafana/dashboards/)) or a dashboard unique id into the text area
- Paste raw dashboard JSON text into the panel area and click Load
What is the data format for the dashboard?
[Grafana docs](https://grafana.com/docs/grafana/latest/dashboards/json-model/): Grafana dashboards are represented in JSON files as objects, they store metadata about a dashboard e.g. dashboard properties, panel metadata and variables.
Explain the steps to share your dashboard with your team
[Grafana docs](https://grafana.com/docs/grafana/latest/sharing/share-dashboard/): Go to the dashboard in your Grafana instance. Click on the share icon in the top navigation; three tabs are visible, with the Link tab shown first.

- Direct link: Click Copy and send the link to a Grafana user. Note that the user needs authorization to view the link, which is done by adding the user to a team.
- Snapshot: Click "Local Snapshot" to publish a snapshot to your local Grafana instance, or "Publish to snapshots.raintank.io", a free service for publishing dashboard snapshots to an external Grafana instance. You can configure snapshots to expire after a certain time, as well as the timeout value used when collecting dashboard metrics.
How can you organise your dashboards and users in Grafana?
[Grafana docs](https://grafana.com/blog/2022/03/14/how-to-best-organize-your-teams-and-resources-in-grafana/): The way recommended by Grafana Labs is to create folders for grouping dashboards, library panels and alerts, and to organise users through teams, which grant permissions to the members of a group.

- [Folders](https://grafana.com/docs/grafana/latest/dashboards/dashboard_folders/): Click the + icon in the sidebar, then click "Create folder". In the create folder page, fill in a unique name for the folder and click "Create"
- [Teams](https://grafana.com/tutorials/create-users-and-teams/): You need to be the server admin in order to create teams.
  1. Click the server admin (shield) icon in the sidebar, then in the Users tab, click New user
  2. Enter the user details, e.g. name, e-mail, username and password. The password can be changed later by the user
  3. Click Create to create the user account
Explain the steps to create an 'Alert'
[Grafana docs](https://grafana.com/docs/grafana/latest/alerting/old-alerting/create-alerts/): "Navigate to the panel you want to add or edit an alert rule for, click the title, and then click Edit. On the Alert tab, click Create Alert. If an alert already exists for this panel, then you can just edit the fields on the Alert tab. Fill out the fields. Descriptions are listed below in Alert rule fields. When you have finished writing your rule, click Save in the upper right corner to save alert rule and the dashboard. (Optional but recommended) Click Test rule to make sure the rule returns the results you expect"
================================================
FILE: topics/jenkins_pipelines.md
================================================
## Jenkins Pipelines

Write/Create the following Jenkins pipelines:

* A pipeline which will run unit tests upon git push to a certain repository
* A pipeline which will do the following:
  * Provision an instance (can also be a container)
  * Configure the instance as an Apache web server
  * Deploy a web application on the provisioned instance

================================================
FILE: topics/jenkins_scripts.md
================================================
## Jenkins Scripts

Write the following scripts:

* Remove all the jobs which include the string "REMOVE_ME" in their name
* Remove builds older than 14 days

### Answer

* [Remove jobs which include specific string](jenkins/scripts/jobs_with_string.groovy)
* [Remove builds older than 14 days](jenkins/scripts/old_builds.groovy)

================================================
FILE: topics/kafka/README.md
================================================
# Apache Kafka

## Kafka Exercises

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|

## Kafka Self Assessment

* [Kafka 101](#questions-kafka-101)

### Kafka 101
What is Kafka?
[kafka.apache.org](https://kafka.apache.org): "Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications." In other words, Kafka is a sort of distributed log where you can store events, read them and distribute them to different services and do it in high-scale and real-time.
What is Kafka used for?
- Real-time e-commerce
- Banking
- Health Care
- Automotive (traffic alerts, hazard alerts, ...)
- Real-time Fraud Detection
What is a "Producer" in regards to Kafka?
An application that publishes data to the Kafka cluster.
### Kafka Architecture
What's in a Kafka cluster?
- Broker: a server with the Kafka process running on it. Such a server has local storage. A single Kafka cluster usually contains multiple brokers.
What is the role of ZooKeeper in Kafka?
In Kafka, ZooKeeper is a centralized controller that manages metadata for producers, brokers, and consumers. ZooKeeper also:
- Tracks which brokers are part of the Kafka cluster
- Determines which broker is the leader of a given partition and topic
- Performs leader elections
- Manages cluster membership of brokers
================================================
FILE: topics/kubernetes/CKA.md
================================================
# CKA (Certified Kubernetes Administrator)

- [CKA (Certified Kubernetes Administrator)](#cka-certified-kubernetes-administrator)
  - [Setup](#setup)
  - [Pods](#pods)
    - [Troubleshooting Pods](#troubleshooting-pods)
  - [Namespaces](#namespaces)
  - [Nodes](#nodes)
  - [Services](#services)
  - [ReplicaSets](#replicasets)
    - [Troubleshooting ReplicaSets](#troubleshooting-replicasets)
  - [Deployments](#deployments)
    - [Troubleshooting Deployments](#troubleshooting-deployments)
  - [Scheduler](#scheduler)
    - [Node Affinity](#node-affinity)
  - [Labels and Selectors](#labels-and-selectors)
    - [Node Selector](#node-selector)
  - [Taints](#taints)
  - [Resources Limits](#resources-limits)
  - [Monitoring](#monitoring)
  - [Scheduler](#scheduler-1)

## Setup

* Set up a Kubernetes cluster. Use one of the following:
  1. Minikube for a local, free & simple cluster
  2. Managed cluster (EKS, GKE, AKS)
* Set aliases:

```
alias k=kubectl
alias kd=kubectl delete
alias kds=kubectl describe
alias ke=kubectl edit
alias kr=kubectl run
alias kg=kubectl get
```

## Pods
Run a command to view all the pods in the current namespace
`kubectl get pods` Note: create an alias (`alias k=kubectl`) and get used to `k get po`
Run a pod called "nginx-test" using the "nginx" image
`k run nginx-test --image=nginx`
Assuming that you have a Pod called "nginx-test", how to remove it?
`k delete po nginx-test`
In what namespace is the etcd pod running? List the pods in that namespace
`k get po -n kube-system` Let's say you didn't know in what namespace it is. You could then run `k get po -A | grep etc` to find the Pod and see in what namespace it resides.
List pods from all namespaces
`k get po -A` The long version would be `kubectl get pods --all-namespaces`.
Write a YAML of a Pod with two containers and use the YAML file to create the Pod (use whatever images you prefer)
```
cat > pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: two-containers-pod
spec:
  containers:
  - name: app
    image: nginx-unprivileged
  - name: sidecar
    image: busybox
    command: ["sleep", "3600"]
EOF

k create -f pod.yaml
```

If you ask yourself "how am I supposed to remember this long command", time to change attitude ;)
Create a YAML of a Pod without actually running the Pod with the kubectl command (use whatever image you prefer)
`k run some-pod -o yaml --image nginx-unprivileged --dry-run=client > pod.yaml`
How to test a manifest is valid?
With the `--dry-run=client` flag, which will not actually create the resource but will validate it, so you can find any syntax issues this way: `k create -f YAML_FILE --dry-run=client`
How to check which image a certain Pod is using?
`k describe po | grep -i image`
How to check how many containers run in single Pod?
`k get po POD_NAME` and see the number under "READY" column. You can also run `k describe po POD_NAME`
Run a Pod called "remo" with the latest redis image and the label 'year=2017'
`k run remo --image=redis:latest -l year=2017`
List pods and their labels
`k get po --show-labels`
Delete a Pod called "nm"
`k delete po nm`
List all the pods with the label "env=prod"
`k get po -l env=prod` To count them: `k get po -l env=prod --no-headers | wc -l`
Create a static pod with the image python that runs the command sleep 2017
First change to the directory tracked by kubelet for static Pods: `cd /etc/kubernetes/manifests` (you can verify the path by reading the kubelet config file). Now create the definition/manifest in that directory: `k run some-pod --image=python --restart=Never --dry-run=client -o yaml --command -- sleep 2017 > static-pod.yaml`
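For reference, the generated manifest would look roughly like this (a sketch; the pod name and image follow the exercise, and kubelet creates the Pod automatically once the file is in the watched directory):

```yaml
# static-pod.yaml -- placed in the directory watched by kubelet
# (e.g. /etc/kubernetes/manifests)
apiVersion: v1
kind: Pod
metadata:
  name: some-pod
spec:
  restartPolicy: Never
  containers:
  - name: some-pod
    image: python
    command:
    - sleep
    - "2017"
```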
Describe how would you delete a static Pod
Locate the static Pods directory (look at `staticPodPath` in the kubelet configuration file). Go to that directory and remove the manifest/definition of the static Pod.
### Troubleshooting Pods
You try to run a Pod but see the status "CrashLoopBackOff". What does it mean? How to identify the issue?
The container failed to run (due to different reasons) and Kubernetes tries to run the Pod again after some delay (= BackOff time).

Some reasons for it to fail:
- Misconfiguration - misspelling, non-supported value, etc.
- Resource not available - nodes are down, PV not mounted, etc.

Some ways to debug:
1. `kubectl describe pod POD_NAME`
   - Focus on `State` (which should be Waiting, CrashLoopBackOff) and `Last State`, which should tell what happened before (as in why it failed)
2. Run `kubectl logs mypod`
   - This should provide the output of the container itself, which often reveals why it failed
   - For a specific container, you can add `-c CONTAINER_NAME`
3. If you still have no idea why it failed, try `kubectl get events`
What does the error ImagePullBackOff mean?
Most likely you misspelled the name of the image you are trying to pull and run. Or perhaps it doesn't exist in the registry. You can confirm with `kubectl describe po POD_NAME`
How to check on which node a certain Pod is running?
`k get po POD_NAME -o wide`
Run the following command: kubectl run ohno --image=sheris. Did it work? Why not? Fix it without removing the Pod, using any image you would like
Because there is no such image `sheris`. At least for now :) To fix it, run `kubectl edit ohno` and modify the following line `- image: sheris` to `- image: redis` or any other image you prefer.
You try to run a Pod but it's in "Pending" state. What might be the reason?
One possible reason is that the scheduler, which is supposed to schedule Pods on nodes, is not running. To verify it, you can run `kubectl get po -A | grep scheduler` or check directly in the `kube-system` namespace.
How to view the logs of a container running in a Pod?
`k logs POD_NAME`
There are two containers inside a Pod called "some-pod". What will happen if you run kubectl logs some-pod
It won't work because there are two containers inside the Pod and you need to specify one of them with `kubectl logs POD_NAME -c CONTAINER_NAME`
## Namespaces
List all the namespaces
`k get ns`
Create a namespace called 'alle'
`k create ns alle`
Check how many namespaces are there
`k get ns --no-headers | wc -l`
Check how many pods exist in the "dev" namespace
`k get po -n dev`
Create a pod called "kartos" in the namespace dev. The pod should be using the "redis" image.
If the namespace doesn't exist already: `k create ns dev` Then: `k run kartos --image=redis -n dev`
You are looking for a Pod called "atreus". How to check in which namespace it runs?
`k get po -A | grep atreus`
## Nodes
Run a command to view all nodes of the cluster
`kubectl get nodes` Note: create an alias (`alias k=kubectl`) and get used to `k get no`
Create a list of all nodes in JSON format and store it in a file called "some_nodes.json"
`k get nodes -o json > some_nodes.json`
Check what labels one of your nodes in the cluster has
`k get no minikube --show-labels`
## Services
Check how many services are running in the current namespace
`k get svc`
Create an internal service called "sevi" to expose the app 'web' on port 1991
`kubectl expose pod web --port=1991 --name=sevi`
How to reference by name a service called "app-service" within the same namespace?
app-service
How to check the TargetPort of a service?
`k describe svc SERVICE_NAME` and look at the `TargetPort` field
How to check what endpoints the svc has?
`k describe svc SERVICE_NAME` and look at the `Endpoints` field
How to reference by name a service called "app-service" within a different namespace, called "dev"?
app-service.dev.svc.cluster.local
Assume you have a deployment running and you need to create a Service for exposing the pods. This is what is required/known: * Deployment name: jabulik * Target port: 8080 * Service type: NodePort * Selector: jabulik-app * Port: 8080
`kubectl expose deployment jabulik --name=jabulik-service --target-port=8080 --type=NodePort --port=8080 --dry-run=client -o yaml > svc.yaml` `vi svc.yaml` (make sure the selector is set to `jabulik-app`) `k apply -f svc.yaml`
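After editing, the Service manifest should look roughly like this (a sketch; the exercise doesn't state the selector key, so `app` is assumed here):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: jabulik-service
spec:
  type: NodePort
  selector:
    app: jabulik-app   # assumes the pods carry the label app=jabulik-app
  ports:
  - port: 8080         # port the Service listens on
    targetPort: 8080   # port the pods listen on
```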
## ReplicaSets
How to check how many replicasets defined in the current namespace?
`k get rs`
You have a replica set defined to run 3 Pods. You removed one of these 3 pods. What will happen next? how many Pods will there be?
Theoretically there will still be 3 Pods running, because the goal of the ReplicaSet is to ensure that. So if you delete one or more Pods, it will run additional Pods so that there are always 3.
How to check which container image was used as part of replica set called "repli"?
`k describe rs repli | grep -i image`
How to check how many Pods are ready as part of a replica set called "repli"?
`k describe rs repli | grep -i "Pods Status"`
How to delete a replica set called "rori"?
`k delete rs rori`
How to modify a replica set called "rori" to use a different image?
`k edit rs rori`
Scale up a replica set called "rori" to run 5 Pods instead of 2
`k scale rs rori --replicas=5`
Scale down a replica set called "rori" to run 1 Pod instead of 5
`k scale rs rori --replicas=1`
### Troubleshooting ReplicaSets
Fix the following ReplicaSet definition
```yaml
apiVersion: apps/v1
kind: ReplicaCet
metadata:
  name: redis
  labels:
    app: redis
    tier: cache
spec:
  selector:
    matchLabels:
      tier: cache
  template:
    metadata:
      labels:
        tier: cachy
    spec:
      containers:
      - name: redis
        image: redis
```
kind should be ReplicaSet and not ReplicaCet :)
Fix the following ReplicaSet definition
```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: redis
  labels:
    app: redis
    tier: cache
spec:
  selector:
    matchLabels:
      tier: cache
  template:
    metadata:
      labels:
        tier: cachy
    spec:
      containers:
      - name: redis
        image: redis
```
The selector doesn't match the label (cache vs cachy). To solve it, fix cachy so it's cache instead.
## Deployments
How to list all the deployments in the current namespace?
`k get deploy`
How to check which image a certain Deployment is using?
`k describe deploy | grep image`
Create a file definition/manifest of a deployment called "dep", with 3 replicas that uses the image 'redis'
`k create deploy dep -o yaml --image=redis --dry-run=client --replicas 3 > deployment.yaml `
Remove the deployment `depdep`
`k delete deploy depdep`
Create a deployment called "pluck" using the image "redis" and make sure it runs 5 replicas
`kubectl create deployment pluck --image=redis --replicas=5`
Create a deployment with the following properties: * called "blufer" * using the image "python" * runs 3 replicas * all pods will be placed on a node that has the label "blufer"
`kubectl create deployment blufer --image=python --replicas=3 -o yaml --dry-run=client > deployment.yaml`

Add the following section under `spec.template.spec` (`vi deployment.yaml`):

```
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: blufer
          operator: Exists
```

`kubectl apply -f deployment.yaml`
### Troubleshooting Deployments
Fix the following deployment manifest
```yaml
apiVersion: apps/v1
kind: Deploy
metadata:
  creationTimestamp: null
  labels:
    app: dep
  name: dep
spec:
  replicas: 3
  selector:
    matchLabels:
      app: dep
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: dep
    spec:
      containers:
      - image: redis
        name: redis
        resources: {}
status: {}
```
Change `kind: Deploy` to `kind: Deployment`
Fix the following deployment manifest
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: dep
  name: dep
spec:
  replicas: 3
  selector:
    matchLabels:
      app: depdep
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: dep
    spec:
      containers:
      - image: redis
        name: redis
        resources: {}
status: {}
```
The selector doesn't match the label (dep vs depdep). To solve it, fix depdep so it's dep instead.
## Scheduler
How to schedule a pod on a node called "node1"?
`k run some-pod --image=redis -o yaml --dry-run=client > pod.yaml`

`vi pod.yaml` and add:

```
spec:
  nodeName: node1
```

`k apply -f pod.yaml`

Note: if you don't have a node1 in your cluster, the Pod will be stuck in "Pending" state.
### Node Affinity
Using node affinity, set a Pod to schedule on a node where the key is "region" and value is either "asia" or "emea"
`vi pod.yaml`

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: region
          operator: In
          values:
          - asia
          - emea
```
Using node affinity, set a Pod to never schedule on a node where the key is "region" and value is "neverland"
`vi pod.yaml`

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: region
          operator: NotIn
          values:
          - neverland
```
## Labels and Selectors
How to list all the Pods with the label "app=web"?
`k get po -l app=web`
How to list all objects labeled as "env=staging"?
`k get all -l env=staging`
How to list all deployments from "env=prod" and "type=web"?
`k get deploy -l env=prod,type=web`
### Node Selector
Apply the label "hw=max" on one of the nodes in your cluster
`kubectl label nodes some-node hw=max`
Create and run a Pod called `some-pod` with the image `redis` and configure it to use the selector `hw=max`
```
kubectl run some-pod --image=redis --dry-run=client -o yaml > pod.yaml
```

`vi pod.yaml` and add under `spec`:

```
nodeSelector:
  hw: max
```

```
kubectl apply -f pod.yaml
```
Explain why node selectors might be limited
Assume you would like to run your Pod on all the nodes with `hw` set to either max or min, instead of just max. This is not possible with nodeSelectors, which are quite simplistic, and this is where you might want to consider node affinity.
## Taints
Check if there are taints on node "master"
`k describe no master | grep -i taints`
Create a taint on one of the nodes in your cluster with key of "app" and value of "web" and effect of "NoSchedule". Verify it was applied
`k taint node minikube app=web:NoSchedule` `k describe no minikube | grep -i taints`
You applied a taint with k taint node minikube app=web:NoSchedule on the only node in your cluster and then executed kubectl run some-pod --image=redis. What will happen?
The Pod will remain in "Pending" status due to the only node in the cluster having a taint of "app=web".
You applied a taint with k taint node minikube app=web:NoSchedule on the only node in your cluster and then executed kubectl run some-pod --image=redis but the Pod is in pending state. How to fix it?
`kubectl edit po some-pod` and add the following under `spec`:

```
tolerations:
- effect: NoSchedule
  key: app
  operator: Equal
  value: web
```

Save and exit. The Pod should be in Running state now.
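For reference, a complete Pod manifest with a toleration matching that taint might look like this (a sketch; pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: some-pod
spec:
  tolerations:
  - key: app             # matches the taint key
    operator: Equal
    value: web           # matches the taint value
    effect: NoSchedule   # matches the taint effect
  containers:
  - name: some-pod
    image: redis
```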
Remove an existing taint from one of the nodes in your cluster
`k taint node minikube app=web:NoSchedule-`
## Resources Limits
Check if there are any limits on one of the pods in your cluster
`kubectl describe po | grep -i limits`
Run a pod called "yay" with the image "python" and resources request of 64Mi memory and 250m CPU
`kubectl run yay --image=python --dry-run=client -o yaml > pod.yaml`

`vi pod.yaml`

```
spec:
  containers:
  - image: python
    imagePullPolicy: Always
    name: yay
    resources:
      requests:
        cpu: 250m
        memory: 64Mi
```

`kubectl apply -f pod.yaml`
Run a pod called "yay2" with the image "python". Make sure it has resources request of 64Mi memory and 250m CPU and the limits are 128Mi memory and 500m CPU
`kubectl run yay2 --image=python --dry-run=client -o yaml > pod.yaml`

`vi pod.yaml`

```
spec:
  containers:
  - image: python
    imagePullPolicy: Always
    name: yay2
    resources:
      limits:
        cpu: 500m
        memory: 128Mi
      requests:
        cpu: 250m
        memory: 64Mi
```

`kubectl apply -f pod.yaml`
## Monitoring
Deploy metrics-server
`kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml`
Using metrics-server, view the following: * top performing nodes in the cluster * top performing Pods
* top nodes: `kubectl top nodes` * top pods: `kubectl top pods`
## Scheduler
Can you deploy multiple schedulers?
Yes, it is possible. You can run another Pod with a command similar to:

```
spec:
  containers:
  - command:
    - kube-scheduler
    - --address=127.0.0.1
    - --leader-elect=true
    - --scheduler-name=some-custom-scheduler
...
```
Assuming you have multiple schedulers, how to know which scheduler was used for a given Pod?
Running `kubectl get events` you can see which scheduler was used.
You want to run a new Pod and you would like it to be scheduled by a custom scheduler. How to achieve it?
Add the following to the spec of the Pod:

```
spec:
  schedulerName: some-custom-scheduler
```
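Put together, a complete Pod manifest using a custom scheduler might look like this (a sketch; pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: some-pod
spec:
  # must match the --scheduler-name the custom scheduler was started with;
  # if no such scheduler is running, the Pod stays Pending
  schedulerName: some-custom-scheduler
  containers:
  - name: some-pod
    image: redis
```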
================================================
FILE: topics/kubernetes/README.md
================================================
# Kubernetes

What's your goal?

* I would like to prepare for the CKA certification
  * See the [CKA](CKA.md) page
* I would like to learn Kubernetes by practicing both theoretical and practical material
  * Solve [exercises](#kubernetes-exercises)
  * Solve [questions](#kubernetes-questions)
* I would like to learn practical Kubernetes
  * Solve [exercises](#kubernetes-exercises)

- [Kubernetes](#kubernetes)
  - [Kubernetes Exercises](#kubernetes-exercises)
    - [Pods](#pods)
    - [Service](#service)
    - [ReplicaSet](#replicaset)
    - [Labels and Selectors](#labels-and-selectors)
    - [Scheduler](#scheduler)
    - [Kustomize](#kustomize)
  - [Kubernetes Questions](#kubernetes-questions)
    - [Kubernetes 101](#kubernetes-101)
    - [Cluster and Architecture](#cluster-and-architecture)
      - [Kubelet](#kubelet)
      - [Nodes Commands](#nodes-commands)
    - [Pods](#pods-1)
      - [Static Pods](#static-pods)
      - [Pods Commands](#pods-commands)
      - [Pods Troubleshooting and Debugging](#pods-troubleshooting-and-debugging)
    - [Labels and Selectors](#labels-and-selectors-1)
    - [Deployments](#deployments)
      - [Deployments Commands](#deployments-commands)
    - [Services](#services)
    - [Ingress](#ingress)
    - [ReplicaSets](#replicasets)
    - [DaemonSet](#daemonset)
      - [DaemonSet - Commands](#daemonset---commands)
    - [StatefulSet](#statefulset)
    - [Storage](#storage)
      - [Volumes](#volumes)
    - [Networking](#networking)
      - [Network Policies](#network-policies)
    - [etcd](#etcd)
    - [Namespaces](#namespaces)
      - [Namespaces - commands](#namespaces---commands)
      - [Resources Quota](#resources-quota)
    - [Operators](#operators)
    - [Secrets](#secrets)
    - [Volumes](#volumes-1)
    - [Access Control](#access-control)
    - [Patterns](#patterns)
    - [CronJob](#cronjob)
    - [Misc](#misc)
    - [Gatekeeper](#gatekeeper)
      - [Policy Testing](#policy-testing)
    - [Helm](#helm)
      - [Commands](#commands)
    - [Security](#security)
    - [Troubleshooting Scenarios](#troubleshooting-scenarios)
    - [Istio](#istio)
    - [Controllers](#controllers)
    - [Scheduler](#scheduler-1)
      - [Node Affinity](#node-affinity)
    - [Taints](#taints)
    - [Resource Limits](#resource-limits)
      - [Resources Limits - Commands](#resources-limits---commands)
    - [Monitoring](#monitoring)
    - [Kustomize](#kustomize-1)
    - [Deployment Strategies](#deployment-strategies)
    - [Scenarios](#scenarios)

## Kubernetes Exercises

### Pods

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| My First Pod | Pods | [Exercise](pods_01.md) | [Solution](solutions/pods_01_solution.md) | |
| "Killing" Containers | Pods | [Exercise](killing_containers.md) | [Solution](solutions/killing_containers.md) | |

### Service

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Creating a Service | Service | [Exercise](services_01.md) | [Solution](solutions/services_01_solution.md) | |

### ReplicaSet

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Creating a ReplicaSet | ReplicaSet | [Exercise](replicaset_01.md) | [Solution](solutions/replicaset_01_solution.md) | |
| Operating ReplicaSets | ReplicaSet | [Exercise](replicaset_02.md) | [Solution](solutions/replicaset_02_solution.md) | |
| ReplicaSets Selectors | ReplicaSet | [Exercise](replicaset_03.md) | [Solution](solutions/replicaset_03_solution.md) | |

### Labels and Selectors

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Labels and Selectors 101 | Labels, Selectors | [Exercise](exercises/labels_and_selectors/exercise.md) | [Solution](exercises/labels_and_selectors/solution.md) | |
| Node Selectors | Labels, Selectors | [Exercise](exercises/node_selectors/exercise.md) | [Solution](exercises/node_selectors/solution.md) | |

### Scheduler

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Taints 101 | Taints | [Exercise](exercises/taints_101/exercise.md) | [Solution](exercises/taints_101/solution.md) | |

### Kustomize

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| common labels | Kustomize | [Exercise](exercises/kustomize_common_labels/exercise.md) | [Solution](exercises/kustomize_common_labels/solution.md) | |

## Kubernetes Questions

### Kubernetes 101
What is Kubernetes? Why are organizations using it?
Kubernetes is an open-source system that provides users with the ability to manage, scale and deploy containerized applications.

To understand what Kubernetes is good for, let's look at some examples:

* You would like to run a certain application in a container on multiple different locations and sync changes across all of them, no matter where they run
* Performing updates and changes across hundreds of containers
* Handling cases where the current load requires scaling up (or down)
When or why NOT to use Kubernetes?
- If you manage low-level infrastructure or bare metal servers, Kubernetes is probably not what you need or want
- If you are a small team (say, fewer than 20 engineers) running less than a dozen containers, Kubernetes might be overkill (even if you need scaling, rolling out updates, etc.). You might still enjoy the benefits of using managed Kubernetes, but you definitely want to think about it carefully before deciding whether to adopt it.
What are some of Kubernetes features?
- Self-Healing: Kubernetes uses health checks to monitor containers and run certain actions upon failure or other types of events, like restarting the container
- Load Balancing: Kubernetes can split and/or balance requests to applications running in the cluster, based on the state of the Pods running the application
- Operators: Kubernetes packaged applications that can use the API of the cluster to update its state and trigger actions based on events and application state changes
- Automated Rollout: gradual roll-out of updates to applications, with support for rolling back in case anything goes wrong
- Scaling: scaling horizontally (down and up) based on different state parameters and custom-defined criteria
- Secrets: a mechanism for storing user names, passwords and service endpoints in a private way, where not everyone using the cluster is able to view them
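The self-healing feature, for instance, is driven by probes declared on the container. A minimal sketch (pod name, image and probe values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: web
    image: nginx
    livenessProbe:            # kubelet restarts the container when this check fails
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5  # give the app time to start before probing
      periodSeconds: 10       # probe every 10 seconds
```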
What Kubernetes objects are there?
* Pod
* Service
* ReplicationController
* ReplicaSet
* DaemonSet
* Namespace
* ConfigMap
...
What fields are mandatory with any Kubernetes object?
metadata, kind and apiVersion
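As a minimal illustration of those mandatory fields (the pod name and image below are arbitrary):

```yaml
apiVersion: v1        # which API group/version the object schema comes from
kind: Pod             # the object type
metadata:
  name: minimal-pod   # metadata must at least identify the object
spec:                 # most kinds also expect a spec describing desired state
  containers:
  - name: app
    image: nginx
```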
What is kubectl?
Kubectl is the Kubernetes command line tool that allows you to run commands against Kubernetes clusters. For example, you can use kubectl to deploy applications, inspect and manage cluster resources, and view logs.
What Kubernetes objects do you usually use when deploying applications in Kubernetes?
* Deployment - creates the Pods and watches them
* Service - routes traffic to the Pods internally
* Ingress - routes traffic from outside the cluster
Why there is no such command in Kubernetes? kubectl get containers
Because a container is not a Kubernetes object. The smallest object unit in Kubernetes is a Pod. In a single Pod you can find one or more containers.
What actions or operations you consider as best practices when it comes to Kubernetes?
- Always make sure Kubernetes YAML files are valid. Applying automated checks and pipelines is recommended.
- Always specify requests and limits to prevent a situation where containers use the entire cluster memory, which may lead to OOM issues
- Specify labels to logically group Pods, Deployments, etc. Use labels to identify the type of the application, for example, among other things
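The labels and requests/limits practices can be combined in a single manifest; a sketch (names, labels and resource values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
  labels:
    app: api          # logical grouping, usable in selectors
    tier: backend     # identifies the application type
spec:
  containers:
  - name: api
    image: nginx
    resources:
      requests:       # what the scheduler reserves for the container
        cpu: 100m
        memory: 128Mi
      limits:         # hard cap; exceeding the memory limit triggers OOM kill
        cpu: 500m
        memory: 256Mi
```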
### Cluster and Architecture
What is a Kubernetes Cluster?
Red Hat Definition: "A Kubernetes cluster is a set of node machines for running containerized applications. If you’re running Kubernetes, you’re running a cluster. At a minimum, a cluster contains a worker node and a master node." Read more [here](https://www.redhat.com/en/topics/containers/what-is-a-kubernetes-cluster)
What is a Node?
A node is a virtual or a physical machine that serves as a worker for running the applications.
It's recommended to have at least 3 nodes in a production environment.
What is the master node responsible for?
The master coordinates all the workflows in the cluster:
* Scheduling applications
* Managing desired state
* Rolling out new updates
Describe shortly and in high-level, what happens when you run kubectl get nodes
1. Your user is getting authenticated
2. The request is validated by the kube-apiserver
3. Data is retrieved from etcd
True or False? Every cluster must have 0 or more master nodes and at least 1 worker
False. A Kubernetes cluster consists of at least 1 master and can have 0 workers (although that wouldn't be very useful...)
What are the components of the master node (aka control plane)?
* API Server - the Kubernetes API. All cluster components communicate through it
* Scheduler - assigns an application with a worker node it can run on
* Controller Manager - cluster maintenance (replications, node failures, etc.)
* etcd - stores cluster configuration
What are the components of a worker node (aka data plane)?
* Kubelet - an agent responsible for node communication with the master
* Kube-proxy - load balancing traffic between app components
* Container runtime - the engine that runs the containers (Podman, Docker, ...)
Place the components on the right side of the image in the right place in the drawing

You are managing multiple Kubernetes clusters. How do you quickly change between the clusters using kubectl?
`kubectl config use-context CONTEXT_NAME`
How do you prevent high memory usage in your Kubernetes cluster and possibly issues like memory leak and OOM?
Apply requests and limits, especially on third party applications (where the uncertainty is even bigger)
Do you have experience with deploying a Kubernetes cluster? If so, can you describe the process in high-level?
1. Create multiple instances you will use as Kubernetes nodes/workers. Also create an instance to act as the master. The instances can be provisioned in a cloud or they can be virtual machines on bare metal hosts.
2. Provision a certificate authority that will be used to generate TLS certificates for the different components of a Kubernetes cluster (kubelet, etcd, ...)
   1. Generate a certificate and private key for the different components
3. Generate kubeconfigs so the different clients of Kubernetes can locate the API servers and authenticate.
4. Generate an encryption key that will be used for encrypting the cluster data
5. Create an etcd cluster
Which command will list all the object types in a cluster?
`kubectl api-resources`
What does kubectl get componentstatus do?
Outputs the status of each of the control plane components.
#### Kubelet
What happens to running pods if you stop Kubelet on the worker nodes?
When you stop the kubelet service on a worker node, it will no longer be able to communicate with the Kubernetes API server. As a result, the node will be marked as NotReady and the pods running on that node will be marked as Unknown. The Kubernetes control plane will then attempt to reschedule the pods to other available nodes in the cluster.
#### Nodes Commands
Run a command to view all nodes of the cluster
`kubectl get nodes` Note: You might want to create an alias (`alias k=kubectl`) and get used to `k get no`
Create a list of all nodes in JSON format and store it in a file called "some_nodes.json"
`k get nodes -o json > some_nodes.json`
Check what labels one of your nodes in the cluster has
`k get no minikube --show-labels`
### Pods
Explain what is a Pod
A Pod is a group of one or more containers, with shared storage and network resources, and a specification for how to run the containers. Pods are the smallest deployable units of computing that you can create and manage in Kubernetes.
Deploy a pod called "my-pod" using the nginx:alpine image
`kubectl run my-pod --image=nginx:alpine` If you are a Kubernetes beginner you should know that this is not a common way to run Pods. The common way is to run a Deployment which in turn runs Pod(s). In addition, Pods and/or Deployments are usually defined in files rather than executed directly using only the CLI arguments.
What are your thoughts on "Pods are not meant to be created directly"?
Pods are usually indeed not created directly. You'll notice that Pods are usually created as part of another entities such as Deployments or ReplicaSets. If a Pod dies, Kubernetes will not bring it back. This is why it's more useful for example to define ReplicaSets that will make sure that a given number of Pods will always run, even after a certain Pod dies.
How many containers can a pod contain?
A pod can include multiple containers, but in most cases it would probably be one container per pod. There are some patterns where it makes sense to run more than one container, like the "side-car" pattern, where you might want to perform logging or some other operation executed by another container running alongside your app container in the same Pod.
What use cases exist for running multiple containers in a single pod?
A web application with separate (= in their own containers) logging and monitoring components/adapters is one example.
A CI/CD pipeline (using Tekton for example) can run multiple containers in one Pod if a Task contains multiple commands.
What are the possible Pod phases?
* Running - the Pod is bound to a node and at least one container is running
* Failed/Error - at least one container in the Pod terminated with a failure
* Succeeded - every container in the Pod terminated with success
* Unknown - the Pod's state could not be obtained
* Pending - containers are not yet running (perhaps images are still being downloaded or the Pod wasn't scheduled yet)
True or False? By default, pods are isolated. This means they are unable to receive traffic from any source
False. By default, pods are non-isolated = pods accept traffic from any source.
True or False? The "Pending" phase means the Pod was not yet accepted by the Kubernetes cluster so the scheduler can't run it unless it's accepted
False. "Pending" is after the Pod was accepted by the cluster, but the container can't run for different reasons like images not yet downloaded.
True or False? A single Pod can be split across multiple nodes
False. A single Pod can run on a single node.
You run a pod and you see the status ContainerCreating
True or False? A volume defined in Pod can be accessed by all the containers of that Pod
True.
What happens when you run a Pod with kubectl?
1. Kubectl sends a request to the API server (kube-apiserver) to create the Pod
   1. In the process the user gets authenticated and the request is validated
   2. etcd is updated with the data
2. The Scheduler detects that there is an unassigned Pod by monitoring the API server (kube-apiserver)
3. The Scheduler chooses a node to assign the Pod to
   1. etcd is updated with the information
4. The Scheduler updates the API server about which node it chose
5. Kubelet (which also monitors the API server) notices there is a Pod assigned to the same node on which it runs and that this Pod isn't running
6. Kubelet sends a request to the container engine (e.g. Docker) to create and run the containers
7. An update is sent by Kubelet to the API server (notifying it that the Pod is running)
   1. etcd is updated by the API server again
How to confirm a container is running after running the command kubectl run web --image nginxinc/nginx-unprivileged
* When you run `kubectl describe pods <POD_NAME>` it will tell whether the container is running: `Status: Running`
* Run a command inside the container: `kubectl exec web -- ls`
After running kubectl run database --image mongo you see the status is "CrashLoopBackOff". What could have possibly gone wrong and what do you do to confirm?
"CrashLoopBackOff" means the Pod is starting, crashing, starting...and so it repeats itself.
There are many different reasons to get this error - lack of permissions, init-container misconfiguration, persistent volume connection issues, etc.

One way to check why it happened is to run `kubectl describe po <POD_NAME>` and have a look at the exit code:

```
Last State:     Terminated
  Reason:       Error
  Exit Code:    100
```

Another way to check what's going on is to run `kubectl logs <POD_NAME>`. This will provide us with the logs from the containers running in that Pod.
Explain the purpose of the following lines

```
livenessProbe:
  exec:
    command:
    - cat
    - /appStatus
  initialDelaySeconds: 10
  periodSeconds: 5
```
These lines define a `liveness probe`. It's used to restart a container when it reaches a non-desired state.

In this case, if the command `cat /appStatus` fails, Kubernetes will kill the container and apply the restart policy. `initialDelaySeconds: 10` means that Kubelet will wait 10 seconds before running the command/probe for the first time. From that point on, it will run it every 5 seconds, as defined by `periodSeconds`.
Explain the purpose of the following lines

```
readinessProbe:
  tcpSocket:
    port: 2017
  initialDelaySeconds: 15
  periodSeconds: 20
```
They define a readiness probe: the Pod will not be marked as "Ready" until it's possible to connect to port 2017 of the container. The first check/probe will start 15 seconds after the container starts running and will then run every 20 seconds until it manages to connect to the defined port.
What does the "ErrImagePull" status of a Pod mean?
It wasn't able to pull the image specified for running the container(s). This can happen, for example, if the client didn't authenticate against the registry.

More details can be obtained with `kubectl describe po <POD_NAME>`.
What happens when you delete a Pod?
1. The `TERM` signal is sent to the main processes inside the containers of the given Pod
2. Each container is given a period of 30 seconds to shut down its processes gracefully
3. If the grace period expires, the `KILL` signal is used to forcefully kill the processes and the containers as well
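The default 30-second grace period can be changed per Pod with `terminationGracePeriodSeconds`. A minimal sketch (the Pod name is illustrative):

```
apiVersion: v1
kind: Pod
metadata:
  name: slow-shutdown-pod
spec:
  terminationGracePeriodSeconds: 60   # containers get 60s instead of the default 30s before KILL
  containers:
  - name: app
    image: nginx:alpine
```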
Explain liveness probes
Liveness probes are a mechanism for restarting a container when a certain check/probe, defined by the user, fails.
For example, the user can define that the command `cat /app/status` will run every X seconds, and the moment this command fails, the container will be restarted. You can read more about it in [kubernetes.io](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes)
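Besides `exec` probes like `cat /app/status`, a liveness probe can also be an HTTP check. A minimal sketch (the path and port are assumptions for illustration):

```
livenessProbe:
  httpGet:
    path: /healthz      # endpoint assumed to exist in the app
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
```

If the endpoint stops returning a success (2xx/3xx) status code, Kubelet kills the container and the restart policy is applied.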
Explain readiness probes
Readiness probes are used by Kubelet to know when a container is ready to start accepting traffic.

For example, a readiness probe can be to connect to port 8080 on a container. Once Kubelet manages to connect, the Pod is marked as ready. You can read more about it in [kubernetes.io](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes)
How does readiness probe status affect Services when they are combined?

Only containers whose state is set to Success will be able to receive requests sent to the Service.
Why is it common to have only one container per Pod in most cases?

One reason is that it's harder to scale when you need to scale only one of the containers in a given Pod.
True or False? Once a Pod is assigned to a worker node, it will only run on that node, even if it fails at some point and spins up a new Pod
True.
True or False? Each Pod, when created, gets its own public IP address
False. Each Pod gets an IP address, but an internal one that is not publicly accessible. To make a Pod externally accessible, we need to use a Kubernetes object called a Service.
#### Static Pods
What are Static Pods?
[Kubernetes.io](https://kubernetes.io/docs/tasks/configure-pod-container/static-pod/): "Static Pods are managed directly by the kubelet daemon on a specific node, without the API server observing them. Unlike Pods that are managed by the control plane (for example, a Deployment); instead, the kubelet watches each static Pod (and restarts it if it fails)."
True or False? The same as there are "Static Pods" there are other static resources like "deployments" and "replicasets"
False.
What are some use cases for using Static Pods?
One clear use case is running Control Plane Pods - Pods such as kube-apiserver, the scheduler, etc. These should run and operate regardless of whether some components of the cluster work or not, and they should run on specific nodes of the cluster.
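As an illustration, on a kubeadm-style control plane node any manifest dropped into the static Pods directory is picked up by kubelet directly, with no API server involved. A minimal sketch (the file path and names are illustrative):

```
# /etc/kubernetes/manifests/static-web.yaml
# kubelet watches this directory and runs/restarts this Pod on its own
apiVersion: v1
kind: Pod
metadata:
  name: static-web
spec:
  containers:
  - name: web
    image: nginx:alpine
```

Deleting the file is what removes the Pod; `kubectl delete` on its mirror Pod won't stop it.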
How to identify which Pods are Static Pods?
The suffix of the Pod name is the same as the name of the node on which it is running. TODO: check if it's always the case.
Which of the following is not a static pod?

* kube-scheduler
* kube-proxy
* kube-apiserver
kube-proxy - it's a DaemonSet (since it has to be present on every node in the cluster). There is no one specific node on which it has to run.
Where static Pods manifests are located?
Most of the time it's `/etc/kubernetes/manifests`, but you can verify by checking the value of the `staticPodPath` key in the kubelet configuration: `grep staticPodPath /var/lib/kubelet/config.yaml`.

Your kubelet config might be in a different path. To verify, run `ps -ef | grep kubelet` and check the value of the `--config` argument of the `/usr/bin/kubelet` process.
Describe how would you delete a static Pod
Locate the static Pods directory (look at `staticPodPath` in the kubelet configuration file). Go to that directory and remove the manifest/definition of the static Pod (`rm <staticPodPath>/<static-pod-manifest>.yaml`)
#### Pods Commands
How to check to which worker node the pods were scheduled to? In other words, how to check on which node a certain Pod is running?
`kubectl get pods -o wide`
How to delete a pod?
`kubectl delete pod pod_name`
List all the pods with the label "env=prod"
`k get po -l env=prod`

To count them: `k get po -l env=prod --no-headers | wc -l`
How to list the pods in the current namespace?
`kubectl get po`
How to view all the pods running in all the namespaces?
`kubectl get pods --all-namespaces`
#### Pods Troubleshooting and Debugging
You try to run a Pod but it's in "Pending" state. What might be the reason?
One possible reason is that the scheduler, which is supposed to schedule Pods on nodes, is not running. To verify it, you can run `kubectl get po -A | grep scheduler` or check directly in the `kube-system` namespace.
What does the kubectl logs [pod-name] command do?
Prints the logs for a container in a pod.
What does the kubectl describe pod [pod name] command do?
Show details of a specific resource or group of resources.
Create a static pod with the image python that runs the command sleep 2017
First change to the directory tracked by kubelet for static Pods: `cd /etc/kubernetes/manifests` (you can verify the path by reading the kubelet config file).

Now create the definition/manifest in that directory:

`k run some-pod --image=python --restart=Never --dry-run=client -o yaml --command -- sleep 2017 > static-pod.yaml`
### Labels and Selectors
Explain Labels
[Kubernetes.io](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/): "Labels are key/value pairs that are attached to objects, such as pods. Labels are intended to be used to specify identifying attributes of objects that are meaningful and relevant to users, but do not directly imply semantics to the core system. Labels can be used to organize and to select subsets of objects. Labels can be attached to objects at creation time and subsequently added and modified at any time. Each object can have a set of key/value labels defined. Each Key must be unique for a given object."
Explain selectors
[Kubernetes.io](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors): "Unlike names and UIDs, labels do not provide uniqueness. In general, we expect many objects to carry the same label(s). Via a label selector, the client/user can identify a set of objects. The label selector is the core grouping primitive in Kubernetes. The API currently supports two types of selectors: equality-based and set-based. A label selector can be made of multiple requirements which are comma-separated. In the case of multiple requirements, all must be satisfied so the comma separator acts as a logical AND (&&) operator."
Provide some actual examples of how labels are used
* Can be used by the scheduler to place certain Pods (with certain labels) on specific nodes
* Used by ReplicaSets to track Pods which have to be scaled
What are Annotations?
[Kubernetes.io](https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/): "You can use Kubernetes annotations to attach arbitrary non-identifying metadata to objects. Clients such as tools and libraries can retrieve this metadata."
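As a sketch, annotations live next to labels under `metadata` (the keys and values here are made up for illustration):

```
apiVersion: v1
kind: Pod
metadata:
  name: annotated-pod
  annotations:
    example.com/build: "commit abc123"           # arbitrary, non-identifying metadata
    example.com/owner-contact: "team@example.com"
  labels:
    app: web                                      # labels, in contrast, are used for selection
spec:
  containers:
  - name: web
    image: nginx:alpine
```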
How are annotations different from labels?

[Kubernetes.io](https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/): "Labels can be used to select objects and to find collections of objects that satisfy certain conditions. In contrast, annotations are not used to identify and select objects. The metadata in an annotation can be small or large, structured or unstructured, and can include characters not permitted by labels."
How to view the logs of a container running in a Pod?
`k logs POD_NAME`
There are two containers inside a Pod called "some-pod". What will happen if you run kubectl logs some-pod
It won't work because there are two containers inside the Pod and you need to specify one of them with `kubectl logs POD_NAME -c CONTAINER_NAME`
### Deployments
What is a "Deployment" in Kubernetes?
A Kubernetes Deployment is used to tell Kubernetes how to create or modify instances of the pods that hold a containerized application. Deployments can scale the number of replica pods, enable rollout of updated code in a controlled manner, or roll back to an earlier deployment version if necessary. A Deployment is a declarative statement for the desired state for Pods and Replica Sets.
How to create a deployment with the image "nginx:alpine"?
`kubectl create deployment my-first-deployment --image=nginx:alpine`

OR

```
cat << EOF | kubectl create -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
EOF
```
How to verify a deployment was created?
`kubectl get deployments` or `kubectl get deploy`

This command lists all the Deployment objects created and existing in the cluster. It doesn't mean the deployments are ready and running. That can be checked with the "READY" and "AVAILABLE" columns.
How to edit a deployment?
`kubectl edit deployment <deployment-name>`
What happens after you edit a deployment and change the image?
The pod will terminate and another, new pod, will be created. Also, when looking at the ReplicaSets, you'll see the old ReplicaSet doesn't have any pods and a new ReplicaSet is created.
How to delete a deployment?
One way is by specifying the deployment name: `kubectl delete deployment [deployment_name]` Another way is using the deployment configuration file: `kubectl delete -f deployment.yaml`
What happens when you delete a deployment?
The pod related to the deployment will terminate and the replicaset will be removed.
What happens behind the scenes when you create a Deployment object?
The following occurs when you run `kubectl create deployment some_deployment --image=nginx`:

1. An HTTP request is sent to the Kubernetes API server on the cluster to create a new Deployment
2. A new Pod object is created and scheduled to one of the worker nodes
3. Kubelet on the worker node notices the new Pod and instructs the container runtime engine to pull the image from the registry
4. A new container is created using the image that was just pulled
How to make an app accessible on a private or external network?
Using a Service.
Can you use a Deployment for stateful applications?
Fix the following deployment manifest

```yaml
apiVersion: apps/v1
kind: Deploy
metadata:
  creationTimestamp: null
  labels:
    app: dep
  name: dep
spec:
  replicas: 3
  selector:
    matchLabels:
      app: dep
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: dep
    spec:
      containers:
      - image: redis
        name: redis
        resources: {}
status: {}
```
Change `kind: Deploy` to `kind: Deployment`
Fix the following deployment manifest

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: dep
  name: dep
spec:
  replicas: 3
  selector:
    matchLabels:
      app: depdep
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: dep
    spec:
      containers:
      - image: redis
        name: redis
        resources: {}
status: {}
```
The selector doesn't match the label (dep vs depdep). To solve it, fix depdep so it's dep instead.
#### Deployments Commands
Create a file definition/manifest of a deployment called "dep", with 3 replicas that uses the image 'redis'
`k create deploy dep -o yaml --image=redis --dry-run=client --replicas 3 > deployment.yaml `
Delete the deployment `depdep`
`k delete deploy depdep`
Create a deployment called "pluck" using the image "redis" and make sure it runs 5 replicas
`kubectl create deployment pluck --image=redis` `kubectl scale deployment pluck --replicas=5`
Create a deployment with the following properties:

* called "blufer"
* using the image "python"
* runs 3 replicas
* all pods will be placed on a node that has the label "blufer"
`kubectl create deployment blufer --image=python --replicas=3 -o yaml --dry-run=client > deployment.yaml`

Add the following section under the Pod template spec (`vi deployment.yaml`):

```
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: blufer
            operator: Exists
```

`kubectl apply -f deployment.yaml`
### Services
What is a Service in Kubernetes?
"An abstract way to expose an application running on a set of Pods as a network service." - read more [here](https://kubernetes.io/docs/concepts/services-networking/service) In simpler words, it allows you to add an internal or external connectivity to a certain application running in a container.
Place the components in the right placeholders in regards to Kubernetes service

How to create a service for an existing deployment called "alle" on port 8080 so the Pod(s) are accessible via a Load Balancer?
The imperative way: `kubectl expose deployment alle --type=LoadBalancer --port 8080`
True or False? The lifecycle of Pods and Services isn't connected so when a Pod dies, the Service still stays
True
After creating a service, how to check it was created?
`kubectl get svc`
What's the default Service type?
ClusterIP - used for internal communication.
What Service types are there?
* ClusterIP
* NodePort
* LoadBalancer
* ExternalName

More on this topic [here](https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types)
How Service and Deployment are connected?
The truth is they aren't connected. Service points to Pod(s) directly, without connecting to the Deployment in any way.
What are important steps in defining/adding a Service?
1. Making sure that the targetPort of the Service matches the containerPort of the Pod
2. Making sure that the selector matches at least one of the Pod's labels
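The two steps above can be sketched as a matching Pod/Service pair (the names are illustrative):

```
apiVersion: v1
kind: Pod
metadata:
  name: web
  labels:
    app: web            # (2) matched by the Service selector below
spec:
  containers:
  - name: web
    image: nginx:alpine
    ports:
    - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: web-svc
spec:
  selector:
    app: web            # (2) matches the Pod's label
  ports:
  - port: 80
    targetPort: 80      # (1) matches the containerPort above
```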
What is the default service type in Kubernetes and what is it used for?
The default is ClusterIP and it's used for exposing a port internally. It's useful when you want to enable internal communication between Pods and prevent any external access.
How to get information on a certain service?
`kubectl describe service <service-name>`

It's more common to use `kubectl describe svc ...`
What does the following command do?

```
kubectl expose rs some-replicaset --name=replicaset-svc --target-port=2017 --type=NodePort
```
It exposes a ReplicaSet by creating a service called 'replicaset-svc'. The exposed port is 2017 (this is the port used by the application) and the service type is NodePort which means it will be reachable externally.
True or False? The target port, in the case of running the following command, will be exposed only on one of the Kubernetes cluster nodes, but it will be routed to all the pods

```
kubectl expose rs some-replicaset --name=replicaset-svc --target-port=2017 --type=NodePort
```
False. It will be exposed on every node of the cluster and will be routed to one of the Pods (which belong to the ReplicaSet)
How to verify that a certain service is configured to forward requests to a given pod?
Run `kubectl describe service` and see if the IPs from "Endpoints" match any IPs from the output of `kubectl get pod -o wide`
Explain what will happen when running apply on the following block

```
apiVersion: v1
kind: Service
metadata:
  name: some-app
spec:
  type: NodePort
  ports:
  - port: 8080
    nodePort: 2017
    protocol: TCP
  selector:
    type: backend
    service: some-app
```
It creates a new Service of the type "NodePort" which means it can be used for internal and external communication with the app.
The port of the application is 8080 and requests will be forwarded to this port. The exposed port is 2017. As a note, it is not a common practice to specify the nodePort explicitly.

The protocol used is TCP (instead of UDP), and since this is the default you don't have to specify it.

The selector is used by the Service to know to which Pods to forward the requests - in this case, Pods with the labels "type: backend" and "service: some-app".
How to turn the following service into an external one?

```
spec:
  selector:
    app: some-app
  ports:
  - protocol: TCP
    port: 8081
    targetPort: 8081
```

Adding `type: LoadBalancer` and `nodePort`:

```
spec:
  selector:
    app: some-app
  type: LoadBalancer
  ports:
  - protocol: TCP
    port: 8081
    targetPort: 8081
    nodePort: 32412
```
What would you use to route traffic from outside the Kubernetes cluster to services within a cluster?
Ingress
True or False? When "NodePort" is used, "ClusterIP" will be created automatically?
True
When would you use the "LoadBalancer" type
Mostly when you would like to combine it with a cloud provider's load balancer
How would you map a service to an external address?
Using the 'ExternalName' directive.
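A minimal ExternalName sketch (the service and domain names are made up for illustration):

```
apiVersion: v1
kind: Service
metadata:
  name: external-db
spec:
  type: ExternalName
  externalName: db.example.com   # in-cluster lookups of "external-db" resolve to this CNAME
```

Pods in the cluster can then reach `external-db` as if it were an internal service.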
Describe in detail what happens when you create a service
1. Kubectl sends a request to the API server to create a Service
2. The controller detects there is a new Service
3. Endpoint objects are created by the controller, with the same name as the service
4. The controller uses the Service selector to identify the endpoints
5. kube-proxy detects there is a new Endpoint object + new Service and adds iptables rules to capture traffic to the Service port and redirect it to endpoints
6. kube-dns detects there is a new Service and adds the container record to the DNS server
How to list the endpoints of a certain app?
`kubectl get ep <service-name>`
How can you find out information on a Service related to a certain Pod if all you can use is kubectl exec <POD_NAME> -- <COMMAND>?
You can run `kubectl exec <POD_NAME> -- env` which will give you a couple of environment variables related to the Service.

Variables such as `[SERVICE_NAME]_SERVICE_HOST`, `[SERVICE_NAME]_SERVICE_PORT`, ...
Describe what happens when a container tries to connect with its corresponding Service for the first time. Explain who added each of the components you include in your description
- The container looks at the nameserver defined in /etc/resolv.conf
- The container queries the nameserver so the address is resolved to the Service IP
- Requests sent to the Service IP are forwarded with iptables rules (or other chosen software) to the endpoint(s)

Explanation as to who added each of the components:

- The nameserver in the container is added by kubelet during the scheduling of the Pod, by using kube-dns
- The DNS record of the service is added by kube-dns during the Service creation
- iptables rules are added by kube-proxy during Endpoint and Service creation
Describe in high level what happens when you run kubectl expose deployment remo --type=LoadBalancer --port 8080
1. Kubectl sends a request to the Kubernetes API to create a Service object
2. Kubernetes asks the cloud provider (e.g. AWS, GCP, Azure) to provision a load balancer
3. The newly created load balancer forwards incoming traffic to relevant worker node(s), which forward the traffic to the relevant containers
After creating a service that forwards incoming external traffic to the containerized application, how to make sure it works?
You can run `curl <EXTERNAL_IP>:<PORT>` to examine the output.
An internal load balancer in Kubernetes is called ____ and an external load balancer is called ____
An internal load balancer in Kubernetes is called Service and an external load balancer is called Ingress
### Ingress
What is Ingress?
From Kubernetes docs: "Ingress exposes HTTP and HTTPS routes from outside the cluster to services within the cluster. Traffic routing is controlled by rules defined on the Ingress resource." Read more [here](https://kubernetes.io/docs/concepts/services-networking/ingress/)
Complete the following configuration file to make it Ingress

```
metadata:
  name: someapp-ingress
spec:
```
There are several ways to answer this question.

```
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: someapp-ingress
spec:
  rules:
  - host: my.host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: someapp-internal-service
            port:
              number: 8080
```
Explain the meaning of "http", "host" and "backend" directives

```
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: someapp-ingress
spec:
  rules:
  - host: my.host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: someapp-internal-service
            port:
              number: 8080
```

host is the entry point of the cluster, so basically a valid domain address that maps to a cluster node's IP address

the http line is used for specifying that incoming requests will be forwarded to the internal service using http

backend references the internal service (`service.name` is the name under the Service's metadata and `port.number` is the port under the Service's ports section)
Why using a wildcard in ingress host may lead to issues?
The reason you should not use a wildcard value in a host (like `- host: *`) is that you basically tell your Kubernetes cluster to forward all the traffic to the container where you used this ingress. This may cause the entire cluster to go down.
What is Ingress Controller?
An implementation of Ingress. It's basically another pod (or set of pods) that evaluates and processes Ingress rules and thus manages all the redirections. There are multiple Ingress Controller implementations (the one from Kubernetes is the Kubernetes Nginx Ingress Controller).
What are some use cases for using Ingress?
* Multiple sub-domains (multiple host entries, each with its own service)
* One domain with multiple services (multiple paths where each one is mapped to a different service/application)
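The "one domain with multiple services" case can be sketched as (the host, paths and service names are made up for illustration):

```
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: multi-path-ingress
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /shop              # example.com/shop -> shop-svc
        pathType: Prefix
        backend:
          service:
            name: shop-svc
            port:
              number: 8080
      - path: /blog              # example.com/blog -> blog-svc
        pathType: Prefix
        backend:
          service:
            name: blog-svc
            port:
              number: 8080
```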
How to list Ingress in your namespace?
`kubectl get ingress`
What is Ingress Default Backend?
It specifies what to do with an incoming request to the Kubernetes cluster that isn't mapped to any backend (= there is no rule for mapping the request to a service). If the default backend service isn't defined, it's recommended to define one, so users still see some kind of message instead of nothing or an unclear error.
How to configure a default backend?
Create a Service resource that specifies the name of the default backend as reflected in `kubectl describe ingress ...` and the port under the ports section.
How to configure TLS with Ingress?
Add tls and secretName entries:

```
spec:
  tls:
  - hosts:
    - some_app.com
    secretName: someapp-secret-tls
```
True or False? When configuring Ingress with TLS, the Secret component must be in the same namespace as the Ingress component
True
Which Kubernetes concept would you use to control traffic flow at the IP address or port level?
Network Policies
How to scale an application (deployment) so it runs more than one instance of the application?

To run two instances of the application:

`kubectl scale deployment <deployment-name> --replicas=2`

You can specify any other number, given that your application knows how to scale.
### ReplicaSets
What is the purpose of ReplicaSet?
[kubernetes.io](https://kubernetes.io/docs/concepts/workloads/controllers/replicaset): "A ReplicaSet's purpose is to maintain a stable set of replica Pods running at any given time. As such, it is often used to guarantee the availability of a specified number of identical Pods."

In simpler words, a ReplicaSet will ensure the specified number of Pod replicas is running for a selected Pod. If there are more Pods than defined in the ReplicaSet, some will be removed. If there are fewer than what is defined in the ReplicaSet, then more replicas will be added.
What does the following block of lines do?

```
spec:
  replicas: 2
  selector:
    matchLabels:
      type: backend
  template:
    metadata:
      labels:
        type: backend
    spec:
      containers:
      - name: httpd-yup
        image: httpd
```

It defines a ReplicaSet for Pods whose type is set to "backend", so at any given point in time there will be 2 concurrent Pods running.
What will happen when a Pod, created by ReplicaSet, is deleted directly with kubectl delete po ...?
The ReplicaSet will create a new Pod in order to reach the desired number of replicas.
True or False? If a ReplicaSet defines 2 replicas but there are 3 Pods running matching the ReplicaSet selector, it will do nothing
False. It will terminate one of the Pods to reach the desired state of 2 replicas.
Describe the sequence of events in case of creating a ReplicaSet
* The client (e.g. kubectl) sends a request to the API server to create a ReplicaSet
* The controller detects there is a new event requesting a ReplicaSet
* The controller creates new Pod definitions (the exact number depends on what is defined in the ReplicaSet definition)
* The scheduler detects unassigned Pods and decides to which nodes to assign the Pods. This information is sent to the API server
* Kubelet detects that Pods were assigned to the node it's running on (as it is constantly watching the API server)
* Kubelet sends requests to the container engine to create the containers that are part of the Pod
* Kubelet sends a request to the API server to notify it that the Pods were created
How to list ReplicaSets in the current namespace?
`kubectl get rs`
Is it possible to delete ReplicaSet without deleting the Pods it created?
Yes, with `--cascade=orphan` (`--cascade=false` in older kubectl versions):

`kubectl delete -f rs.yaml --cascade=orphan`
What is the default number of replicas if not explicitly specified?
1
What does the following output of kubectl get rs mean?

```
NAME   DESIRED   CURRENT   READY   AGE
web    2         2         0       2m23s
```

The ReplicaSet `web` has 2 replicas. It seems that the containers inside the Pod(s) are not yet running, since the value of READY is 0. It might be normal, as it takes time for some containers to start running, or it might be due to an error. Running `kubectl describe po POD_NAME` or `kubectl logs POD_NAME` can give us more information.
True or False? Pods specified by the selector field of ReplicaSet must be created by the ReplicaSet itself
False. The Pods can be already running, and they can initially be created by any object. It doesn't matter to the ReplicaSet, and it's not a requirement for it to acquire and monitor them.
True or False? In the case of a ReplicaSet, if the Pods specified in the selector field don't exist, the ReplicaSet will wait for them to run before doing anything
False. It will take care of running the missing Pods.
In the case of a ReplicaSet, which field is mandatory in the spec section?

The field `template` in the spec section is mandatory. It's used by the ReplicaSet to create new Pods when needed.
You've created a ReplicaSet, how to check whether the ReplicaSet found matching Pods or it created new Pods?
`kubectl describe rs <ReplicaSet name>`

It will be visible under `Events` (the very last lines)
True or False? Deleting a ReplicaSet will delete the Pods it created
True (and not only the Pods but anything else it created).
True or False? Removing the label from a Pod that is tracked by a ReplicaSet, will cause the ReplicaSet to create a new Pod
True. When the label used by a ReplicaSet in the selector field is removed from a Pod, that Pod is no longer controlled by the ReplicaSet, and the ReplicaSet will create a new Pod to compensate for the one it "lost".
How to scale a deployment to 8 replicas?
`kubectl scale deploy <deployment-name> --replicas=8`
True or False? ReplicaSets are running the moment the user executed the command to create them (like kubectl create -f rs.yaml)
False. It can take some time, depending on what exactly you are running. To see if they are up and running, run `kubectl get rs` and watch the 'READY' column.
How to expose a ReplicaSet as a new service?
`kubectl expose rs <ReplicaSet name> --name=<service name> --target-port=<port> --type=NodePort`

A few notes:
- the target port depends on which port the app is using in the container
- the type can be different and doesn't have to be specifically "NodePort"
Fix the following ReplicaSet definition

```yaml
apiVersion: apps/v1
kind: ReplicaCet
metadata:
  name: redis
  labels:
    app: redis
    tier: cache
spec:
  selector:
    matchLabels:
      tier: cache
  template:
    metadata:
      labels:
        tier: cache
    spec:
      containers:
      - name: redis
        image: redis
```
kind should be ReplicaSet and not ReplicaCet :)
Fix the following ReplicaSet definition

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: redis
  labels:
    app: redis
    tier: cache
spec:
  selector:
    matchLabels:
      tier: cache
  template:
    metadata:
      labels:
        tier: cachy
    spec:
      containers:
      - name: redis
        image: redis
```
The selector doesn't match the label (cache vs cachy). To solve it, fix cachy so it's cache instead.
How to check which container image was used as part of replica set called "repli"?
`k describe rs repli | grep -i image`
How to check how many Pods are ready as part of a replica set called "repli"?
`k describe rs repli | grep -i "Pods Status"`
How to delete a replica set called "rori"?
`k delete rs rori`
How to modify a replica set called "rori" to use a different image?
`k edit rs rori`
Scale up a replica set called "rori" to run 5 Pods instead of 2
`k scale rs rori --replicas=5`
Scale down a replica set called "rori" to run 1 Pod instead of 5
`k scale rs rori --replicas=1`
### DaemonSet
What's a DaemonSet?
[Kubernetes.io](https://kubernetes.io/docs/concepts/workloads/controllers/daemonset): "A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created."
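A minimal DaemonSet manifest might look like the following sketch (the name and image are hypothetical, chosen for illustration):

```yaml
# Hypothetical DaemonSet running a per-node agent on every node in the cluster
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
      - name: log-collector
        image: fluentd   # assumption: any per-node agent image works here
```

Note there is no `replicas` field: the number of Pods is derived from the number of (matching) nodes.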
What's the difference between a ReplicaSet and DaemonSet?
A ReplicaSet's purpose is to maintain a stable set of replica Pods running at any given time. A DaemonSet ensures that all Nodes run a copy of a Pod.
What are some use cases for using a DaemonSet?
* Monitoring: you would like to perform monitoring on every node in the cluster. For example, a Datadog Pod running on every node using a DaemonSet
* Logging: you would like to have logging set up on every node in your cluster
* Networking: there is a networking component you need on every node so all nodes can communicate with each other
How does a DaemonSet work?
Historically, up to Kubernetes 1.12, it was done with the `nodeName` attribute. Starting with 1.12, it's achieved with the regular scheduler and node affinity.
#### DaemonSet - Commands
How to list all daemonsets in the current namespace?
`kubectl get ds`
### StatefulSet
Explain StatefulSet
StatefulSet is the workload API object used to manage stateful applications. It manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods. [Learn more](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/)
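As a sketch (names are hypothetical), a minimal StatefulSet looks much like a Deployment but also references a headless Service, which gives each Pod a stable network identity (`web-0`, `web-1`, ...):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "web"   # headless Service providing stable per-Pod DNS names
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx
```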
### Storage

#### Volumes
What is a volume in regards to Kubernetes?
A directory accessible to the containers running inside a certain Pod. The mechanism responsible for creating the directory, managing it, etc. mainly depends on the volume type.
What volume types are you familiar with?
* emptyDir: created when a Pod is assigned to a node and ceases to exist when the Pod is no longer running on that node
* hostPath: mounts a path from the host itself. Usually not used due to security risks, but it has multiple use cases where it's needed, like access to some internal host paths (`/sys`, `/var/lib`, etc.)
Which problems do volumes in Kubernetes solve?
1. Sharing files between containers running in the same Pod
2. Storage in containers is ephemeral - it usually doesn't last for long. For example, when a container crashes, you lose all on-disk data. Certain volume types (persistent volumes) allow you to manage such situations by keeping data beyond the container's lifetime
Explain ephemeral volume types vs. persistent volumes in regards to Pods
Ephemeral volume types have the lifetime of a pod as opposed to persistent volumes which exist beyond the lifetime of a Pod.
Provide at least one use-case for each of the following volume types: * emptyDir * hostPath
* emptyDir: you need temporary data that you can afford to lose if the Pod is deleted. For example, short-lived data required for one-time operations
* hostPath: you need access to paths on the host itself (like data from `/sys` or data generated in `/var/lib`)
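A minimal Pod sketch using both volume types (names are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: volume-demo        # hypothetical name
spec:
  containers:
  - name: app
    image: redis
    volumeMounts:
    - name: cache
      mountPath: /cache
    - name: host-sys
      mountPath: /host-sys
  volumes:
  - name: cache
    emptyDir: {}           # exists only while the Pod runs on the node
  - name: host-sys
    hostPath:
      path: /sys           # mounts a path from the host; use with care
```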
### Networking
True or False? By default there is no communication between two Pods in two different namespaces
False. By default two Pods in two different namespaces are able to communicate with each other.

Try it for yourself:

```
kubectl run test-prod -n prod --image ubuntu -- sleep 2000000000
kubectl run test-dev -n dev --image ubuntu -- sleep 2000000000
```

Run `k describe po test-prod -n prod` to get the IP of the test-prod Pod.

Access the dev Pod with `kubectl exec --stdin --tty test-dev -n dev -- /bin/bash` and ping the IP of the test-prod Pod you got earlier. You'll see that there is communication between the two Pods, in two separate namespaces.
### Network Policies
Explain Network Policies
[kubernetes.io](https://kubernetes.io/docs/concepts/services-networking/network-policies): "NetworkPolicies are an application-centric construct which allow you to specify how a pod is allowed to communicate with various network "entities"..." In simpler words, Network Policies specify how pods are allowed/disallowed to communicate with each other and/or other network endpoints.
What are some use cases for using Network Policies?
- Security: you want to prevent everyone from communicating with a certain Pod for security reasons
- Controlling network traffic: you would like to deny network flow between two specific Pods
True or False? If no network policies are applied to a pod, then no connections to or from it are allowed
False. By default pods are non-isolated.
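To isolate Pods you apply a policy to them. A minimal "default deny ingress" sketch (the namespace name is hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: prod      # hypothetical namespace
spec:
  podSelector: {}      # empty selector = applies to all Pods in the namespace
  policyTypes:
  - Ingress            # no ingress rules listed, so all inbound traffic is denied
```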
In case of two Pods, if there is an egress policy on the source denying traffic, and an ingress policy on the destination allowing traffic, will traffic be allowed or denied?
Denied. Both source and destination policies have to allow traffic for it to be allowed.
Where does a Kubernetes cluster store the cluster state?
etcd
### etcd
What is etcd?
etcd is an open source distributed key-value store used to hold and manage the critical information that distributed systems need to keep running. [Read more here](https://www.redhat.com/en/topics/containers/what-is-etcd)
True or False? Etcd holds the current status of any kubernetes component
True
True or False? The API server is the only component which communicates directly with etcd
True
True or False? application data is not stored in etcd
True
Why etcd? Why not some SQL or NoSQL database?
When chosen as the data store, etcd was (and still is of course):
* Highly Available - you can deploy multiple nodes
* Fully Replicated - any node in an etcd cluster is a "primary" node and has full access to the data
* Consistent - reads return the latest data
* Secured - supports both TLS and SSL
* Fast - a high performance data store (10k writes per second!)
### Namespaces
What are namespaces?
Namespaces allow you to split your cluster into virtual clusters where you can group your applications in a way that makes sense and is completely separated from the other groups (so you can, for example, create an app with the same name in two different namespaces)
Why to use namespaces? What is the problem with using one default namespace?
When using the default namespace alone, it becomes hard over time to get an overview of all the applications you manage in your cluster. Namespaces make it easier to organize applications into groups that make sense, like a namespace for all the monitoring applications and a namespace for all the security applications, etc.

Namespaces can also be useful for managing Blue/Green environments, where each namespace can include a different version of an app and also share resources that live in other namespaces (namespaces like logging, monitoring, etc.).

Another use case for namespaces is one cluster, multiple teams. When multiple teams use the same cluster, they might end up stepping on each other's toes. For example, if they each create an app with the same name, one team overrides the app of the other team, because there can't be two apps in Kubernetes with the same name in the same namespace.
True or False? When a namespace is deleted all resources in that namespace are not deleted but moved to another default namespace
False. When a namespace is deleted, the resources in that namespace are deleted as well.
What special namespaces are there by default when creating a Kubernetes cluster?
* default * kube-system * kube-public * kube-node-lease
What can you find in kube-system namespace?
* Master (control plane) processes
* System processes
True or False? While namespaces do provide scope for resources, they do not isolate them
True. Try creating two Pods in two separate namespaces, for example, and you'll see there is a connection between the two.
#### Namespaces - commands
How to list all namespaces?
`kubectl get namespaces` OR `kubectl get ns`
Create a namespace called 'alle'
`k create ns alle`
Check how many namespaces are there
`k get ns --no-headers | wc -l`
Check how many pods exist in the "dev" namespace
`k get po -n dev --no-headers | wc -l`
Create a pod called "kartos" in the namespace dev. The pod should be using the "redis" image.
If the namespace doesn't exist already: `k create ns dev`

`k run kartos --image=redis -n dev`
You are looking for a Pod called "atreus". How to check in which namespace it runs?
`k get po -A | grep atreus`
What does kube-public contain?
* A configmap, which contains cluster information * Publicly accessible data
How to get the name of the current namespace?
`kubectl config view | grep namespace`
What does kube-node-lease contain?
It holds information on heartbeats of nodes. Each node gets an object which holds information about its availability.
True or False? With namespaces you can limit the resources consumed by the users/teams
True. With namespaces you can limit CPU, RAM and storage usage.
How to switch to another namespace? In other words how to change active namespace?
`kubectl config set-context --current --namespace=some-namespace` and validate with `kubectl config view --minify | grep namespace:` OR `kubens some-namespace`
#### Resource Quotas
What is Resource Quota?
Resource quota provides constraints that limit aggregate resource consumption per namespace. It can limit the quantity of objects that can be created in a namespace by type, as well as the total amount of compute resources that may be consumed by resources in that namespace.
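Declaratively, such constraints can be sketched in a manifest like the following (the namespace name and values are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: some-quota
  namespace: dev     # hypothetical namespace the quota applies to
spec:
  hard:
    cpu: "2"         # total CPU requested by all Pods in the namespace
    pods: "2"        # maximum number of Pods in the namespace
```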
How to create a Resource Quota?
`kubectl create quota some-quota --hard=cpu=2,pods=2`
Which resources are accessible from different namespaces?
Services.
Which service, and in which namespace, is the following file referencing?

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: some-configmap
data:
  some_url: samurai.jack
```
It's referencing the service "samurai" in the namespace called "jack".
Which components can't be created within a namespace?
Volume and Node.
How to list all the components that bound to a namespace?
`kubectl api-resources --namespaced=true`
How to create components in a namespace?
One way is by specifying --namespace like this: `kubectl apply -f my_component.yaml --namespace=some-namespace`

Another way is by specifying it in the YAML itself:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: some-configmap
  namespace: some-namespace
```

and you can verify with: `kubectl get configmap -n some-namespace`
How to execute the command "ls" in an existing pod?
kubectl exec some-pod -it -- ls
How to create a service that exposes a deployment?
kubectl expose deploy some-deployment --port=80 --target-port=8080
How to create a pod and a service with one command?
kubectl run nginx --image=nginx --restart=Never --port 80 --expose
Describe in detail what the following command does: `kubectl create deployment kubernetes-httpd --image=httpd`
Why create a Deployment if Pods can be launched with a ReplicaSet?
How to get list of resources which are not bound to a specific namespace?
kubectl api-resources --namespaced=false
How to delete all pods whose status is not "Running"?
kubectl delete pods --field-selector=status.phase!='Running'
How to display the resources usages of pods?
kubectl top pod
Perhaps a general question but, you suspect one of the pods is having issues, you don't know what exactly. What do you do?
Start by inspecting the Pod status. We can use the command `kubectl get pods` (add `--all-namespaces` for Pods in all namespaces).
If we see "Error" status, we can keep debugging by running the command `kubectl describe pod [name]`. In case we still don't see anything useful we can try stern for log tailing.
In case we find out there was a temporary issue with the pod or the system, we can try restarting the pod with the following `kubectl scale deployment [name] --replicas=0`
Setting the replicas to 0 will shut down the process. Now start it with `kubectl scale deployment [name] --replicas=1`
What happens when Pods use too much memory (more than their limit)?
They become candidates for termination.
Describe how roll-back works
True or False? Memory is a compressible resource, meaning that when a container reaches the memory limit, it will keep running
False. CPU is a compressible resource while memory is a non-compressible resource - once a container reaches its memory limit, it will be terminated.
### Operators
What is an Operator?
Explained [here](https://kubernetes.io/docs/concepts/extend-kubernetes/operator) "Operators are software extensions to Kubernetes that make use of custom resources to manage applications and their components. Operators follow Kubernetes principles, notably the control loop." In simpler words, you can think about an operator as a custom control loop in Kubernetes.
Why do we need Operators?
The process of managing stateful applications in Kubernetes isn't as straightforward as managing stateless applications, where reaching the desired state and performing upgrades are both handled the same way for every replica. In stateful applications, upgrading each replica might require different handling due to the stateful nature of the app; each replica might be in a different state. As a result, we often need a human operator to manage stateful applications. A Kubernetes Operator is supposed to assist with this.

It also helps with automating a standard process on multiple Kubernetes clusters.
What components the Operator consists of?
1. CRD (Custom Resource Definition) - you are familiar with Kubernetes resources like Deployment, Pod, Service, etc. A CRD is also a resource, but one that you or the developer of the operator defines.
2. Controller - a custom control loop which runs against the CRD
Explain CRD
CRD stands for Custom Resource Definition. It's a custom Kubernetes component which extends the K8s API. TODO(abregman): add more info.
How does an Operator work?
It uses the control loop used by Kubernetes in general: it watches for changes in the application state. The difference is that it uses a custom control loop. In addition, it also makes use of CRDs (Custom Resource Definitions), so it basically extends the Kubernetes API.
True or False? A Kubernetes Operator is used for stateful applications
True
Explain what is the OLM (Operator Lifecycle Manager) and what is it used for
What is the Operator Framework?
open source toolkit used to manage k8s native applications, called operators, in an automated and efficient way.
What components the Operator Framework consists of?
1. Operator SDK - allows developers to build operators
2. Operator Lifecycle Manager - helps to install, update and generally manage the lifecycle of all operators
3. Operator Metering - enables usage reporting for operators that provide specialized services
Describe in detail what is the Operator Lifecycle Manager
It's part of the Operator Framework, used for managing the lifecycle of operators. It basically extends Kubernetes so a user can use a declarative way to manage operators (installation, upgrade, ...).
What openshift-operator-lifecycle-manager namespace includes?
It includes:
* catalog-operator - resolves and installs ClusterServiceVersions and the resources they specify
* olm-operator - deploys applications defined by a ClusterServiceVersion resource
What is kubeconfig? What do you use it for?
A kubeconfig file is a file used to configure access to Kubernetes when used in conjunction with the kubectl commandline tool (or other clients). Use kubeconfig files to organize information about clusters, users, namespaces, and authentication mechanisms.
Would you use Helm, Go or something else for creating an Operator?
Depends on the scope and maturity of the Operator. If it mainly covers installation and upgrades, Helm might be enough. If you want to go for Lifecycle management, insights and auto-pilot, this is where you'd probably use Go.
Are there any tools, projects you are using for building Operators?
This one is based more on a personal experience and taste... * Operator Framework * Kubebuilder * Controller Runtime ...
### Secrets
Explain Kubernetes Secrets
Secrets let you store and manage sensitive information (passwords, ssh keys, etc.)
How to create a Secret from a key and value?
`kubectl create secret generic some-secret --from-literal=password='donttellmypassword'`
How to create a Secret from a file?
`kubectl create secret generic some-secret --from-file=/some/file.txt`
What type: Opaque in a secret file means? What other types are there?
Opaque is the default type used for key-value pairs.
True or False? storing data in a Secret component makes it automatically secured
False. Some known security mechanisms like "encryption" aren't enabled by default.
What is the problem with the following Secret file:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: some-secret
type: Opaque
data:
  password: mySecretPassword
```
The password value isn't base64-encoded, as values under `data` must be. You should run something like this: `echo -n 'mySecretPassword' | base64` and paste the result into the file instead of using plain text.
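Worth remembering that base64 is an encoding, not encryption; the round trip is trivial, which is why Secrets need additional protections (encryption at rest, RBAC):

```shell
# Encode the secret value (this is what goes into the `data` field)
echo -n 'mySecretPassword' | base64
# → bXlTZWNyZXRQYXNzd29yZA==

# Anyone can decode it back just as easily
echo -n 'bXlTZWNyZXRQYXNzd29yZA==' | base64 -d
# → mySecretPassword
```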
What does the following in a Deployment configuration file mean?

```yaml
spec:
  containers:
    - name: USER_PASSWORD
      valueFrom:
        secretKeyRef:
          name: some-secret
          key: password
```
The USER_PASSWORD environment variable will store the value from the password key in the Secret called "some-secret". In other words, you reference a value from a Kubernetes Secret.
How to commit secrets to Git and in general how to use encrypted secrets?
One possible process would be as follows:

1. You create a Kubernetes Secret (but don't commit it)
2. You encrypt it using some 3rd party project (e.g. kubeseal)
3. You apply the sealed/encrypted secret
4. You commit the sealed secret to Git
5. You deploy an application that requires the secret, and it can be automatically decrypted by using, for example, the Bitnami Sealed Secrets controller
### Volumes
True or False? Kubernetes provides data persistence out of the box, so when you restart a pod, data is saved
False
Explain "Persistent Volumes". Why do we need it?
Persistent Volumes allow us to save data so basically they provide storage that doesn't depend on the pod lifecycle.
True or False? Persistent Volume must be available to all nodes because the pod can restart on any of them
True
What types of persistent volumes are there?
* NFS * iSCSI * CephFS * ...
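A PersistentVolume sketch backed by NFS (server address and export path are hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteMany        # NFS allows mounting from many nodes at once
  nfs:
    server: 10.0.0.5     # hypothetical NFS server address
    path: /exports/data  # hypothetical exported path
```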
What is PersistentVolumeClaim?
Explain Volume Snapshots
Volume snapshots let you create a copy of your volume at a specific point in time.
True or False? Kubernetes manages data persistence
False
Explain Storage Classes
Explain "Dynamic Provisioning" and "Static Provisioning"
The main difference relies on the moment when you want to configure storage. For instance, if you need to pre-populate data in a volume, you choose static provisioning. Whereas, if you need to create volumes on demand, you go for dynamic provisioning.
Explain Access Modes
What is CSI Volume Cloning?
Explain "Ephemeral Volumes"
What types of ephemeral volumes Kubernetes supports?
What is Reclaim Policy?
What reclaim policies are there?
* Retain * Recycle * Delete
### Access Control
What is RBAC?
RBAC in Kubernetes is the mechanism that enables you to configure fine-grained and specific sets of permissions that define how a given user, or group of users, can interact with any Kubernetes object in cluster, or in a specific Namespace of cluster.
Explain the "Role" and "RoleBinding" objects
What is the difference between Role and ClusterRole objects?
The difference between them is that a Role is used at a namespace level whereas a ClusterRole is for the entire cluster.
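As a sketch (the namespace and user name are hypothetical), a Role granting read access to Pods, and a RoleBinding granting that Role to a user, might look like:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: dev            # a Role is always scoped to one namespace
rules:
- apiGroups: [""]           # "" = the core API group (where Pods live)
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: dev
subjects:
- kind: User
  name: jane                # hypothetical user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```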
Explain what are "Service Accounts" and in which scenario would use create/use one
[Kubernetes.io](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account): "A service account provides an identity for processes that run in a Pod." An example of when to use one: you define a pipeline that needs to build and push an image. In order to have sufficient permissions to build and push an image, that pipeline would require a service account with sufficient permissions.
What happens when you create a pod and you DON'T specify a service account?
The pod is automatically assigned with the default service account (in the namespace where the pod is running).
Explain how Service Accounts are different from User Accounts
- User accounts are global while Service accounts unique per namespace - User accounts are meant for humans or client processes while Service accounts are for processes which run in pods
How to list Service Accounts?
`kubectl get serviceaccounts`
Explain "Security Context"
[kubernetes.io](https://kubernetes.io/docs/tasks/configure-pod-container/security-context): "A security context defines privilege and access control settings for a Pod or Container."
### Patterns

### CronJob
Explain what is CronJob and what is it used for
A CronJob creates Jobs on a repeating schedule. One CronJob object is like one line of a crontab (cron table) file. It runs a job periodically on a given schedule, written in Cron format.
What possible issue can arise from using the following spec, and how to fix it?

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: some-cron-job
spec:
  schedule: '*/1 * * * *'
  startingDeadlineSeconds: 10
  concurrencyPolicy: Allow
```
If the cron job fails, the next job will not replace the previous one due to the "concurrencyPolicy" value which is "Allow". It will keep spawning new jobs and so eventually the system will be filled with failed cron jobs. To avoid such problem, the "concurrencyPolicy" value should be either "Replace" or "Forbid".
What issue might arise from using the following CronJob, and how to fix it?

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: "some-cron-job"
spec:
  schedule: '*/1 * * * *'
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          concurrencyPolicy: Forbid
          successfulJobsHistoryLimit: 1
          failedJobsHistoryLimit: 1
```
The following lines are placed under the template:

```yaml
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 1
failedJobsHistoryLimit: 1
```

As a result, this configuration isn't part of the CronJob spec, hence the CronJob has no limits, which can cause issues like OOM and potentially lead to the API server going down.
To fix it, these lines should be placed in the spec of the CronJob, above or under the "schedule" directive in the above example.
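A corrected sketch might look as follows (the container details are hypothetical, and `batch/v1` is used since `batch/v1beta1` has since been deprecated); note the policy and history limits now sit at the CronJob spec level:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: some-cron-job
spec:
  schedule: '*/1 * * * *'
  concurrencyPolicy: Forbid       # at the CronJob spec level, where it belongs
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: job             # hypothetical container
            image: busybox
            command: ["date"]
```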
### Misc
Explain Imperative Management vs. Declarative Management
Explain what Kubernetes Service Discovery means
You have one Kubernetes cluster and multiple teams that would like to use it. You would like to limit the resources each team consumes in the cluster. Which Kubernetes concept would you use for that?
Namespaces allow you to limit resources and also make sure there are no collisions between teams when working in the cluster (like creating an app with the same name).
What Kube Proxy does?
Kube Proxy is a network proxy that runs on each node in your cluster, implementing part of the Kubernetes Service concept
What "Resources Quotas" are used for and how?
Explain ConfigMap
Separating configuration from Pods. It's good for cases where you might need to change configuration at some point but don't want to restart the application or rebuild the image, so you create a ConfigMap and connect it to a Pod, but externally to the Pod.

Overall it's good for:
* Sharing the same configuration between different Pods
* Storing configuration external to the Pod
How to use ConfigMaps?
1. Create it (from a key & value, a file or an env file)
2. Attach it to a Pod, e.g. mount the ConfigMap as a volume or reference it as environment variables
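Both steps can be sketched declaratively (names and values are hypothetical), here injecting the ConfigMap keys as environment variables:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config          # hypothetical name
data:
  LOG_LEVEL: "debug"
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: nginx
    envFrom:
    - configMapRef:
        name: app-config    # injects all keys as environment variables
```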
True or False? Sensitive data, like credentials, should be stored in a ConfigMap
False. Use secret.
Explain "Horizontal Pod Autoscaler"
In Kubernetes, a HorizontalPodAutoscaler automatically updates a workload resource with the aim of automatically scaling the workload to match demand.
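A sketch of an HPA targeting a hypothetical Deployment, scaling on CPU utilization:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web               # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80   # scale out above 80% average CPU
```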
When you delete a pod, is it deleted instantly? (a moment after running the command)
What does being cloud-native mean?
The term cloud native refers to the concept of building and running applications to take advantage of the distributed computing offered by the cloud delivery model.
Explain the pet and cattle approach of infrastructure with respect to kubernetes
Describe how one proceeds to run a containerized web app in K8s which should be reachable from a public URL.
How would you troubleshoot your cluster if some applications are not reachable any more?
What CustomResourceDefinitions are there in the Kubernetes world? What can they be used for?
How does scheduling work in kubernetes?
The control plane component kube-scheduler asks the following questions:
1. What to schedule? It tries to understand the Pod definition's specifications
2. Which node to schedule on? It tries to determine the best node with available resources to spin up the Pod
3. It then binds the Pod to the chosen node

View more [here](https://www.youtube.com/watch?v=rDCWxkvPlAw)
How are labels and selectors used?
What QoS classes are there?
* Guaranteed * Burstable * BestEffort
Explain Labels. What are they and why would one use them?
Kubernetes labels are key-value pairs that can connect identifying metadata with Kubernetes objects.
Explain Selectors
What is Kubeconfig?
### Gatekeeper
What is Gatekeeper?
[Gatekeeper docs](https://open-policy-agent.github.io/gatekeeper/website/docs): "Gatekeeper is a validating (mutating TBA) webhook that enforces CRD-based policies executed by Open Policy Agent"
Explain how Gatekeeper works
On every request sent to the Kubernetes cluster, Gatekeeper sends the policies and the resources to OPA (Open Policy Agent) to check whether the request violates any policy. If it does, Gatekeeper returns the policy error message back. If it doesn't violate any policy, the request reaches the cluster.
### Policy Testing
What is Conftest?
Conftest allows you to write tests against structured files. You can think of it as a test library for Kubernetes resources.
It is mostly used in testing environments such as CI pipelines or local hooks.
What is Datree? How is it different from Conftest?
Same as Conftest, it is used for policy testing and enforcement. The difference is that it comes with built-in policies.
### Helm
What is Helm?
A package manager for Kubernetes. Basically the ability to package YAML files and distribute them to other users and apply them in the cluster(s). As a concept it's quite common and can be found in many platforms and services. Think, for example, of package managers in operating systems: if you use Fedora/RHEL that would be dnf; if you use Ubuntu, then apt. If you don't use Linux, then a different question should be asked and it's "why?", but that's another topic :)
Why do we need Helm? What would be the use case for using it?
Sometimes when you would like to deploy a certain application to your cluster, you need to create multiple YAML files/components like Secret, Service, ConfigMap, etc. This can be a tedious task. So it makes sense to ease the process by introducing something that allows us to share this bundle of YAMLs every time we would like to add an application to our cluster. This something is called Helm.

A common scenario is having multiple Kubernetes clusters (prod, dev, staging). Instead of individually applying different YAMLs in each cluster, it makes more sense to create one Chart and install it in every cluster.

Another scenario is that you would like to share what you've created with the community, so people and companies can easily deploy your application in their cluster.
Explain "Helm Charts"
A Helm Chart is a bundle of YAML files. A bundle that you can consume from repositories, or create your own and publish it to the repositories.
It is said that Helm is also Templating Engine. What does it mean?
It is useful for scenarios where you have multiple applications that are all similar, so there are only minor differences in their configuration files and most values are the same. With Helm you can define a common blueprint for all of them, and the values that are not fixed can be placeholders. This is called a template file, and it looks similar to the following:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: {{ .Values.name }}
spec:
  containers:
  - name: {{ .Values.container.name }}
    image: {{ .Values.container.image }}
    port: {{ .Values.container.port }}
```

The values themselves will be in a separate file:

```yaml
name: some-app
container:
  name: some-app-container
  image: some-app-image
  port: 1991
```
What are some use cases for using Helm template file?
* Deploy the same application across multiple different environments * CI/CD
Explain the Helm Chart Directory Structure
```
someChart/        -> the name of the chart
  Chart.yaml      -> meta information on the chart
  values.yaml     -> values for the template files
  charts/         -> chart dependencies
  templates/      -> the template files :)
```
How does Helm support release management?
Helm allows you to upgrade, remove and roll back to previous versions of charts. In Helm 2 this was done with what is known as "Tiller". In Helm 3, Tiller was removed due to security concerns.
#### Commands
How do you search for charts?
`helm search hub [some_keyword]`
Is it possible to override values in values.yaml file when installing a chart?
Yes. You can pass another values file: `helm install --values=override-values.yaml [CHART_NAME]`

Or set values directly on the command line: `helm install --set some_key=some_value [CHART_NAME]`
How do you list deployed releases?
`helm ls` or `helm list`
How to execute a rollback?
`helm rollback RELEASE_NAME REVISION_ID`
How to view revision history for a certain release?
`helm history RELEASE_NAME`
How to upgrade a release?
`helm upgrade RELEASE_NAME CHART_NAME`
### Security
What security best practices do you follow in regards to the Kubernetes cluster?
* Secure inter-service communication (one way is to use Istio to provide mutual TLS)
* Isolate different resources into separate namespaces based on some logical grouping
* Use a supported container runtime (if you use the deprecated Docker engine, consider CRI-O as an engine and podman for the CLI)
* Properly test changes to the cluster (e.g. consider using Datree to prevent Kubernetes misconfigurations)
* Limit who can do what in the cluster (by using, for example, OPA Gatekeeper)
* Use NetworkPolicy to apply network security
* Consider using tools (e.g. Falco) for monitoring threats
### Troubleshooting Scenarios
Running `kubectl get pods` you see Pods in "Pending" status. What would you do?
One possible path is to run `kubectl describe pod <pod-name>` to get more details.
You might see one of the following:
* The cluster is full. In this case, extend the cluster
* ResourceQuota limits are met. In this case you might want to modify them
* A PersistentVolumeClaim mount is pending

If none of the above helped, run the command (`kubectl get pods`) with `-o wide` to see if the Pod is assigned to a node. If not, there might be an issue with the scheduler.
Users are unable to reach an application running on a Pod on Kubernetes. What might be the issue and how to check?
One possible path is to start with checking the Pod status. 1. Is the Pod pending? If yes, check for the reason with `kubectl describe pod <pod-name>` TODO: finish this...
### Istio
What is Istio? What is it used for?
Istio is an open source service mesh that helps organizations run distributed, microservices-based apps anywhere. Istio enables organizations to secure, connect, and monitor microservices, so they can modernize their enterprise apps more swiftly and securely.
### Controllers
What are controllers?
[Kubernetes.io](https://kubernetes.io/docs/concepts/architecture/controller): "In Kubernetes, controllers are control loops that watch the state of your cluster, then make or request changes where needed. Each controller tries to move the current cluster state closer to the desired state."
Name two controllers you are familiar with
1. Node Controller: manages the nodes of a cluster. Among other things, it is responsible for monitoring nodes' health - if a node suddenly becomes unreachable, it will evacuate all the Pods running on it and mark the node status accordingly.
2. Replication Controller: monitors the status of Pod replicas against what should be running, and makes sure the number of Pods that should be running is actually running.
What process is responsible for running and installing the different controllers?
Kube-Controller-Manager
What is the control loop? How does it work?
Explained [here](https://www.youtube.com/watch?v=i9V4oCa5f9I)
What are all the phases/steps of a control loop?
- Observe - identify the current cluster state
- Diff - identify whether a diff exists between the current state and the desired state
- Act - bring the current cluster state to the desired state (basically reach a state where there is no diff)
### Scheduler
True or False? The scheduler is responsible both for deciding where a Pod will run and for actually running it
False. While the scheduler is responsible for choosing the node on which the Pod will run, Kubelet is the one that actually runs the Pod.
How to schedule a pod on a node called "node1"?
`k run some-pod --image=redis -o yaml --dry-run=client > pod.yaml`

`vi pod.yaml` and add:

```
spec:
  nodeName: node1
```

`k apply -f pod.yaml`

Note: if you don't have a node called "node1" in your cluster, the Pod will be stuck in "Pending" state.
#### Node Affinity
Using node affinity, set a Pod to schedule on a node where the key is "region" and value is either "asia" or "emea"
`vi pod.yaml`

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: region
          operator: In
          values:
          - asia
          - emea
```
Using node affinity, set a Pod to never schedule on a node where the key is "region" and value is "neverland"
`vi pod.yaml`

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: region
          operator: NotIn
          values:
          - neverland
```
True or False? Using the node affinity type "requiredDuringSchedulingIgnoredDuringExecution" means the scheduler can't schedule the Pod unless the rule is met
True
True or False? Using the node affinity type "preferredDuringSchedulingIgnoredDuringExecution" means the scheduler can't schedule the Pod unless the rule is met
False. The scheduler tries to find a node that meets the requirements/rules, and if it doesn't find one, it will schedule the Pod anyway.
Can you deploy multiple schedulers?
Yes, it is possible. You can run another Pod with a command similar to:

```
spec:
  containers:
  - command:
    - kube-scheduler
    - --address=127.0.0.1
    - --leader-elect=true
    - --scheduler-name=some-custom-scheduler
...
```
Assuming you have multiple schedulers, how to know which scheduler was used for a given Pod?
Running `kubectl get events` you can see which scheduler was used.
You want to run a new Pod and you would like it to be scheduled by a custom scheduler. How to achieve it?
Add the following to the spec of the Pod:

```
spec:
  schedulerName: some-custom-scheduler
```
### Taints
Check if there are taints on node "master"
`k describe no master | grep -i taints`
Create a taint on one of the nodes in your cluster with key of "app" and value of "web" and effect of "NoSchedule". Verify it was applied
`k taint node minikube app=web:NoSchedule` `k describe no minikube | grep -i taints`
You applied a taint with `k taint node minikube app=web:NoSchedule` on the only node in your cluster and then executed `kubectl run some-pod --image=redis`. What will happen?
The Pod will remain in "Pending" status due to the only node in the cluster having a taint of "app=web".
You applied a taint with `k taint node minikube app=web:NoSchedule` on the only node in your cluster and then executed `kubectl run some-pod --image=redis`, but the Pod is in Pending state. How to fix it?
`kubectl edit po some-pod` and add the following under `spec.tolerations`:

```
- effect: NoSchedule
  key: app
  operator: Equal
  value: web
```

Save and exit. The Pod should be in Running state now.
Remove an existing taint from one of the nodes in your cluster
`k taint node minikube app=web:NoSchedule-`
What taint effects are there? Explain each one of them
`NoSchedule`: prevents new resources from being scheduled on the node

`PreferNoSchedule`: the scheduler will prefer to place resources on other nodes before resorting to the node on which the taint was applied

`NoExecute`: while "NoSchedule" does not evict already running Pods (or other resources) from the node, "NoExecute" evicts any already running resource from the node
### Resource Limits
Explain why one would specify resource limits in regards to Pods
* You know how much RAM and/or CPU your app should be consuming and anything above that is not valid
* You would like to make sure that everyone can run their apps in the cluster and resources are not being solely used by one type of application
True or False? Resource limits are applied at the Pod level, meaning that if the limit is 2GB RAM and there are two containers in the Pod, each container gets 1GB RAM
False. It's per container and not per Pod.
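Since limits are declared per container, a Pod with two containers would declare them separately on each one. A minimal sketch (the names, images and values here are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: two-containers
spec:
  containers:
  - name: app              # each container carries its own limits
    image: nginx
    resources:
      limits:
        memory: 1Gi
  - name: sidecar
    image: busybox
    resources:
      limits:
        memory: 1Gi        # the Pod's effective total is the sum: 2Gi
```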
#### Resources Limits - Commands
Check if there are any limits on one of the pods in your cluster
`kubectl describe po POD_NAME | grep -i limits`
Run a pod called "yay" with the image "python" and resources request of 64Mi memory and 250m CPU
`kubectl run yay --image=python --dry-run=client -o yaml > pod.yaml`

`vi pod.yaml`

```
spec:
  containers:
  - image: python
    imagePullPolicy: Always
    name: yay
    resources:
      requests:
        cpu: 250m
        memory: 64Mi
```

`kubectl apply -f pod.yaml`
Run a pod called "yay2" with the image "python". Make sure it has resources request of 64Mi memory and 250m CPU and the limits are 128Mi memory and 500m CPU
`kubectl run yay2 --image=python --dry-run=client -o yaml > pod.yaml`

`vi pod.yaml`

```
spec:
  containers:
  - image: python
    imagePullPolicy: Always
    name: yay2
    resources:
      limits:
        cpu: 500m
        memory: 128Mi
      requests:
        cpu: 250m
        memory: 64Mi
```

`kubectl apply -f pod.yaml`
### Monitoring
What monitoring solutions are you familiar with in regards to Kubernetes?
There are many types of monitoring solutions for Kubernetes. Some are open source, some keep data in-memory, some cost money, ... Here is a short list:

* metrics-server: in-memory open source monitoring
* Datadog: $$$
* Prometheus: open source monitoring solution
Describe how the monitoring solution you are working with monitors Kubernetes
This very much depends on what you chose to use. Let's address one of the solutions:

* metrics-server: an open source and free monitoring solution that uses the cAdvisor component of the kubelet to retrieve information on the cluster and its resources and stores it in-memory. Once installed, after some time you can run commands like `kubectl top node` and `kubectl top pod` to view performance metrics on nodes, Pods and other resources.

TODO: add more monitoring solutions
### Kustomize
What is Kustomize?

Kustomize is a configuration management tool, built into kubectl, for customizing Kubernetes manifests (e.g. adding labels or patching fields) without templating and without modifying the original files.
Explain the need for Kustomize by describing actual use cases
* You have a Helm chart of an application used by multiple teams in your organization, and there is a requirement to add an annotation to the app specifying the name of the team owning it
  * Without Kustomize you would need to copy the files (the chart template in this case) and modify the copy to include the specific annotations you need
  * With Kustomize you don't need to copy the entire repo or files
* You are asked to apply a change/patch to some app without modifying the original files of the app
  * With Kustomize you can define a kustomization.yml file that describes these customizations, so you don't need to touch the original app files
Describe in high-level how Kustomize works
1. You add a kustomization.yml file to the folder of the app you would like to customize
2. In it, you define the customizations you would like to perform
3. You run `kustomize build APP_PATH` where APP_PATH is the folder in which your kustomization.yml resides
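The steps above can be sketched with a minimal kustomization.yml (the file names and the annotation key are illustrative):

```yaml
# kustomization.yml - placed next to the app's manifests
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yml
- service.yml
commonAnnotations:
  owning-team: aces    # added to every listed resource without editing the originals
```

Running `kustomize build .` (or `kubectl apply -k .`) in that folder renders the customized manifests.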
### Deployment Strategies
What rollout/deployment strategies are you familiar with?
* Blue/Green Deployments: you deploy a new version of your app while the old version is still running, and then start redirecting traffic to the new version
* Canary Deployments: you deploy a new version of your app and start redirecting a **portion** of your users/traffic to the new version, so the migration to the new version is much more gradual
Explain Blue/Green deployments/rollouts in detail
Blue/Green deployment steps:

1. Traffic comes from users through a load balancer to the application, which is currently version 1:

   Users -> Load Balancer -> App Version 1

2. A new application version 2 is deployed, while version 1 is still running:

   Users -> Load Balancer -> App Version 1

                             App Version 2

3. If version 2 runs properly, traffic is switched to it instead of version 1:

   Users -> Load Balancer    App Version 1

                          -> App Version 2

4. Whether the old version is removed, or kept running without users being redirected to it, is a team or company decision

Pros:

* We can roll back/switch to the previous version quickly at any point

Cons:

* In case of an issue with the new version, ALL users are affected (instead of a small portion/percentage)
Explain Canary deployments/rollouts in detail
Canary deployment steps:

1. Traffic comes from users through a load balancer to the application, which is currently version 1:

   Users -> Load Balancer -> App Version 1

2. A new application version 2 is deployed (while version 1 is still running) and part of the traffic is redirected to the new version:

   Users -> Load Balancer -> (95% of the traffic) App Version 1

                          -> (5% of the traffic) App Version 2

3. If the new version (2) runs well, more traffic is redirected to it:

   Users -> Load Balancer -> (70% of the traffic) App Version 1

                          -> (30% of the traffic) App Version 2

4. If everything runs well, at some point all traffic is redirected to the new version:

   Users -> Load Balancer -> App Version 2

Pros:

* If there is any issue with the newly deployed app version, only a portion of the users is affected, instead of all of them

Cons:

* Testing of the new version necessarily happens in the production environment (as user traffic exists only there)
What ways are you familiar with to implement deployment strategies (like canary, blue/green) in Kubernetes?
There are multiple ways. One of them is Argo Rollouts.
### Scenarios
An engineer from your organization told you he is interested in seeing only his team's resources in Kubernetes. In reality, however, he sees the resources of the whole organization, from multiple different teams. What Kubernetes concept can you use in order to deal with this?
Namespaces. See the following [namespaces question and answer](#namespaces-use-cases) for more information.
An engineer in your team runs a Pod but the status he sees is "CrashLoopBackOff". What does it mean? How to identify the issue?
The container failed to run (due to different reasons) and Kubernetes tries to run the Pod again after some delay (= backoff time).

Some reasons for it to fail:

- Misconfiguration - misspelling, non supported value, etc.
- Resource not available - nodes are down, PV not mounted, etc.

Some ways to debug:

1. `kubectl describe pod POD_NAME`
   1. Focus on `State` (which should be Waiting, CrashLoopBackOff) and `Last State`, which should tell what happened before (as in why it failed)
2. Run `kubectl logs mypod`
   1. This should show the output of the failing container
   2. For a specific container, you can add `-c CONTAINER_NAME`
An engineer from your organization asked whether there is a way to prevent Pods (with a certain label) from being scheduled on one of the nodes in the cluster. Your reply is:
Yes. Using taints, we could run the following command on node1; it will prevent Pods from being scheduled there unless they carry a matching toleration (key "app", value "web"): `kubectl taint node node1 app=web:NoSchedule`
You would like to limit the number of resources being used in your cluster. For example no more than 4 replicasets, 2 services, etc. How would you achieve that?
Using ResourceQuota.
================================================
FILE: topics/kubernetes/exercises/kustomize_common_labels/exercise.md
================================================

# Kustomize - Common Labels

## Requirements

1. Running Kubernetes cluster
2. kubectl version 1.14 or above

## Objectives

In the current directory there is an app composed of a Deployment and a Service.

1. Write a kustomization.yml file that will add the label "team-name: aces" to both the Service and the Deployment
2. Execute a kustomize command that will generate the customized k8s files with the label appended

## Solution

Click [here](solution.md) to view the solution

================================================
FILE: topics/kubernetes/exercises/kustomize_common_labels/solution.md
================================================

# Kustomize - Common Labels

## Requirements

1. Running Kubernetes cluster
2. kubectl version 1.14 or above

## Objectives

In the current directory there is an app composed of a Deployment and a Service.

1. Write a kustomization.yml file that will add the label "team-name: aces" to both the Service and the Deployment
2. Execute a kustomize command that will generate the customized k8s files with the label appended

## Solution

1. Add the following to kustomization.yml in the someApp directory:

```
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
commonLabels:
  team-name: aces
resources:
- service.yml
- deployment.yml
```

2.
Run `kubectl apply -k someApp`

================================================
FILE: topics/kubernetes/exercises/kustomize_common_labels/someApp/deployment.yml
================================================

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80

================================================
FILE: topics/kubernetes/exercises/kustomize_common_labels/someApp/service.yml
================================================

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9376

================================================
FILE: topics/kubernetes/exercises/labels_and_selectors/exercise.md
================================================

# Labels and Selectors 101

## Objectives

1. How to list all the Pods with the label "app=web"?
2. How to list all objects labeled as "env=staging"?
3. How to list all deployments from "env=prod" and "type=web"?

## Solution

Click [here](solution.md) to view the solution.

================================================
FILE: topics/kubernetes/exercises/labels_and_selectors/solution.md
================================================

# Labels and Selectors 101

## Objectives

1. How to list all the Pods with the label "app=web"?
2. How to list all objects labeled as "env=staging"?
3. How to list all deployments from "env=prod" and "type=web"?

## Solution

`k get po -l app=web`

`k get all -l env=staging`

`k get deploy -l env=prod,type=web`

================================================
FILE: topics/kubernetes/exercises/node_selectors/exercise.md
================================================

# Node Selectors

## Objectives

1. Apply the label "hw=max" on one of the nodes in your cluster
2.
Create and run a Pod called `some-pod` with the image `redis` and configure it to use the selector `hw=max`
3. Explain why node selectors might be limited

## Solution

Click [here](solution.md) to view the solution

================================================
FILE: topics/kubernetes/exercises/node_selectors/solution.md
================================================

# Node Selectors

## Objectives

1. Apply the label "hw=max" on one of the nodes in your cluster
2. Create and run a Pod called `some-pod` with the image `redis` and configure it to use the selector `hw=max`
3. Explain why node selectors might be limited

## Solution

1. `kubectl label nodes some-node hw=max`
2.

```
kubectl run some-pod --image=redis --dry-run=client -o yaml > pod.yaml

vi pod.yaml

spec:
  nodeSelector:
    hw: max

kubectl apply -f pod.yaml
```

3. Assume you would like to run your Pod on all the nodes with either `hw` set to max or to min, instead of just max. This is not possible with nodeSelectors, which are quite simplistic, and this is where you might want to consider `node affinity`.

================================================
FILE: topics/kubernetes/exercises/taints_101/exercise.md
================================================

# Taints 101

## Objectives

1. Check if one of the nodes in the cluster has taints (doesn't matter which node)
2. Create a taint on one of the nodes in your cluster with key of "app" and value of "web" and effect of "NoSchedule"
   1. Explain what it does exactly
   2. Verify it was applied
3. Run a Pod that will be able to run on the node on which you applied the taint

## Solution

Click [here](solution.md) to view the solution.

================================================
FILE: topics/kubernetes/exercises/taints_101/solution.md
================================================

# Taints 101

## Objectives

1. Check if one of the nodes in the cluster has taints (doesn't matter which node)
2.
Create a taint on one of the nodes in your cluster with key of "app" and value of "web" and effect of "NoSchedule"
   1. Explain what it does exactly
   2. Verify it was applied

## Solution

1. `kubectl describe no minikube | grep -i taints`
2. `kubectl taint node minikube app=web:NoSchedule`
   1. Any resource without a matching toleration for "app=web" will not be scheduled on node `minikube`
   2. `kubectl describe no minikube | grep -i taints`
3.

```
kubectl run some-pod --image=redis
kubectl edit po some-pod
```

```
- effect: NoSchedule
  key: app
  operator: Equal
  value: web
```

Save and exit. The Pod should be running.

================================================
FILE: topics/kubernetes/killing_containers.md
================================================

## "Killing" Containers

1. Run a Pod with a web service (e.g. httpd)
2. Verify the web service is running with the `ps` command
3. Check how many restarts the pod has performed
4. Kill the web service process
5. Check how many restarts the pod has performed
6. Verify again the web service is running

## After you complete the exercise

* Why did the "RESTARTS" count rise?

================================================
FILE: topics/kubernetes/pods_01.md
================================================

## Pods 01

#### Objective

Learn how to create pods

#### Instructions

1. Choose a container image (e.g. redis, nginx, mongo, etc.)
2. Create a pod (in the default namespace) using the image you chose
3. Verify the pod is running

================================================
FILE: topics/kubernetes/replicaset_01.md
================================================

## ReplicaSet 101

#### Objective

Learn how to create and view ReplicaSets

#### Instructions

1. Create a ReplicaSet with 2 replicas. The app can be anything.
2. Verify a ReplicaSet was created and there are 2 replicas
3. Delete one of the Pods the ReplicaSet has created
4. If you'll list all the Pods now, what will you see?
5. Remove the ReplicaSet you've created
6.
Verify you've deleted the ReplicaSet

================================================
FILE: topics/kubernetes/replicaset_02.md
================================================

## ReplicaSet 102

#### Objective

Learn how to operate ReplicaSets

#### Instructions

1. Create a ReplicaSet with 2 replicas. The app can be anything.
2. Verify a ReplicaSet was created and there are 2 replicas
3. Remove the ReplicaSet but NOT the pods it created
4. Verify you've deleted the ReplicaSet but the Pods are still running

================================================
FILE: topics/kubernetes/replicaset_03.md
================================================

## ReplicaSet 103

#### Objective

Learn how labels are used by ReplicaSets

#### Instructions

1. Create a ReplicaSet with 2 replicas. Make sure the label used for the selector and in the Pods is "type=web"
2. Verify a ReplicaSet was created and there are 2 replicas
3. List the Pods running
4. Remove the label (type=web) from one of the Pods created by the ReplicaSet
5. List the Pods running. Are there more Pods running after removing the label? Why?
6. Verify the ReplicaSet indeed created a new Pod

================================================
FILE: topics/kubernetes/services_01.md
================================================

## Services 01

#### Objective

Learn how to create services

#### Instructions

1. Create a pod running nginx
2. Create a service for the pod you've just created
3. Verify the app is reachable

================================================
FILE: topics/kubernetes/solutions/killing_containers.md
================================================

## "Killing" Containers - Solution

1. Run a Pod with a web service (e.g. httpd) - `kubectl run web --image registry.redhat.io/rhscl/httpd-24-rhel7`
2. Verify the web service is running with the `ps` command - `kubectl exec web -- ps`
3. Check how many restarts the pod has performed - `kubectl get po web`
4. Kill the web service process - `kubectl exec web -- kill 1`
5.
Check how many restarts the pod has performed - `kubectl get po web`
6. Verify again the web service is running - `kubectl exec web -- ps`

## After you complete the exercise

* Why did the "RESTARTS" count rise? - `Kubernetes restarted the Pod because we killed the process and the container was not running properly.`

================================================
FILE: topics/kubernetes/solutions/pods_01_solution.md
================================================

## Pods 01 - Solution

```
kubectl run nginx --image=nginx --restart=Never
kubectl get pods
```

================================================
FILE: topics/kubernetes/solutions/replicaset_01_solution.md
================================================

## ReplicaSet 01 - Solution

1. Create a ReplicaSet with 2 replicas. The app can be anything.

```
cat >> rs.yaml <
```

4. If you'll list all the Pods now, what will you see?

```
The same number of Pods. Since we defined 2 replicas, the ReplicaSet will make sure to create another Pod that will replace the one you've deleted.
```

5. Remove the ReplicaSet you've created

```
kubectl delete -f rs.yaml
```

6. Verify you've deleted the ReplicaSet

```
kubectl get rs
# OR a more specific way:
kubectl get -f rs.yaml
```

================================================
FILE: topics/kubernetes/solutions/replicaset_02_solution.md
================================================

## ReplicaSet 02 - Solution

1. Create a ReplicaSet with 2 replicas. The app can be anything.

```
cat >> rs.yaml <> rs.yaml < running_pods.txt
```

4. Remove the label (type=web) from one of the Pods created by the ReplicaSet

```
kubectl label pod POD_NAME type-
```

5. List the Pods running. Are there more Pods running after removing the label? Why?
```
Yes, there is an additional Pod running because once the label (used as a matching selector) was removed, the Pod became independent, meaning it's not controlled by the ReplicaSet anymore, and the ReplicaSet was missing replicas per its definition, so it created a new Pod.
```

6. Verify the ReplicaSet indeed created a new Pod

```
kubectl describe rs web
```

================================================
FILE: topics/kubernetes/solutions/services_01_solution.md
================================================

## Services 01 - Solution

```
kubectl run nginx --image=nginx --restart=Never --port=80 --labels="app=dev-nginx"

cat << EOF > nginx-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: dev-nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9372
EOF
```

================================================
FILE: topics/linux/README.md
================================================

# Linux

## Linux Master Application

A completely free application for testing your knowledge on Linux.
Disclaimer: developed by repository owner

- [Linux](#linux)
- [Linux Master Application](#linux-master-application)
- [Linux Exercises](#linux-exercises)
- [Basics](#basics)
- [Misc](#misc)
- [Linux Questions](#linux-questions)
- [Linux 101](#linux-101)
- [I/O Redirection](#io-redirection)
- [Filesystem Hierarchy Standard](#filesystem-hierarchy-standard)
- [Permissions](#permissions)
- [Scenarios](#scenarios)
- [Systemd](#systemd)
- [Troubleshooting and Debugging](#troubleshooting-and-debugging)
- [Scenarios](#scenarios-1)
- [Kernel](#kernel)
- [SSH](#ssh)
- [Globbing & Wildcards](#globbing--wildcards)
- [Boot Process](#boot-process)
- [Disk and Filesystem](#disk-and-filesystem)
- [Performance Analysis](#performance-analysis)
- [Processes](#processes)
- [Security](#security)
- [Networking](#networking)
- [DNS](#dns)
- [Packaging](#packaging)
- [DNF](#dnf)
- [Applications and Services](#applications-and-services)
- [Users and Groups](#users-and-groups)
- [Hardware](#hardware)
- [Namespaces](#namespaces)
- [Virtualization](#virtualization)
- [AWK](#awk)
- [System Calls](#system-calls)
- [Filesystem & Files](#filesystem--files)
- [Advanced Networking](#advanced-networking)
- [Memory](#memory)
- [Distributions](#distributions)
- [Sed](#sed)
- [Misc](#misc-1)

## Linux Exercises

### Basics

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Navigation | cd, pwd | [Exercise](exercises/navigation/README.md) | [Solution](exercises/navigation/solution.md) | |
| Create and Destroy | touch, rm, mkdir | [Exercise](exercises/create_remove/README.md) | [Solution](exercises/create_remove/solution.md) | |
| Copy Time | touch, cp, ls | [Exercise](exercises/copy/README.md) | [Solution](exercises/copy/solution.md) | |

### Misc

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Unique Count | | [Exercise](exercises/uniqe_count/README.md) | [Solution](exercises/uniqe_count/solution.md) | |

## Linux Questions

### Linux 101
What is Linux?
[Wikipedia](https://en.wikipedia.org/wiki/Linux): "Linux is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged in a Linux distribution." [Red Hat](https://www.redhat.com/en/topics/linux/what-is-linux): "Linux® is an open source operating system (OS). An operating system is the software that directly manages a system’s hardware and resources, like CPU, memory, and storage. The OS sits between applications and hardware and makes the connections between all of your software and the physical resources that do the work."
Explain what each of the following commands does and give an example on how to use it: * touch * ls * rm * cat * cp * mkdir * pwd * cd
* touch - update a file's timestamp. More commonly used for creating files
* ls - list files and directories
* rm - remove files and directories
* cat - create, view and concatenate files
* cp - copy files and directories
* mkdir - create directories
* pwd - print the current working directory (= the path the user is currently located at)
* cd - change directory
What each of the following commands does? * cd / * cd ~ * cd * cd .. * cd . * cd -
* cd / -> change to the root directory
* cd ~ -> change to your home directory
* cd -> change to your home directory
* cd .. -> change to the directory above the current one, i.e. the parent directory
* cd . -> change to the directory you are currently in
* cd - -> change to the last visited path
Some of the commands in the previous question can be run with the -r/-R flag. What does it do? Give an example to when you would use it
The -r (or -R in some commands) flag allows the user to run a certain command recursively. For example, listing all the files under the following tree is possible when done recursively (`ls -R`):

    /dir1/
      dir2/
        file1
        file2
      dir3/
        file3

To list all the files, one can run `ls -R /dir1`
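The tree from the answer can be recreated and listed in a shell (the /tmp path is illustrative):

```shell
# build the example tree
mkdir -p /tmp/dir1/dir2 /tmp/dir1/dir3
touch /tmp/dir1/dir2/file1 /tmp/dir1/dir2/file2 /tmp/dir1/dir3/file3

ls /tmp/dir1       # a plain ls shows only the top level: dir2 dir3
ls -R /tmp/dir1    # -R descends recursively, also listing file1, file2 and file3
```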
Explain each field in the output of `ls -l` command
It shows a detailed list of files in a long format. From the left: file permissions, number of links, owner name, owner group, file size, timestamp of last modification, and the file/directory name.
What are hidden files/directories? How to list them?
These are files that a standard `ls` listing does not display. Their names start with a dot, e.g. `.bashrc`, which is used to execute scripts on shell startup. Some also store configuration for services on your host, like `~/.kube/config` for kubectl. To list them, use `ls -a`.
What do > and < do in terms of input and output for programs?
They redirect a program's input (<) and output (>) from/to files, using stdin and stdout. For example: `myProgram < input.txt > executionOutput.txt`
Explain what each of the following commands does and give an example on how to use it: * sed * grep * cut * awk
- sed: a stream editor. Can be used for various purposes like replacing a word in a file: `sed -i 's/salad/burger/g' file.md`
- grep: a search tool. Used to search, count or match text in a file:
  - searching for any line that contains a word in a file: `grep 'word' file.md`
  - or displaying the number of lines in which a string appears in a file: `grep -c 'This is a string' file.md`
- cut: a tool for cutting out selected portions of each line of a file:
  - syntax: `cut OPTION [FILE]`
  - cutting the first two bytes from a word in a file: `cut -b 1-2 file.md`, output: `wo`
- awk: a programming language mainly used for text processing and data extraction. It can be used to manipulate and modify text in a file:
  - syntax: `awk [OPTIONS] [FILTER] [FILE]`
  - extracting a specific field from a CSV file: `awk -F ',' '{print $1}' file.csv`, output: the first field of each line in the file
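A quick way to compare the four tools is to run them against the same small file (the file name and contents are made up for the demo):

```shell
# sample data: one "item,price" pair per line
printf 'salad,5\nburger,8\nsoup,4\n' > /tmp/menu.csv

sed 's/salad,/burger,/' /tmp/menu.csv                   # replace text on the way through (prints burger,5 first)
grep -c 'burger' /tmp/menu.csv                          # count matching lines (prints 1)
cut -d',' -f1 /tmp/menu.csv                             # first comma-separated field of each line
awk -F',' '{sum += $2} END {print sum}' /tmp/menu.csv   # sum the second field (prints 17)
```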
How to rename the name of a file or a directory?
Using the `mv` command.
Specify which command would you use (and how) for each of the following scenarios * Remove a directory with files * Display the content of a file * Provides access to the file /tmp/x for everyone * Change working directory to user home directory * Replace every occurrence of the word "good" with "great" in the file /tmp/y
- `rm -rf dir`
- `cat` or `less`
- `chmod 777 /tmp/x`
- `cd ~`
- `sed -i s/good/great/g /tmp/y`
How can you check what is the path of a certain command?
* whereis * which
What is the difference between these two commands? Will it result in the same output? ``` echo hello world echo "hello world" ```
The echo command receives two separate arguments in the first execution, while in the second it gets a single argument: the string "hello world". The output will be the same.
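The argument splitting can be made visible with a small helper function (`count_args` is a hypothetical name, not a standard command):

```shell
# count_args prints how many arguments it received
count_args() { echo "$#"; }

count_args hello world      # prints 2 - two separate arguments
count_args "hello world"    # prints 1 - quoting makes it a single argument

echo hello     world        # prints: hello world (the shell collapses the spaces between arguments)
echo "hello     world"      # prints: hello     world (quoting preserves the spaces)
```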
Explain piping. How do you perform piping?
Using a pipe in Linux allows you to send the output of one command to the input of another. For example: `cat /etc/services | wc -l`
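A self-contained variant of the example, using a small generated file instead of /etc/services (the file name and contents are illustrative):

```shell
printf 'ssh\nhttp\nhttps\n' > /tmp/services.txt

cat /tmp/services.txt | wc -l          # stdout of cat becomes stdin of wc, printing 3
grep 'http' /tmp/services.txt | wc -l  # pipes chain any commands; prints 2 here
```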
Fix the following commands: * sed "s/1/2/g' /tmp/myFile * find . -iname \*.yaml -exec sed -i "s/1/2/g" {} ;
```
sed 's/1/2/g' /tmp/myFile    # sed "s/1/2/g" is also fine
find . -iname "*.yaml" -exec sed -i "s/1/2/g" {} \;
```
How to check which commands you executed in the past?
The `history` command, or the .bash_history file. You can also use the up arrow key to scroll through the recent commands you typed.
Running the command df you get "command not found". What could be wrong and how to fix it?

Most likely the default $PATH was modified or overridden, and no longer contains /usr/bin (or /bin), where df normally resides. The issue can also happen if a shell configuration file like .bash_profile was wrongly modified, causing erratic behavior.

Some ways to fix it:

1. Manually add what you need to your $PATH: `PATH="$PATH":/usr/bin`
2. Restore the variable from a backup of your environment variables, if you have one
3. Look up your distro's default $PATH value and set it using method #1

Note: there are many ways to end up with errors like this - a wrongly modified shell configuration file, permission issues, badly compiled software (if you compiled it yourself)... no single answer will be true 100% of the time.
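Fix #1 can be simulated without touching your real environment (the directory and command names here are made up for the demo):

```shell
# create a command that is not on $PATH yet
mkdir -p /tmp/mybin
printf '#!/bin/sh\necho hello\n' > /tmp/mybin/mydf
chmod +x /tmp/mybin/mydf

# before this line, running "mydf" would fail with "command not found";
# appending its directory to $PATH makes the shell able to resolve it
PATH="$PATH:/tmp/mybin"
mydf    # now found via the updated $PATH (prints: hello)
```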

How do you schedule tasks periodically?
You can use the commands `cron` and `at`. With cron, tasks are scheduled using the following format:

`*/30 * * * * bash myscript.sh` - executes the script every 30 minutes.

The tasks are stored in a crontab file; you can edit it using `crontab -e`.

Alternatively, if you are using a distro with systemd, it's recommended to use systemd timers.
### I/O Redirection
Explain Linux I/O redirection
In Linux, I/O redirection is a way of changing the default input/output behavior of a command or program. It allows you to redirect input and output from/to different sources/destinations, such as files, devices, and other commands. Here are some common examples of I/O redirection:

* Redirecting standard output (stdout): `ls > filelist.txt`
* Redirecting standard error (stderr): `ls /some/nonexistent/directory 2> error.txt`
* Appending to a file: `echo "hello" >> myfile.txt`
* Redirecting input (stdin): `sort < unsorted.txt`
* Using pipes ("|"): `ls | grep "\.txt$"`
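The redirections above can be combined in one short, self-contained demo (file names here are illustrative; the work happens in a throwaway temporary directory):

```shell
# Work in a throwaway directory so nothing on the system is touched
tmpdir=$(mktemp -d)
cd "$tmpdir"

printf 'banana\napple\ncherry\n' > unsorted.txt   # > redirects stdout, creating the file
sort < unsorted.txt > sorted.txt                  # < feeds stdin, > writes stdout
echo "done" >> sorted.txt                         # >> appends instead of truncating
ls no_such_dir 2> errors.txt || true              # 2> captures stderr separately

head -1 sorted.txt                                # prints: apple
```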
Demonstrate Linux output redirection
ls > ls_output.txt
Demonstrate Linux stderr output redirection
yippiekaiyay 2> ls_output.txt
Demonstrate Linux stderr to stdout redirection
yippiekaiyay &> file
What is the result of running the following command? yippiekaiyay 1>&2 die_hard
An output similar to: `yippiekaiyay: command not found...`
The file `die_hard` will not be created
### Filesystem Hierarchy Standard
In Linux FHS (Filesystem Hierarchy Standard) what is the /?
The root of the filesystem. The beginning of the tree.
What is stored in each of the following paths? - /bin, /sbin, /usr/bin and /usr/sbin - /etc - /home - /var - /tmp
* /bin, /sbin, /usr/bin and /usr/sbin - binaries
* /etc - configuration files
* /home - home directories of the different users
* /var - files that tend to change and be modified, like logs
* /tmp - temporary files
What is special about the /tmp directory when compared to other directories?
`/tmp` folder is cleaned automatically, usually upon reboot.
What kind of information one can find in /proc?
It contains useful information about the processes that are currently running and is regarded as the control and information center of the kernel.
What makes /proc different from other filesystems?
/proc is a special virtual filesystem in Unix-like operating systems, including Linux, that provides information about processes and system resources.
True or False? only root can create files in /proc
False. No one can create files in /proc directly (certain operations can lead to files being created in /proc by the kernel).
What can be found in /proc/cmdline?
The command passed to the boot loader to run the kernel
In which path can you find the system devices (e.g. block storage)?
/dev
### Permissions
How to change the permissions of a file?
Using the `chmod` command.
What does the following permissions mean?: * 777 * 644 * 750
777 - You give the owner, group and other: Execute (1), Write (2) and Read (4); 4+2+1 = 7.
644 - Owner has Read (4), Write (2), 4+2 = 6; Group and Other have Read (4).
750 - Owner has x+r+w, Group has Read (4) and Execute (1); 4+1 = 5. Others have no permissions.
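The octal modes can be verified directly; a quick sketch, assuming GNU coreutils (`stat -c` is a GNU extension, file name is illustrative):

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir"
touch report.txt

chmod 644 report.txt              # owner rw-, group r--, other r--
stat -c '%a %A' report.txt        # prints: 644 -rw-r--r--

chmod 750 report.txt              # owner rwx, group r-x, other ---
stat -c '%a %A' report.txt        # prints: 750 -rwxr-x---
```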
What this command does? chmod +x some_file
It adds execute permissions for all sets, i.e. user, group and others.
Explain what is setgid and setuid
* setuid is a Linux file permission that lets a user run an executable with the permissions of the file's owner - an elevation of the current user's privileges.
* setgid, when set on an executable, makes the process run with the group that owns the file.
What is the purpose of sticky bit?
It's a bit set on directories (such as /tmp) that allows only a file's owner, the directory's owner or the root user to delete or rename the files within it.
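These special bits occupy a fourth, leading octal digit. A minimal sketch, assuming GNU coreutils (`stat -c`); names are illustrative:

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir"

mkdir shared
chmod 1777 shared         # leading 1 sets the sticky bit, as on /tmp
stat -c '%A' shared       # prints: drwxrwxrwt ('t' marks the sticky bit)

touch tool
chmod 4755 tool           # leading 4 sets setuid ('s' in the owner execute slot)
stat -c '%A' tool         # prints: -rwsr-xr-x

chmod 2755 tool           # leading 2 sets setgid instead ('s' in the group slot)
stat -c '%A' tool         # prints: -rwxr-sr-x
```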
What the following commands do? - chmod - chown - chgrp
* chmod - changes access permissions to files system objects * chown - changes the owner of file system files and directories * chgrp - changes the group associated with a file system object
What is sudo? How do you set it up?
sudo is a command-line utility in Unix-like operating systems that allows users to run programs with the privileges of another user, usually the superuser (root). It stands for "superuser do". The sudo program is installed by default in almost all Linux distributions. If you need to install sudo in Debian/Ubuntu, use the command `apt-get install sudo`.
True or False? In order to install packages on the system one must be the root user or use the sudo command
True
Explain what are ACLs. For what use cases would you recommend to use them?
ACL stands for Access Control Lists. We can use ACL to have more granular control over accesses to certain files for certain users specifically. For instance, we can return the ACL of a particular file with the command getfacl /absolute/file/path and modify ACLs for a specific file with setfacl -m.
You try to create a file but it fails. Name at least three different reasons as to why it could happen
* No more disk space * No more inodes * No permissions
A user accidentally executed the following chmod -x $(which chmod). How to fix it?
Using `sudo setfacl -m u::rx /usr/bin/chmod` will set the execute permissions on `chmod` for all the users. Post this, the `chmod` binary can be used as usual.
### Scenarios
You would like to copy a file to a remote Linux host. How would you do?
There are multiple ways to transfer files between hosts. Personal opinion: use `rsync`
How to generate a random string?
One way is to run the following: `cat /proc/sys/kernel/random/uuid`
How to generate a random string of 7 characters?
`mkpasswd -l 7`
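If `mkpasswd` isn't installed, a portable alternative is to filter the kernel's random stream down to alphanumerics (a sketch; the variable name is illustrative):

```shell
# Keep only alphanumeric bytes from /dev/urandom and take the first 7
rand=$(tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 7)
echo "$rand"
```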
### Systemd
What is systemd?
Systemd is a daemon (the 'd' in systemd stands for daemon). A daemon is a program that runs in the background without direct control of the user, although the user can talk to the daemon at any time. systemd has many features such as user process control/tracking, snapshot support and inhibitor locks. If we visualize the Unix/Linux system in layers, systemd falls directly after the Linux kernel:
Hardware -> Kernel -> Daemons, System Libraries, Server Display.
How to start or stop a service?
To start a service: `systemctl start <service name>`

To stop a service: `systemctl stop <service name>`
How to check the status of a service?
`systemctl status <service name>`
On a system which uses systemd, how would you display the logs?
journalctl
Describe how to make a certain process/app a service
To be made into a service, the process needs a .service unit file created at /etc/systemd/system/service-name.service. The file has a specific structure and requires certain directives to work; see the systemd.service man page for details.
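A minimal unit file sketch (the name `myapp` and the binary path are hypothetical):

```
[Unit]
Description=My example application
After=network.target

[Service]
ExecStart=/usr/local/bin/myapp
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

After saving it as /etc/systemd/system/myapp.service, run `systemctl daemon-reload` and then `systemctl enable --now myapp` to register and start it.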
### Troubleshooting and Debugging
Where system logs are located?
/var/log
How to follow file's content as it being appended without opening the file every time?
tail -f
What are you using for troubleshooting and debugging network issues?
* `dstat -t` is great for identifying network and disk issues
* `netstat -tnlaup` can be used to see which processes are running on which ports
* `lsof -i -P` can be used for the same purpose as netstat
* `ngrep -d any metafilter` for matching a regex against packet payloads
* `tcpdump` for capturing packets
* `wireshark` - same concept as tcpdump but with a GUI (optional)
What are you using for troubleshooting and debugging disk & file system issues?
* `dstat -t` is great for identifying network and disk issues
* `opensnoop` can be used to see which files are being opened on the system (in real time)
What are you using for troubleshooting and debugging process issues?
strace is great for understanding what your program does. It prints every system call your program executed.
What are you using for debugging CPU related issues?
* `top` will show you how much CPU percentage each process consumes
* `perf` is a great choice for a sampling profiler and, in general, for figuring out what your CPU cycles are "wasted" on
* flamegraphs are great for visualizing CPU consumption (http://www.brendangregg.com/flamegraphs.html)
You get a call from someone claiming "my system is SLOW". What do you do?
* Check with `top` for anything unusual * Run `dstat -t` to check if it's related to disk or network. * Check if it's network related with `sar` * Check I/O stats with `iostat`
Explain iostat output
How to debug binaries?
What is the difference between CPU load and utilization?
How you measure time execution of a program?
#### Scenarios
You have a process writing to a file. You don't know which process exactly, you just know the path of the file. You would like to kill the process as it's no longer needed. How would you achieve it?
1. Run `lsof <file path>` to find the process ID (PID) of the process holding the file open
2. Take the PID from the lsof output and run `kill <PID>`
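The steps above can be sketched end to end. Here a background `sleep` stands in for the unknown writer; if `lsof` isn't installed, the /proc filesystem exposes the same information (every open fd is a symlink to its file):

```shell
tmpdir=$(mktemp -d)
sleep 30 > "$tmpdir/out.log" &     # hypothetical stand-in for the unknown writer
writer=$!

# `lsof "$tmpdir/out.log"` would list the PID directly; without lsof,
# scan /proc/<pid>/fd for symlinks pointing at the file
pids=$(for fd in /proc/[0-9]*/fd/*; do
         [ "$(readlink "$fd" 2>/dev/null)" = "$tmpdir/out.log" ] &&
           echo "$fd" | cut -d/ -f3
       done)

kill $pids                         # step 2: kill the process(es) found
```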
### Kernel
What is a kernel, and what does it do?
The kernel is part of the operating system and is responsible for tasks like:

* Allocating memory
* Scheduling processes
* Controlling the CPU
How do you find out which Kernel version your system is using?
`uname -a` command
What is a Linux kernel module and how do you load a new module?
A Linux kernel module is a piece of code that can be dynamically loaded into the kernel to extend its functionality. These modules are typically used to add support for hardware devices, filesystems, or system calls. The kernel itself is monolithic, but with modules, its capabilities can be extended without having to reboot the system or recompile the entire kernel.
Explain user space vs. kernel space
The operating system executes the kernel in protected memory to prevent anyone from changing it (and risking a crash). This is what is known as "kernel space". "User space" is where users execute their commands or applications. It's important to have this separation, since we can't rely on user applications not to tamper with the kernel and crash it. Applications can access system resources, and indirectly the kernel space, by making what are called "system calls".
In what phases of kernel lifecycle, can you change its configuration?
* Build time (when it's compiled) * Boot time (when it starts) * Runtime (once it's already running)
Where can you find kernel's configuration?
Usually it will reside in `/boot/config-..`
Where can you find the file that contains the command passed to the boot loader to run the kernel?
`/proc/cmdline`
How to list kernel's runtime parameters?
`sysctl -a`
Will running sysctl -a as a regular user vs. root, produce different result?
Yes, you might notice that on most systems, when running `sysctl -a` as root, you'll get more runtime parameters compared to executing the same command as a regular user.
You would like to enable IPv4 forwarding in the kernel, how would you do it?
`sudo sysctl net.ipv4.ip_forward=1` To make it persistent (applied after reboot, for example): insert `net.ipv4.ip_forward = 1` into `/etc/sysctl.conf`. Another way is to run `echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward`
How sysctl applies the changes to kernel's runtime parameters the moment you run sysctl command?
If you `strace` the sysctl command you can see it does it by changing the file under /proc/sys/... In the past it was done with sysctl system call, but it was deprecated at some point.
How changes to kernel runtime parameters persist? (applied even after reboot to the system for example)
There is a service called `systemd-sysctl` that takes the content of /etc/sysctl.conf and applies it. This is how changes persist, even after reboot, when they are written in /etc/sysctl.conf
Do the changes you make to kernel parameters in a container also affect the kernel parameters of the host on which the container runs?
No. Containers have their own /proc filesystem, so changes to kernel parameters inside a container do not affect the host or other containers running on that host.
### SSH
What is SSH? How to check if a Linux server is running SSH?
[Wikipedia Definition](https://en.wikipedia.org/wiki/SSH_(Secure_Shell)): "SSH or Secure Shell is a cryptographic network protocol for operating network services securely over an unsecured network." [Hostinger.com Definition](https://www.hostinger.com/tutorials/ssh-tutorial-how-does-ssh-work): "SSH, or Secure Shell, is a remote administration protocol that allows users to control and modify their remote servers over the Internet." An SSH server will have the SSH daemon running. Depending on the distribution, you should be able to check whether the service is running (e.g. `systemctl status sshd`).
Why SSH is considered better than telnet?
Telnet also allows you to connect to a remote host, but as opposed to SSH, where the communication is encrypted, in telnet the data is sent in clear text. It therefore isn't considered secure, because anyone on the network can see exactly what is sent, including passwords.
What is stored in ~/.ssh/known_hosts?
The file stores the public keys (and their fingerprints) of the remote hosts the user has connected to. These entries create trust between the client and those servers for future SSH connections.
You try to ssh to a server and you get "Host key verification failed". What does it mean?
It means that the key of the remote host was changed and doesn't match the one stored on the machine (in ~/.ssh/known_hosts).
What is the difference between SSH and SSL?
What ssh-keygen is used for?
ssh-keygen is a tool to generate an authentication key pair for SSH, consisting of a private and a public key. It supports a number of algorithms to generate authentication keys:

- dsa
- ecdsa
- ecdsa-sk
- ed25519
- ed25519-sk
- rsa (default)

One can also specify the number of bits in the key. The command below generates an SSH key pair with RSA 4096 bits:

```
$ ssh-keygen -t rsa -b 4096
```

The output looks like this:

```
Generating public/private rsa key pair.
Enter file in which to save the key (/home/user/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/user/.ssh/id_rsa
Your public key has been saved in /home/user/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:f5MOGnhzYfC0ZCHvbSXXiRiNVYETjxpHcXD5xSojx+M user@mac-book-pro
The key's randomart image is:
+---[RSA 4096]----+
| . ..+***o|
| o o++*o+|
| . =+.++++|
| B.oX+. .|
| S *=o+ |
| . o oE. |
| . + + + |
| . = + . |
| . . |
+----[SHA256]-----+
```

One can check how many bits an SSH key has with:

```
$ ssh-keygen -l -f /home/user/.ssh/id_rsa
```

Output should look like this:

```
4096 SHA256:f5MOGnhzYfC0ZCHvbSXXiRiNVYETjxpHcXD5xSojx+M user@mac-book-pro (RSA)
```

It shows the key is RSA 4096 bits. `-l` and `-f` parameter usage explanation:

```
-l  Show the fingerprint of the key file.
-f filename
    Filename of the key file.
```

Learn more: [How can I tell how many bits my ssh key is? - Superuser](https://superuser.com/a/139311)
What is SSH port forwarding?
### Globbing & Wildcards
What is Globbing?
What are wildcards? Can you give an example of how to use them?
Explain what will ls [XYZ] match
Explain what will ls [^XYZ] match
Explain what will ls [0-5] match
What each of the following matches - ? - *
* The ? matches any single character * The * matches zero or more characters
What do we grep for in each of the following commands?: * grep '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' some_file * grep -E "error|failure" some_file * grep '[0-9]$' some_file
1. An IP address 2. The word "error" or "failure" 3. Lines which end with a number
Which line numbers will be printed when running `grep '\baaa\b'` on the following content:

```
aaa
bbb
ccc.aaa
aaaaaa
```
lines 1 and 3.
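This can be checked directly (the sample file name is illustrative):

```shell
tmpdir=$(mktemp -d)
printf 'aaa\nbbb\nccc.aaa\naaaaaa\n' > "$tmpdir/sample"

# "." is a non-word character, so it forms a boundary and ".aaa" still matches;
# "aaaaaa" does not, because the surrounding characters are word characters
grep -n '\baaa\b' "$tmpdir/sample"    # prints: 1:aaa and 3:ccc.aaa
```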
What is the difference single and double quotes?
What is escaping? What escape character is used for escaping?
What is an exit code? What exit codes are you familiar with?
An exit code (or return code) represents the code returned by a child process to its parent process. 0 is the exit code representing success, while any non-zero value represents an error. Each number has a different meaning, based on how the application was developed. I consider this a good blog post to read more about it: https://shapeshed.com/unix-exit-codes
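The special parameter `$?` holds the exit code of the last command; a quick sketch (the exact non-zero value depends on the tool, e.g. 2 with GNU ls):

```shell
true
echo $?                                  # prints: 0 (success)

ls /no/such/dir 2>/dev/null || echo $?   # prints a non-zero value (2 with GNU ls)

# exit codes also drive shell control flow:
if grep -q '^root:' /etc/passwd; then
  echo "root entry found"
fi
```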
### Boot Process
Tell me everything you know about the Linux boot process
Another way to ask this: what happens from the moment you turned on the server until you get a prompt
What is GRUB2?
What is Secure Boot?
What can you find in /boot?
### Disk and Filesystem
What's an inode?
For each file (and directory) in Linux there is an inode, a data structure which stores meta data related to the file like its size, owner, permissions, etc.
Which of the following is not included in inode: * Link count * File size * File name * File timestamp
File name (it's part of the directory file)
How to check which disks are currently mounted?
Run `mount`
You run the mount command but you get no output. How would you check what mounts you have on your system?
`cat /proc/mounts`
What is the difference between a soft link and hard link?
Hard link is the same file, using the same inode. Soft link is a shortcut to another file, using a different inode.
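The difference is easy to observe with `ln`, `ln -s` and inode numbers; a sketch assuming GNU coreutils, with illustrative file names:

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir"
echo data > original

ln original hard        # hard link: another name for the same inode
ln -s original soft     # soft link: a new inode that stores the target path

ls -i original hard soft    # original and hard show the same inode number

rm original
cat hard                              # still prints: data
cat soft 2>/dev/null || echo broken   # prints: broken (dangling symlink)
```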
True or False? You can create an hard link for a directory
False
True or False? You can create a soft link between different filesystems
True
True or False? Directories always have by minimum 2 links
True.
What happens when you delete the original file in case of soft link and hard link?
Can you check what type of filesystem is used in /home?
There are many answers for this question. One way is running `df -T`
What is a swap partition? What is it used for?
How to create a - new empty file - a file with text (without using text editor) - a file with given size
* `touch new_file.txt`
* `cat > new_file.txt`, type the text, then press Ctrl+D to finish
* `truncate -s 10M new_file.txt` (creates a file of the given size, e.g. 10 MB)
You are trying to create a new file but you get "File system is full". You check with df for free space and you see you used only 20% of the space. What could be the problem?
How would you check what is the size of a certain directory?
`du -sh`
What is LVM?
Explain the following in regards to LVM: * PV * VG * LV
What is NFS? What is it used for?
What RAID is used for? Can you explain the differences between RAID 0, 1, 5 and 10?
Describe the process of extending a filesystem disk space
What is lazy umount?
What is tmpfs?
What is stored in each of the following logs? * /var/log/messages * /var/log/boot.log
True or False? both /tmp and /var/tmp cleared upon system boot
False. /tmp is cleared upon system boot while /var/tmp is cleared every couple of days or not cleared at all (depends on the distro).
### Performance Analysis
How to check what is the current load average?
One can use `uptime` or `top`
You know how to see the load average, great. but what each part of it means? for example 1.43, 2.34, 2.78
[This article](http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html) summarizes the load average topic in a great way
How to check process usage?
pidstat
How to check disk I/O?
`iostat -xz 1`
How to check how much free memory a system has? How to check memory consumption by each process?
You can use the commands top and free
How to check TCP stats?
sar -n TCP,ETCP 1
### Processes
how to list all the processes running in your system?
The "ps" command can be used to list all the processes running in a system. The "ps aux" command provides a detailed list of all the processes, including the ones running in the background.
How to run a process in the background and why to do that in the first place?
You can achieve that by specifying & at the end of the command. As to why, since some commands/processes can take a lot of time to finish execution or run forever, you may want to run them in the background instead of waiting for them to finish before gaining control again in current session.
How can you find how much memory a specific process consumes?
```
mem() {
  ps -eo rss,pid,euser,args:100 --sort %mem | grep -v grep | grep -i "$@" | awk '{printf $1/1024 "MB"; $1=""; print }'
}
```

[Source](https://stackoverflow.com/questions/3853655/in-linux-how-to-tell-how-much-memory-processes-are-using)
What signal is used by default when you run 'kill *process id*'?
The default signal is SIGTERM (15). It kills the process gracefully, which means the process is allowed to save its current state before terminating.
What signals are you familiar with?
* SIGTERM - the default signal for terminating a process
* SIGHUP - commonly used for reloading configuration
* SIGKILL - a signal which cannot be caught or ignored

To view all available signals run `kill -l`
What kill 0 does?
"kill 0" sends a signal to all processes in the current process group. It is used to check if the processes exist or not
What kill -0 does?
"kill -0" checks if a process with a given process ID exists or not. It does not actually send any signal to the process.
What is a trap?
A trap is a mechanism that allows the shell to intercept signals sent to a process and perform a specific action, such as handling errors or cleaning up resources before terminating the process.
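A typical use is cleaning up temporary files no matter how the script ends; a minimal sketch (the function and directory names are illustrative):

```shell
tmpdir=$(mktemp -d)

cleanup() {
  rm -rf "$tmpdir"
  echo "cleaned up"
}
# run cleanup on normal exit and on SIGINT/SIGTERM, so the
# scratch directory never outlives the script
trap cleanup EXIT INT TERM

touch "$tmpdir/scratch"
echo "working in $tmpdir"
```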
Every couple of days, a certain process stops running. How can you look into why it's happening?
One way to investigate why a process stops running is to check the system logs, such as the messages in /var/log/messages or journalctl. Additionally, checking the process's resource usage and system load may provide clues as to what caused the process to stop
What happens when you press ctrl + c?
When you press "Ctrl+C," it sends the SIGINT signal to the foreground process, asking it to terminate gracefully.
What is a Daemon in Linux?
A background process. Most of these processes are waiting for requests or set of conditions to be met before actually running anything. Some examples: sshd, crond, rpcbind.
What are the possible states of a process in Linux?
Running (R)
Uninterruptible Sleep (D) - The process is waiting for I/O
Interruptible Sleep (S)
Stopped (T)
Dead (X)
Zombie (Z)
How do you kill a process in D state?
A process in D state (also known as "uninterruptible sleep") ignores signals, so it cannot be killed using the "kill" command. It usually becomes killable again once the I/O it is waiting for completes; if it is permanently stuck, the only way to terminate it is to reboot the system.
What is a zombie process?
A process which has finished running but has not been reaped. One reason this happens is a parent process that is programmed incorrectly: every parent process should call wait() to get the exit code of a child that finished running. When the parent doesn't collect the child's exit code, the child's entry remains in the process table even though it has finished.
How to get rid of zombie processes?
You can't kill a zombie process the regular way (with `kill -9`, for example) as it's already dead. One way to get rid of it is to send SIGCHLD to the parent process, telling it to reap its terminated children: `kill -s SIGCHLD [parent_pid]`. This might not work if the parent process wasn't programmed properly. You can also try terminating the parent process itself. This makes the zombie a child of init (PID 1), which performs periodic cleanups and will at some point reap the zombie.
How to find all the * Processes executed/owned by a certain user * Process which are Java processes * Zombie Processes
If you mention at any point ps command with arguments, be familiar with what these arguments does exactly.
What is the init process?
It is the first process executed by the kernel during the booting of a system. It is a daemon process which runs until the system is shut down. That is why it is the parent of all processes.
Can you describe how processes are being created?
How to change the priority of a process? Why would you want to do that?
To change the priority of a process, you can use the `nice` command in Linux, which assigns a niceness value ranging from -20 to 19. A higher value means a lower priority for the process, and vice versa. You may want to change the priority of a process to adjust the amount of CPU time the scheduler allocates to it. For example, if a CPU-intensive process is slowing down other processes on your system, you can lower its priority to give more CPU time to the others.
Can you explain how a network process/connection is established and how it's terminated?
When a client process on one system wants to establish a connection with a server process on another system, it first creates a socket using the socket system call. The client then calls the connect system call, passing the address of the server as an argument. This causes a three-way handshake to occur between the client and server, where the two systems exchange information to establish a connection. Once the connection is established, the client and server can exchange data using the read and write system calls. When the connection is no longer needed, the client or server can terminate the connection by calling the close system call on the socket.
What strace does? What about ltrace?
Strace is a debugging tool that is used to monitor the system calls made by a process. It allows you to trace the execution of a process and see the system calls it makes, as well as the signals it receives. This can be useful for diagnosing issues with a process, such as identifying why it is hanging or crashing. Ltrace, on the other hand, is a similar tool that is used to trace the library calls made by a process. It allows you to see the function calls made by a process to shared libraries, as well as the arguments passed to those functions. This can be useful for diagnosing issues with a process that involve library calls, such as identifying why a particular library is causing a problem.
Find all the files which end with '.yml' and replace the number 1 with 2 in each file
find /some_dir -iname \*.yml -print0 | xargs -0 -r sed -i "s/1/2/g"
You run ls and you get "/lib/ld-linux-armhf.so.3 no such file or directory". What is the problem?
The ls executable is built for an incompatible architecture.
How would you split a 50 lines file into 2 files of 25 lines each?
You can use the split command this way: split -l 25 some_file
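A quick check of how `split -l` behaves, using a generated 50-line file and the default two-letter suffixes (the `part_` prefix is illustrative):

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir"
seq 50 > some_file             # a 50-line input file

split -l 25 some_file part_    # produces part_aa and part_ab, 25 lines each
wc -l part_aa part_ab
```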
What is a file descriptor? What file descriptors are you familiar with?
A file descriptor, also known as a file handle, is a unique number which identifies an open file in the operating system. In Linux (and Unix) the first three file descriptors are:

* 0 - the default data stream for input
* 1 - the default data stream for output
* 2 - the default data stream for errors

This is a great article on the topic: https://www.computerhope.com/jargon/f/file-descriptor.htm
What is NTP? What is it used for?
Explain Kernel OOM
### Security
What is chroot? In what scenarios would you consider using it?
What is SELinux?
What is Kerberos?
What is nftables?
What firewalld daemon is responsible for?
Do you have experience with hardening servers? Can you describe the process?
How do you create a private key for a CA (certificate authority)?
One way is using openssl this way: `openssl genrsa -aes256 -out ca-private-key.pem 4096`
How do you create a public key for a CA (certificate authority)?
`openssl req -new -x509 -days 730 -key [private key file name] -sha256 -out ca.pem` If using the private key from the previous question then the command would be: `openssl req -new -x509 -days 730 -key ca-private-key.pem -sha256 -out ca.pem`
Demonstrate one way to encode and decode data in Linux
Encode: `echo -n "some password" | base64` Decode: `echo -n "allE19remO91" | base64 -d`
### Networking
How to list all the interfaces?
``` ip link show ```
What is the loopback (lo) interface?
The loopback interface is a special, virtual network interface that your computer uses to communicate with itself. It is used mainly for diagnostics and troubleshooting, and to connect to servers running on the local machine.
What the following commands are used for? * ip addr * ip route * ip link * ping * netstat * traceroute
What is a network namespace? What is it used for?
How to check if a certain port is being used?
One of the following would work:

```
netstat -tnlp | grep <port number>
lsof -i -n -P | grep <port number>
```
How can you turn your Linux server into a router?
What is a virtual IP? In what situation would you use it?
True or False? The MAC address of an interface is assigned/set by the OS
False
Can you have more than one default gateway in a given system?
Technically, yes.
What is telnet and why is it a bad idea to use it in production? (or at all)
Telnet is a type of client-server protocol that can be used to open a command line on a remote computer, typically a server. By default, all the data sent and received via telnet is transmitted in clear/plain text, therefore it should not be used as it does not encrypt any data between the client and the server.
What is the routing table? How do you view it?
How can you send an HTTP request from your shell?

Using nc is one way
What are packet sniffers? Have you used one in the past? If yes, which packet sniffers have you used and for what purpose?
It is a network utility that captures and analyzes the packets travelling over the targeted network (some sniffers can also inject packets into the data stream).
How to list active connections?
How to trigger neighbor discovery in IPv6?
One way would be `ping6 ff02::1`
What is network interface bonding and do you know how it's performed in Linux?
What network bonding modes are there?
There are a couple of modes:

* balance-rr: round-robin bonding
* active-backup: a fault-tolerance mode where only one interface is active
* balance-tlb: adaptive transmit load balancing
* balance-alb: adaptive load balancing
What is a bridge? How it's added in Linux OS?
### DNS
How to check what is the hostname of the system?
`cat /etc/hostname` You can also run `hostnamectl` or `hostname` but that might print only a temporary hostname. The one in the file is the permanent one.
What the file /etc/resolv.conf is used for? What does it include?
What commands are you using for performing DNS queries (or troubleshoot DNS related issues)?
You can specify one or more of the following: * dig * host * nslookup
You run dig codingshell.com and get the following result:

```
ANSWER SECTION:
codingshell.com.        3515    IN      A       185.199.109.153
```

What is the meaning of the number 3515?
This is the TTL. When you look up an address using a domain/host name, your OS performs DNS resolution by contacting DNS name servers to get the IP address of the host/domain you are looking for.
When you get a reply, it is cached in your OS for a certain period of time, known as the TTL. That is the meaning of the number 3515 - the record will be cached for 3515 seconds before being removed from the cache, and during that period you'll get the value from the cache instead of asking the DNS name servers again.
How can we modify the network connection via `nmcli` command, to use `8.8.8.8` as a DNS server?
1. Find the connection name:

```
# nmcli con show
NAME         UUID                                  TYPE      DEVICE
System ens5  8126c120-a964-e959-ff98-ac4973344505  ethernet  ens5
System eth0  5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03  ethernet  --
```

Here the connection name is "System ens5". Let's say we want to modify the settings of this connection.

2. Modify the connection to use 8.8.8.8 as its DNS server:

```
# nmcli con mod "System ens5" ipv4.dns "8.8.8.8"
```

3. Reactivate the connection for the change to take effect:

```
nmcli con up "System ens5"
```

4. Verify the settings once more:

```
cat /etc/resolv.conf
nmcli -f ipv4.dns con show "System ens5"
```
### Packaging
Do you have experience with packaging? (as in building packages) Can you explain how it works?
How packages installation/removal is performed on the distribution you are using?
The answer depends on the distribution being used. In Fedora/CentOS/RHEL/Rocky it can be done with `rpm` or `dnf` commands. In Ubuntu it can be done with the `apt` command.
RPM: explain the spec format (what it should and can include)
How do you list the content of a package without actually installing it?
How to know to which package a file on the system belongs to? Is it a problem if it doesn't belongs to any package?
Where repositories are stored? (based on the distribution you are using)
What is an archive? How do you create one in Linux?
How to extract the content of an archive?
Why do we need package managers? Why not simply create archives and publish them?
Package managers allow you to manage a package's lifecycle: installing, removing and updating it.
In addition, you can specify in a spec how a certain package should be installed - where to copy the files, which commands to run before and after the installation, etc.
### DNF
What is DNF?
From the [repo](https://github.com/rpm-software-management/dnf): "Dandified YUM (DNF) is the next upcoming major version of YUM. It does package management using RPM, libsolv and hawkey libraries." Official [docs](https://dnf.readthedocs.io/en/latest/)
How to look for a package that provides the command /usr/bin/git? (the package isn't necessarily installed)
dnf provides /usr/bin/git
### Applications and Services
What can you find in /etc/services?
How to make sure a Service starts automatically after a reboot or crash?
Depends on the init system.

* Systemd: `systemctl enable [service_name]`
* System V: `update-rc.d [service_name]` and add the line `id:5678:respawn:/bin/sh /path/to/app` to `/etc/inittab`
* Upstart: add an Upstart init script at `/etc/init/service.conf`
You run ssh 127.0.0.1 but it fails with "connection refused". What could be the problem?
1. SSH server is not installed
2. SSH server is not running
How to print the shared libraries required by a certain program? What is it useful for?
What is CUPS?
What types of web servers are you familiar with?
Nginx, Apache httpd.
### Users and Groups
What is a "superuser" (or root user)? How is it different from regular users?
How do you create users? Where is user information stored?
The command to create users is `useradd`.

Syntax: `useradd [options] Username`

There are two configuration files which store user information:

1. `/etc/passwd` - user information like username, shell, etc. is stored in this file
2. `/etc/shadow` - users' passwords are stored in encrypted format
Which file stores information about groups?
The `/etc/group` file stores the group name, group ID and the usernames which are in the group as a secondary group.
How do you change/set the password of a user?
`passwd` is the command to set/change the password of a user.
Which file stores users passwords? Is it visible for everyone?
The `/etc/shadow` file holds the passwords of the users in encrypted format. No, it is only visible to the `root` user.
Do you know how to create a new user without using adduser/useradd command?
Yes, we can create a new user by manually adding an entry to the `/etc/passwd` file. For example, if we need to create a user called `john`:

Step 1: Add an entry to the `/etc/passwd` file, so the user gets created:

`echo "john:x:2001:2001::/home/john:/bin/bash" >> /etc/passwd`

Step 2: Add an entry to the `/etc/group` file, because every user belongs to a primary group that has the same name as the username:

`echo "john:x:2001:" >> /etc/group`

Step 3: Verify the user got created:

`id john`
What information is stored in /etc/passwd? Explain each field
`/etc/passwd` is a configuration file which contains user information. Each entry in this file has 7 fields:

`username:password:UID:GID:Comment:home directory:shell`

* `username` - The name of the user.
* `password` - This field is actually a placeholder for the password. Due to security concerns, it does not contain the password itself, just a placeholder (x) for the encrypted password stored in the `/etc/shadow` file.
* `UID` - User ID of the user.
* `GID` - Group ID.
* `Comment` - Description of the user.
* `home directory` - Absolute path of the user's home directory. This directory gets created once the user is added.
* `shell` - Absolute path of the shell that will be used by the user.
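The field layout above can be verified with a short Python sketch (Python is used here purely for illustration; the `pwd` module is the standard-library interface to the same passwd database):

```python
import pwd

# look up root through the standard passwd database API
root = pwd.getpwnam("root")
print(root.pw_uid, root.pw_gid, root.pw_dir, root.pw_shell)

# parse the raw file directly: 7 colon-separated fields per entry
with open("/etc/passwd") as f:
    first = f.readline().rstrip("\n")
username, password, uid, gid, comment, home, shell = first.split(":")
print(username, uid, home, shell)
```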
How to add a new user to the system without giving them the ability to log in to the system?
`adduser user_name --shell=/bin/false --no-create-home` You can also add a user and then edit /etc/passwd.
How to switch to another user? How to switch to the root user?
The `su` command. Use `su -` to switch to root.
What is the UID of the root user? What about a regular user?
The UID of the root user is 0.

The default values of UID_MIN and UID_MAX in `/etc/login.defs` are:
* `UID_MIN` is `1000`
* `UID_MAX` is `60000`

These values can be changed, but UIDs < 1000 are reserved for system accounts. Therefore, with the default configuration, regular user UIDs start from `1000`.
What can you do if you lost/forgot the root password?
Re-install the OS IS NOT the right answer :)
What is /etc/skel?
`/etc/skel` is a directory containing files and directories which, when a new user is created, are copied to the user's home directory.
How to see a list of who logged-in to the system?
Using the `last` command.
Explain what each of the following commands does: * useradd * usermod * whoami * id
* `useradd` - Command for creating new users
* `usermod` - Modify a user's settings
* `whoami` - Outputs the username of the currently logged-in user
* `id` - Prints the user and group IDs of the current user (or of a given user)
You run grep $(whoami) /etc/passwd but the output is empty. What might be a possible reason for that?
The user you are using isn't defined locally but originates from services like LDAP.
You can verify with: `getent passwd`
### Hardware
Where can you find information on the processor (like number of CPUs)?
`/proc/cpuinfo`

You can also use `nproc` for the number of processors.
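Both sources can be cross-checked programmatically; a small Python sketch (Linux-only, since it reads `/proc`):

```python
import os

# count "processor" entries in /proc/cpuinfo (Linux-specific)
with open("/proc/cpuinfo") as f:
    cpu_entries = sum(1 for line in f if line.startswith("processor"))

print(cpu_entries, os.cpu_count())
```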
How can you print information on the BIOS, motherboard, processor and RAM?
dmidecode
How can you print all the information on connected block devices in your system?
lsblk
True or False? In user space, applications don't have full access to hardware resources
True. Only in kernel space do they have full access to hardware resources.
### Namespaces
What types of namespaces are there in Linux?
- Process ID namespaces: these namespaces include an independent set of process IDs
- Mount namespaces: isolation and control of mountpoints
- Network namespaces: isolate system networking resources such as the routing table, interfaces, ARP table, etc.
- UTS namespaces: isolate hostname and domain name
- IPC namespaces: isolate interprocess communication resources
- User namespaces: isolate user and group IDs
- Time namespaces: isolate the system clocks
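The namespaces a process belongs to are visible under `/proc/<PID>/ns`; a minimal Python look at the current process (Linux-only):

```python
import os

# one symlink per namespace type the current process is a member of
namespaces = sorted(os.listdir("/proc/self/ns"))
print(namespaces)
```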
True or False? In every PID (Process ID) namespace, the first process is assigned the process ID number 1
True. Inside the namespace its PID is 1, while in the parent namespace its PID is a different one.
True or False? In a child PID namespace all processes are aware of the parent PID namespace and its processes, and the parent PID namespace has no visibility of child PID namespace processes
False. The opposite is true: the parent PID namespace is aware of and has visibility into processes in the child PID namespace, while the child PID namespace has no visibility into what is going on in the parent PID namespace.
True or False? By default, when creating two separate network namespaces, a ping from one namespace to another will work fine
False. Network namespace has its own interfaces and routing table. There is no way (without creating a bridge for example) for one network namespace to reach another.
True or False? With UTS namespaces, processes may appear as if they run on different hosts and domains while running on the same host
True
True or False? It's not possible to have a root user with ID 0 in child user namespaces
False. In every child user namespace, it's possible to have a separate root user with uid of 0.
What are time namespaces used for?
In time namespaces, processes can use a different system time.
### Virtualization
What virtualization solutions are available for Linux?
* [KVM](https://www.linux-kvm.org/page/Main_Page)
* [XEN](http://www.xen.org/)
* [VirtualBox](https://www.virtualbox.org/)
* [Linux-VServer](http://linux-vserver.org/Welcome_to_Linux-VServer.org)
* [User-mode Linux](http://user-mode-linux.sourceforge.net/)
* ...
What is KVM?
KVM is an open source virtualization technology used on x86 hardware. From the official [docs](https://www.linux-kvm.org/page/Main_Page)

Recommended read:
* [Red Hat Article - What is KVM?](https://www.redhat.com/en/topics/virtualization/what-is-KVM)
What is Libvirt?
It's an open source collection of software used to manage virtual machines. It can be used with KVM, Xen, LXC and others. It's also called the Libvirt Virtualization API. From the official [docs](https://libvirt.org/)

Supported hypervisors: [docs](https://libvirt.org/drivers.html)
### AWK
What does the awk command do? Have you used it? What for?
From Wikipedia: "AWK is a domain-specific language designed for text processing and typically used as a data extraction and reporting tool"
How to print the 4th column in a file?
`awk '{print $4}' file`
How to print every line that is longer than 79 characters?
`awk 'length($0) > 79' file`
What does the lsof command do? Have you used it? What for?
What is the difference between find and locate?
How does a user process perform a privileged operation, such as reading from the disk?
Using system calls
### System Calls
What is a system call? What system calls are you familiar with?
How does a program execute a system call?
- The program executes a trap instruction. The instruction jumps into the kernel while raising the privilege level to kernel mode.
- Once in kernel space, it can perform any privileged operation.
- Once it's finished, it calls a "return-from-trap" instruction which returns to user space while reducing the privilege level back to user mode.
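One way to see that user-space wrappers end in the same kernel entry point: calling getpid() through the C library and through Python's os module returns the same value (a sketch using `ctypes`; assumes a Unix-like system):

```python
import ctypes
import os

libc = ctypes.CDLL(None)        # the C library, whose getpid() wrapper traps into the kernel
pid_via_libc = libc.getpid()

print(pid_via_libc, os.getpid())
```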
Explain the fork() system call
fork() is used for creating a new process. It does so by cloning the calling process but the child process has its own PID and any memory locks, I/O operations and semaphores are not inherited.
What is the return value of fork()?
- On success, the PID of the child process in the parent and 0 in the child process
- On error, -1 in the parent
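These return values can be observed with a short Python sketch (os.fork and os.waitpid are thin wrappers over the system calls):

```python
import os

pid = os.fork()                      # clone the calling process
if pid == 0:
    # child: fork() returned 0
    os._exit(7)                      # exit immediately with a recognizable code
else:
    # parent: fork() returned the child's PID
    _, status = os.waitpid(pid, 0)
    child_exit_code = os.WEXITSTATUS(status)
    print(pid, child_exit_code)
```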
Name one reason for fork() to fail
Not enough memory to create a new process
Why do we need the wait() system call?
wait() is used by a parent process to wait for the child process to finish execution. If wait is not used by a parent process then a child process might become a zombie process.
How does the kernel notify the parent process about child process termination?
The kernel notifies the parent by sending it the SIGCHLD signal.
How is waitpid() different from wait()?
waitpid() lets a process wait for a specific child and, when used with the WNOHANG option, acts as a non-blocking version of wait().
It also allows library routines (e.g. system()) to wait for a particular child process without interfering with other child processes for which the caller has not yet waited.
True or False? The wait() system call won't return until the child process has run and exited
True in most cases though there are cases where wait() returns before the child exits.
Explain the exec() system call
It transforms the current running program into another program.
Given the name of an executable and some arguments, it loads the code and static data from the specified executable and overwrites the current code segment and current static data. After initializing its memory space (like the stack and heap), the OS runs the program, passing any arguments as the argv of that process.
True or False? A successful call to exec() never returns
True
Since a successful exec() replaces the current process, it can't return anything to the process that made the call.
What system call is used for listing files?
What system calls are used for creating a new process?
fork() and exec(); the wait() system call is also part of this workflow.
What does execve() do?
Executes a program. The program is passed as a filename (or path) and must be a binary executable or a script.
What is the return value of malloc?
Explain the pipe() system call. What is it used for?
[Unix pipe implementation](https://toroid.org/unix-pipe-implementation) "Pipes provide a unidirectional interprocess communication channel. A pipe has a read end and a write end. Data written to the write end of a pipe can be read from the read end of the pipe. A pipe is created using pipe(2), which returns two file descriptors, one referring to the read end of the pipe, the other referring to the write end."
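A minimal sketch in Python (os.pipe wraps pipe(2) and returns the two file descriptors described above):

```python
import os

read_fd, write_fd = os.pipe()             # returns (read end, write end)
os.write(write_fd, b"data through the pipe")
os.close(write_fd)                        # closing the write end signals EOF to the reader

data = os.read(read_fd, 1024)
os.close(read_fd)
print(data)
```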
What happens when you execute ls -l?
* The shell reads the input using getline(), which reads the input file stream and stores it in a buffer as a string
* The buffer is broken down into tokens and stored in an array this way: {"ls", "-l", "NULL"}
* The shell checks if an expansion is required (in case of `ls *.c`)
* Once the program is in memory, its execution starts, first by calling readdir()

Notes:
* getline() originates in the GNU C library and is used to read lines from an input stream and store them in the buffer
What happens when you execute ls -l *.log?
What readdir() system call does?
What exactly does the command `alias x=y` do?
Why is running a new program done using the fork() and exec() system calls? Why wasn't a different API developed, where there is one call to run a new program?
This way provides a lot of flexibility. It allows the shell, for example, to run code after the call to fork() but before the call to exec(). Such code can be used to alter the environment of the program it is about to run.
Describe shortly what happens when you execute a command in the shell
The shell figures out, using the PATH variable, where the executable of the command resides in the filesystem. It then calls fork() to create a new child process for running the command. Once the fork executes successfully, it calls a variant of exec() to execute the command and finally waits for the command to finish using wait(). When the child completes, the shell returns from wait() and prints out the prompt again.
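The whole sequence can be sketched in a few lines of Python (a toy model of the steps above, not how a real shell is implemented; `run` is a hypothetical helper name):

```python
import os
import shutil

def run(argv):
    """Toy model of how a shell runs an external command."""
    executable = shutil.which(argv[0])    # PATH lookup
    pid = os.fork()                       # create a child process
    if pid == 0:
        os.execv(executable, argv)        # replace the child's image with the command
        os._exit(127)                     # reached only if exec() failed
    _, status = os.waitpid(pid, 0)        # parent waits for the child to finish
    return os.WEXITSTATUS(status)         # ...and would then print the prompt again

print(run(["true"]))
```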
### Filesystem & Files
How to create a file of a certain size?
There are a couple of ways to do that:

* `dd if=/dev/urandom of=new_file.txt bs=2MB count=1`
* `truncate -s 2M new_file.txt`
* `fallocate -l 2097152 new_file.txt`
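The `truncate` variant can be mirrored and verified in Python (using a scratch file under the system temp directory, created just for the demo):

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "new_file.txt")
with open(path, "w"):
    pass                                  # create an empty file
os.truncate(path, 2 * 1024 * 1024)        # extend it to exactly 2 MiB (a sparse file)
size = os.path.getsize(path)
print(size)
os.remove(path)
```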
What does the following block do?: ``` open("/my/file") = 5 read(5, "file content") ```
These system calls read the file /my/file; 5 is the file descriptor number returned by open() and then passed to read().
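The same pair of calls can be reproduced from Python, where os.open/os.read are thin wrappers over the system calls (a scratch file is created just for the demo):

```python
import os
import tempfile

# prepare a scratch file containing known content
fd0, path = tempfile.mkstemp()
os.write(fd0, b"file content")
os.close(fd0)

fd = os.open(path, os.O_RDONLY)   # open() returns a small integer file descriptor
data = os.read(fd, 1024)          # read() takes that descriptor
os.close(fd)
os.remove(path)
print(fd, data)
```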
Describe three different ways to remove a file (or its content)
What is the difference between a process and a thread?
What is context switch?
From [wikipedia](https://en.wikipedia.org/wiki/Context_switch): a context switch is the process of storing the state of a process or thread, so that it can be restored and resume execution at a later point
You found there is a server with high CPU load but you didn't find a process with high CPU. How is that possible?
### Advanced Networking
When you run ip a you see there is a device called 'lo'. What is it and why do we need it?
What does the traceroute command do? How does it work?
Another common way to ask this question is "what part of the IP header does traceroute modify?"
What is network bonding? What types are you familiar with?
How to link two separate network namespaces so you can ping an interface on one namespace from the second one?
What are cgroups?
Explain Process Descriptor and Task Structure
What are the differences between threads and processes?
Explain Kernel Threads
What happens when socket system call is used?
This is a good article about the topic: https://ops.tips/blog/how-linux-creates-sockets
You executed a script and while still running, it got accidentally removed. Is it possible to restore the script while it's still running?
It is possible to restore a script while it's still running, even if it has been accidentally removed, because the running process still has the code in memory. You can use the /proc filesystem to retrieve the content of the running script.

1. Find the process ID:
```
ps aux | grep yourscriptname.sh
```
Replace yourscriptname.sh with your script name.

2. Once you have the PID, you can access the script's content through the /proc filesystem. It will be available at `/proc/<PID>/fd/<FD>`, where `<PID>` is the process ID of the running script and `<FD>` is the file descriptor under which the script is open (list `/proc/<PID>/fd/` to find the one pointing at the deleted script). You can copy the script content to a new file using the cp command:
```
cp /proc/<PID>/fd/<FD> /path_to_restore_your_file/yourscriptname.sh
```
Replace `<PID>` and `<FD>` with the actual values and /path_to_restore_your_file/yourscriptname.sh with the path where you want to restore the script.
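The underlying mechanism (an open file descriptor keeps the deleted file's data reachable under /proc) can be demonstrated with a small Python sketch (Linux-only; the file name is made up for the demo):

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "demo_script.sh")
with open(path, "w") as f:
    f.write("echo hello\n")

fd = os.open(path, os.O_RDONLY)
os.remove(path)                           # "accidentally" delete the file on disk

# the inode lives on while a descriptor is open; recover it via /proc
with open(f"/proc/self/fd/{fd}") as recovered_file:
    recovered = recovered_file.read()
os.close(fd)
print(recovered)
```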
### Memory
What is the difference between MemFree and MemAvailable in /proc/meminfo?
* MemFree - The amount of unused physical RAM in your system
* MemAvailable - The amount of memory available for new workloads (without pushing the system to use swap), estimated from MemFree, Active(file), Inactive(file), and SReclaimable
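Both values can be read straight out of /proc/meminfo; a minimal Python parser (Linux-only):

```python
# parse /proc/meminfo into a {field: value-in-kB} dictionary
meminfo = {}
with open("/proc/meminfo") as f:
    for line in f:
        key, rest = line.split(":", 1)
        meminfo[key] = int(rest.split()[0])   # values are reported in kB

print(meminfo["MemFree"], meminfo["MemAvailable"])
```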
What is the difference between paging and swapping?
Explain what is OOM killer
### Distributions
What is a Linux distribution?
What Linux distributions are you familiar with?
What are the components of a Linux distribution?
* Kernel
* Utilities
* Services
* Software/Packages Management
### Sed
Using sed, extract the date from the following line: 201.7.19.90 - - [05/Jun/1985:13:42:99 +0000] "GET /site HTTP/1.1" 200 32421
`echo $line | sed 's/.*\[//g;s/].*//g;s/:.*//g'`
### Misc
What is a Linux distribution?
* A collection of packages - kernel, GNU, third party apps, ...
* Sometimes distributions store some information on the distribution in an `/etc/*-release` file
  * For example for Red Hat distributions it will be `/etc/redhat-release` and for Amazon it will be `/etc/os-release`
* `lsb_release` is a common command you can use in multiple different distributions
Name 5 commands which are two letters long
ls, wc, dd, df, du, ps, ip, cp, cd ...
What ways are there for creating a new empty file?
* `touch new_file`
* `> new_file`
* `echo -n > new_file` (note that `echo "" > new_file` writes a newline, so the file isn't truly empty)
How does `cd -` work? How does it know the previous location?
It uses the `$OLDPWD` environment variable, which the shell updates on every directory change.
List three ways to print all the files in the current directory
* `ls`
* `find .`
* `echo *`
How to count the number of lines in a file? What about words?
For these we can use the `wc` command:

1. To count the number of lines in a file: `wc -l`
2. To count the number of words in a file: `wc -w`
You define x=2 in /etc/bashrc and x=6 in ~/.bashrc. You then log in to the system. What would be the value of x?
What is the difference between man and info?
A good answer can be found [here](https://askubuntu.com/questions/9325/what-is-the-difference-between-man-and-info-documentation)
Explain "environment variables". How do you list all environment variables?
What is a TTY device?
How to create your own environment variables?
`X=2` for example. But this will not be passed to new shells (child processes). To have it in those as well, use `export X=2`
What does a double dash (--) mean?
It's used in commands to mark the end of command options. One common example is when used with git to discard local changes: `git checkout -- some_file`
Are wildcards implemented in user space or kernel space?
If I plug a new device into a Linux machine, where on the system will a new device entry/file be created?
/dev
Why are there different sections in man? What is the difference between the sections?
What is User-mode Linux?
User-mode Linux (UML) is a port of the Linux kernel that runs as a normal user-space process on a Linux host, allowing multiple virtual Linux systems to run on a single machine without a separate hypervisor.

Not to be confused with "user mode" in general: in Linux, user mode is the restricted operating mode in which a user's application or process runs. It is a non-privileged mode that prevents user-level processes from accessing sensitive system resources directly; an application can only access hardware resources indirectly, by calling system services or functions provided by the operating system. This ensures that the system's security and stability are maintained by preventing user processes from interfering with or damaging system resources. User mode also provides memory protection: each process is assigned its own virtual memory space, isolated from other processes. In contrast, kernel mode is a privileged operating mode in which the operating system's kernel has full access to system resources and can perform low-level operations, such as accessing hardware devices and managing system resources directly.
Under which license is Linux distributed?
GPL v2
================================================
FILE: topics/linux/exercises/copy/README.md
================================================

# Copy Time

## Objectives

1. Create an empty file called `x` in `/tmp`
2. Copy the `x` file you created to your home directory
3. Create a copy of `x` file called `y`
4. Create a directory called `files` and move `x` and `y` there
5. Copy the directory "files" and name the copy `copy_of_files`
6. Rename `copy_of_files` directory to `files2`
7. Remove `files` and `files2` directories

## Solution

Click [here](solution.md) to view the solution.

================================================
FILE: topics/linux/exercises/copy/solution.md
================================================

# Copy Time

## Objectives

1. Create an empty file called `x` in `/tmp`
2. Copy the `x` file you created to your home directory
3. Create a copy of `x` file called `y`
4. Create a directory called `files` and move `x` and `y` there
5. Copy the directory "files" and name the copy `copy_of_files`
6. Rename `copy_of_files` directory to `files2`
7. Remove `files` and `files2` directories

## Solution

```
touch /tmp/x
cp /tmp/x ~/
cd ~
cp x y
mkdir files
mv x y files
cp -r files copy_of_files
mv copy_of_files files2
rm -rf files files2
```

================================================
FILE: topics/linux/exercises/create_remove/README.md
================================================

# Create & Destroy

## Objectives

1. Create a file called `x`
2. Create a directory called `content`
3. Move `x` file to the `content` directory
4. Create a file inside the `content` directory called `y`
5. Create the following directory structure in `content` directory: `dir1/dir2/dir3`
6. Remove the content directory

## Solution

Click [here](solution.md) to view the solution.

================================================
FILE: topics/linux/exercises/create_remove/solution.md
================================================

# Create & Destroy

## Objectives

1. Create a file called `x`
2. Create a directory called `content`
3. Move `x` file to the `content` directory
4. Create a file inside the `content` directory called `y`
5. Create the following directory structure in `content` directory: `dir1/dir2/dir3`
6. Remove the content directory

## Solution

```
touch x
mkdir content
mv x content
touch content/y
mkdir -p content/dir1/dir2/dir3
rm -rf content
```

================================================
FILE: topics/linux/exercises/navigation/README.md
================================================

# Navigation

## Requirements

1. Linux :)

## Objectives

1. Change directory to `/tmp`
2. Move to parent directory
3. Change directory to home directory
4. Move to parent directory
5. Move again to parent directory
   1. Where are you at? Verify with a command
6. Change to last visited directory

## Solution

Click [here](solution.md) to view the solution

================================================
FILE: topics/linux/exercises/navigation/solution.md
================================================

# Navigation

## Requirements

1. Linux :)

## Objectives

1. Change directory to `/tmp`
2. Move to parent directory
3. Change directory to home directory
4. Move to parent directory
5. Move again to parent directory
   1. Where are you at? Verify with a command
6. Change to last visited directory

## Solution

```
cd /tmp
cd ..
cd ~
cd ..
cd ..  # root (/)
pwd
cd -
```

================================================
FILE: topics/linux/exercises/uniqe_count/README.md
================================================

# Unique Count

## Objectives

In this directory you have a file with a list of IP addresses called `ip_list`. Using the file, determine which IP address is the most recurring (listed the most times).
# Solution

Click [here](solution.md) to view the solution

================================================
FILE: topics/linux/exercises/uniqe_count/ip_list
================================================

88.249.123.246 GET 200
204.14.121.43 GET 200
49.11.110.6 GET 200
137.126.109.160 GET 200
107.13.168.5 GET 200
232.136.91.101 GET 404
43.140.194.174 GET 200
137.126.109.160 GET 200
63.219.2.31 GET 200
17.86.6.109 GET 200
119.89.134.53 GET 404
137.126.109.160 GET 200
204.14.121.43 GET 200
238.183.3.55 GET 200
254.62.231.49 GET 200
250.1.145.213 POST 200
17.86.6.109 GET 404
119.89.134.53 POST 200
88.249.123.246 GET 200
49.11.110.61 GET 200
107.13.168.5 GET 504
232.136.91.101 GET 200
137.126.109.160 POST 200
63.219.2.31 GET 200
17.86.6.109 GET 200
119.89.134.53 GET 504
17.86.6.109 GET 200
197.1.166.141 GET 200
17.86.6.109 GET 200
87.21.188.245 GET 504
235.230.62.243 GET 200
246.3.48.149 GET 200
194.131.205.190 GET 504
222.129.41.212 POST 200
224.57.91.248 GET 504
238.183.3.55 GET 200
137.126.109.160 GET 504
254.62.231.49 POST 200
250.1.145.213 GET 504
185.80.235.15 GET 200
137.126.109.160 GET 200
63.219.2.31 GET 504
17.86.6.109 GET 200
119.89.134.53 POST 200
63.219.2.31 GET 504
17.86.6.109 GET 200
119.89.134.53 GET 504
88.249.123.246 GET 200
238.183.3.55 POST 200
224.57.91.248 GET 504
238.183.3.55 POST 200
254.62.231.49 GET 200
254.62.231.49 POST 404
250.1.145.213 GET 200
221.169.255.179 GET 200
220.35.213.247 GET 200
67.89.94.133 GET 200
77.192.163.242 POST 200
204.14.121.43 GET 200
22.244.145.46 GET 200
89.127.55.7 GET 200
137.126.109.160 GET 200
88.249.123.246 POST 200
238.183.3.55 GET 200
254.62.231.49 GET 200
250.1.145.213 GET 200
137.126.109.160 POST 200
221.169.255.179 GET 200
232.136.91.101 GET 200
197.1.166.141 GET 200
87.21.188.245 GET 200
235.230.62.243 GET 200
246.3.48.149 GET 200
194.131.205.190 GET 200
222.129.41.212 GET 200
137.126.109.160 GET 200
224.57.91.248 GET 200
185.80.235.15 GET 200
137.126.109.160 GET 200
63.219.2.31 GET 200
17.86.6.109 GET 200
119.89.134.53 GET 200
88.249.123.246 GET 200
238.183.3.55 GET 200
254.62.231.49 GET 200
250.1.145.213 GET 200
63.219.2.31 GET 200
17.86.6.109 GET 200
119.89.134.53 POST 200
88.249.123.246 GET 200
137.126.109.160 POST 200
238.183.3.55 GET 200
254.62.231.49 POST 200
250.1.145.213 GET 200
137.126.109.160 POST 200
63.219.2.31 GET 200
17.86.6.109 GET 404
107.13.168.5 POST 200
232.136.91.101 GET 200
137.126.109.160 POST 200
63.219.2.31 GET 200
17.86.6.109 GET 200
197.1.166.141 GET 200
87.21.188.245 POST 200
235.230.62.243 POST 200
246.3.48.149 GET 200
194.131.205.190 GET 200
224.57.91.248 GET 200
238.183.3.55 POST 200
254.62.231.49 GET 200
88.249.123.246 GET 200
49.11.110.61 GET 200
107.13.168.5 POST 200
232.136.91.101 GET 200
204.14.121.43 POST 200

================================================
FILE: topics/linux/exercises/uniqe_count/solution.md
================================================

# Unique Count

## Objectives

In this directory you have a file with a list of IP addresses called `ip_list`. Using the file, determine which IP address is the most recurring (listed the most times).

# Solution

`sort ip_list | cut -d' ' -f1 | uniq -c | sort -n | tail -1`

================================================
FILE: topics/misc/elk_kibana_aws.md
================================================

# Elasticsearch, Kibana and AWS

Your task is to build an Elasticsearch cluster along with a Kibana dashboard on one of the following clouds:

* AWS
* OpenStack
* Azure
* GCP

You have to describe in detail (preferably with some drawings) how you are going to set it up. Please describe in detail:

- How you scale it up or down
- How you quickly (in less than 20 minutes) provision the cluster
- How you apply a security policy for access control
- How you transfer the logs from the app to ELK
- How you deal with multiple apps running in different regions

# Solution

This is one out of many possible solutions. This solution relies heavily on AWS.
* Create a VPC with a subnet so we can place the Elasticsearch node(s) in an internal environment only. If required, we will also set up NAT for public access.
* Create an IAM role for access to the cluster. Also, create a separate role for admin access.
* To provision the solution quickly, we will use the Elasticsearch service directly from AWS for the production deployment. This way we also cover multiple AZs. As for authentication, we either use Amazon Cognito or the organization's LDAP server.
* To transfer data, we will have to install a Logstash agent on the instances. The agent will be responsible for pushing the data to Elasticsearch.
* For monitoring we will use:
  * CloudWatch to monitor cluster resource utilization
  * Cloud metrics dashboard
* If access is required from multiple regions, we will transfer all the data to S3, which will allow us to view the data from different regions and consolidate it in one dashboard

================================================
FILE: topics/node/node_questions_basic.md
================================================

# NODEJS BASIC INTERVIEW QUESTIONS

# OBJECTIVE

To list the basic Node.js questions asked in many interviews

1. What is Node.js?
2. How many threads does Node.js have?
3. How does Node.js work?
4. Is Node.js single threaded or multi threaded?
5. What is a node cluster?
6. Does the parent process depend on the child process?
7. How many types of modules does Node.js have?
8. Why Node.js?
9. What is npm?
10. Difference between package.json and package-lock.json?
11. What is the difference between creating a server with http and with a framework?
12. What do you mean by non-blocking?
13. What is the event loop?
14. What is event driven?

================================================
FILE: topics/node/solutions/node_questions_basic_ans.md
================================================

# ANSWERS

1. Node.js is an open-source, cross-platform JavaScript runtime environment that allows developers to build server-side and networking applications.
2. Node.js is single threaded. It handles one operation at a time.
3. Node.js works by executing JavaScript code in a runtime environment outside of a web browser.
4. Node.js works by executing JavaScript code in a runtime environment outside of a web browser, mainly used for the performance and scalability of the project.
5. The parent process manages the child process but does not depend on the child process; they run in parallel.
6. Three types of modules mainly:
   1. Core modules - fs, require
   2. Local modules - like functions created by us and exported or imported from one file to another
   3. Third party modules - like npm packages which we install to do a specific kind of work
7. NPM (Node Package Manager) is used for installing, managing, and sharing JavaScript packages and dependencies.
8. Difference between package.json and package-lock.json:
   1. package.json - contains the metadata and the dependencies of a project
   2. package-lock.json - locks the versions of the installed dependencies

================================================
FILE: topics/observability/README.md
================================================

# Observability

- [Observability](#observability)
  - [Monitoring](#monitoring)
  - [Data](#data)
  - [Application Performance Management](#application-performance-management)
What's Observability?
In distributed systems, observability is the ability to collect data about programs' execution, modules' internal states, and the communication among components.
To improve observability, software engineers use a wide range of logging and tracing techniques to gather telemetry information, and tools to analyze and use it.
Observability is foundational to site reliability engineering, as it is the first step in triaging a service outage.[1]
## Monitoring
What's monitoring? How is it related to Observability?
Google: "Monitoring is one of the primary means by which service owners keep track of a system’s health and availability".
What types of monitoring outputs are you familiar with and/or used in the past?
Alerts
Tickets
Logging
## Data
Can you mention what types of things are often monitored in the IT industry?
- Hardware (CPU, RAM, ...)
- Infrastructure (disk capacity, network latency, ...)
- App (status codes, errors in logs, ...)
Explain "Time Series" data
Time series data is sequenced data, measuring certain parameter in ordered (by time) way. An example would be CPU utilization every hour: ``` 08:00 17 09:00 22 10:00 91 ```
Explain data aggregation
In monitoring, aggregating data is basically combining collection of values. It can be done in different ways like taking the average of multiple values, the sum of them, the count of many times they appear in the collection and other ways that mainly depend on the type of the collection (e.g. time-series would be one type).
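The aggregation methods above can be sketched in plain Python (a minimal illustration using hypothetical hourly CPU utilization samples):

```python
# Hypothetical hourly CPU utilization samples (a tiny time series)
readings = [17, 22, 91]

# Common ways to aggregate the collection of values
total = sum(readings)            # sum aggregation
average = total / len(readings)  # average aggregation
count = len(readings)            # count aggregation
peak = max(readings)             # max aggregation

print(total, round(average, 1), count, peak)  # → 130 43.3 3 91
```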
## Application Performance Management
What is Application Performance Management?
- IT metrics translated into business insights - Practices for monitoring applications insights so we can improve performances, reduce issues and improve overall user experience
Name three aspects of a project you can monitor with APM (e.g. backend)
- Frontend - Backend - Infra - ...
What can be collected/monitored to perform APM monitoring?
- Metrics - Logs - Events - Traces
================================================ FILE: topics/openshift/README.md ================================================ ## OpenShift ### OpenShift Exercises
|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
| Projects 101 | Projects | [Exercise](projects_101.md) | [Solution](solutions/projects_101.md) | |
| My First Application | Applications | [Exercise](my_first_app.md) | [Solution](solutions/my_first_app.md) | |
### OpenShift Self Assessment * [OpenShift 101](#Openshift-101) * [OpenShift Architecture](#Openshift-architecture) * [OpenShift Hands-On Basics](#Openshift-hands-on-basics) * [OpenShift Projects](#Openshift-projects) ### OpenShift 101
What is OpenShift?
OpenShift is a container orchestration platform based on Kubernetes.
It can be used for deploying applications while having minimal management overhead.
How is OpenShift related to Kubernetes?
OpenShift is built on top of Kubernetes, defining its own custom resources in addition to the built-in ones.
True or False? OpenShift is an IaaS (infrastructure as a service) solution
False. OpenShift is a PaaS (platform as a service) solution.
True or False? OpenShift CLI supports everything kubectl supports, along with additional functionality
True
What are some of OpenShift added features on top of Kubernetes?
- UI: OpenShift provides a unified UI out-of-the-box - Routes: A simple procedure for exposing services - Developer workflow support: built-in CI/CD (OpenShift Pipelines), a built-in container registry and tooling for building artifacts from source into container images
True or False? To run containers on OpenShift, you have to own root privileges
False. OpenShift supports rootless containers by default.
## OpenShift - Architecture
What types of nodes does OpenShift have?
- Workers: Where the end-user applications are running - Masters: Responsible for managing the cluster
Which component is responsible for determining pod placement?
The Scheduler.
What else is the scheduler responsible for, besides pod placement?
Application high availability, by spreading pod replicas across worker nodes
## OpenShift - Hands-On Basics
OpenShift supports many resources. How to get a list of all these resources?
`oc api-resources`
Explain OpenShift CLIs like oc and odo
oc is used for creating applications, but also for administrating OpenShift cluster
odo is used solely for managing applications on OpenShift (mainly from developers' perspective) and has nothing to do with administrating the cluster
## OpenShift - Projects
What is a project in OpenShift?
A project in OpenShift is a Kubernetes namespace with annotations.
In simpler words, think about it as an isolated environment for users to manage and organize their resources (like Pods, Deployments, Service, etc.).
How to list all projects? What does the "STATUS" column mean in the projects list output?
`oc get projects` will list all projects. The "STATUS" column can be used to see which projects are currently active.
You have a new team member and you would like to assign them the "admin" role on your project in OpenShift. How to achieve that?
`oc adm policy add-role-to-user admin USERNAME -n PROJECT` (substitute the new member's username and your project's name)
#### OpenShift - Applications
How to create a MySQL application using an image from Docker Hub?
`oc new-app mysql`
#### OpenShift - Images
What is an image stream?
What would be the best way to run and manage multiple OpenShift environments?
Federation
#### OpenShift - Federation
What is OpenShift Federation?
Management and deployment of services and workloads across multiple independent clusters from a single API
Explain the following in regards to Federation: * Multi Cluster * Federated Cluster * Host Cluster * Member Cluster
* Multi Cluster - Multiple clusters deployed independently, not being aware of each other * Federated Cluster - Multiple clusters managed by the OpenShift Federation Control Plane * Host Cluster - The cluster that runs the Federation Control Plane * Member Cluster - Cluster that is part of the Federated Cluster and connected to Federation Control Plane
## OpenShift - Storage
What is a storage device? What storage devices are there?
* Hard Disks * SSD * USB * Magnetic Tape
What is Random Seek Time?
The time it takes for a disk to reach the place where the data is located and read a single block/sector. Bonus question: What is the random seek time of an SSD and of a magnetic disk? Answer: Magnetic is about 10ms and SSD is somewhere between 0.08 and 0.16ms
#### OpenShift - Pods
What happens when a pod fails or exits due to a container crash?
The master node automatically restarts the pod, unless it fails too often.
What happens when a pod fails too often?
It's marked as bad by the master node and temporarily not restarted anymore.
How to find out on which node a certain pod is running?
`oc get po -o wide`
#### OpenShift - Services
Explain Services and their benefits
- Services in OpenShift define an access policy to one or more sets of pods.
- They are connecting applications together by enabling communication between them - They provide permanent internal IP addresses and hostnames for applications - They are able to provide basic internal load balancing
#### OpenShift - Labels
Explain labels. What are they? When do you use them?
- Labels are used to group or select API objects - They are simple key-value pairs and can be included in metadata of some objects - A common use case: group pods, services, deployments, ... all related to a certain application
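A minimal sketch of labels in practice, with hypothetical names: a Pod carries the `app: myapp` label in its metadata, and a Service selects pods by that same label:

```
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp        # key-value pair grouping this pod with the app
spec:
  containers:
  - name: myapp
    image: myapp:latest
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
spec:
  selector:
    app: myapp        # selects all pods labeled app=myapp
  ports:
  - port: 8080
```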
#### OpenShift - Service Accounts
How to list Service Accounts?
`oc get serviceaccounts`
#### OpenShift - Networking
What is a Route?
A route exposes a service by giving it an externally reachable hostname.
What does a Route consist of?
- name - service selector - (optional) security configuration
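A minimal Route manifest illustrating these parts (all names are hypothetical):

```
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: myapp-route           # name
spec:
  host: myapp.example.com     # externally reachable hostname
  to:
    kind: Service
    name: myapp-svc           # service selector
  tls:                        # (optional) security configuration
    termination: edge
```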
True or False? Router container can run only on the Master node
False. It can run on any node.
Give an example of how a router is used
1. A client uses the address of an application running on OpenShift 2. DNS resolves the address to the host running the router 3. The router checks whether a matching route exists 4. The router proxies the request to the internal pod
#### OpenShift - Security
What are "Security Context Constraints"?
From [OpenShift Docs](https://docs.openshift.com/container-platform/4.7/authentication/managing-security-context-constraints.html): "Similar to the way that RBAC resources control user access, administrators can use security context constraints (SCCs) to control permissions for pods".
How to add the ability for the user `user1` to view the project `wonderland` assuming you are authorized to do so
oc adm policy add-role-to-user view user1 -n wonderland
How to check what is the current context?
`oc whoami --show-context`
#### OpenShift - Serverless
What is OpenShift Serverless?
- In general, 'serverless' is a cloud computing model where scaling and provisioning are taken care of for application developers, so they can focus on development rather than on infrastructure-related tasks - OpenShift Serverless allows you to dynamically scale your applications and provides the ability to build event-driven applications, whether the sources are on Kubernetes, in the cloud or on-premise solutions - OpenShift Serverless is based on the Knative project.
What are some of the event sources you can use with OpenShift Serverless?
* Kafka * Kubernetes APIs * AWS Kinesis * AWS SQS * JIRA * Slack More are supported and provided with OpenShift.
Explain serverless functions
What is the difference between Serverless Containers and Serverless functions?
#### OpenShift - Misc
What is Replication Controller?
The Replication Controller is responsible for ensuring that the specified number of pods is running at all times.
If more pods are running than needed -> it deletes some of them
If not enough pods are running -> it creates more
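A sketch of a ReplicationController manifest (name and image are hypothetical), showing where the desired pod count is declared:

```
apiVersion: v1
kind: ReplicationController
metadata:
  name: myapp-rc
spec:
  replicas: 3          # desired number of pods kept running at all times
  selector:
    app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: nginx:alpine
```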
================================================ FILE: topics/openshift/projects_101.md ================================================ ## OpenShift - Projects 101 ### Objectives In a newly deployed cluster (preferably) perform the following: 1. Log in to the OpenShift cluster 2. List all the projects 3. Create a new project called 'neverland' 4. Check the overview status of the current project ================================================ FILE: topics/openshift/solutions/my_first_app.md ================================================ ## OpenShift - My First Application ### Objectives 1. Create a MySQL application 2. Describe which OpenShift objects were created ### Solution 1. `oc new-app mysql` 2. The following objects were created: * ImageStream: ================================================ FILE: topics/openshift/solutions/projects_101.md ================================================ ## OpenShift - Projects 101 ### Objectives In a newly deployed cluster (preferably) perform the following: 1. Log in to the OpenShift cluster 2. List all the projects 3. Create a new project called 'neverland' 4. Check the overview status of the current project ### Solution ``` oc login -u YOUR_USER -p YOUR_PASSWORD_OR_TOKEN oc get projects # Empty output in new cluster oc new-project neverland oc status ``` ================================================ FILE: topics/os/fork_101.md ================================================ ## Fork 101 Answer the questions given the following program (without running it): ``` #include <stdio.h> #include <unistd.h> int main() { fork(); printf("\nyay\n"); return 0; } ``` 1. How many times will the word "yay" be printed? 2. How many processes will be created? ================================================ FILE: topics/os/fork_102.md ================================================ ## Fork 102 Answer the questions given the following program (without running it): ``` #include <stdio.h> #include <unistd.h> int main() { fork(); fork(); printf("\nyay\n"); return 0; } ``` 1.
How many times will the word "yay" be printed? 2. How many processes will be created? ================================================ FILE: topics/os/solutions/fork_101_solution.md ================================================ ## Fork 101 - Solution 1. 2 2. 2 ================================================ FILE: topics/os/solutions/fork_102_solution.md ================================================ ## Fork 102 - Solution 1. 4 2. 4 ================================================ FILE: topics/perl/README.md ================================================ ## Perl ### Perl Self Assessment
What is Perl?
From the official [docs](https://perldoc.perl.org/): "Perl officially stands for Practical Extraction and Report Language, except when it doesn't." It's a general purpose programming language developed for manipulating texts mainly. It has been used to perform system administration tasks, networking, building websites and more.
What data types does Perl have? How can we define them?
- Scalar: A simple variable that stores a single data item. It can be a string, a number or a reference. ``` my $number = 5; ``` - Arrays: An ordered list of scalars. ``` my @numbers = (1, 2, 3, 4, 5); # or using the `qw` keyword (quote word): my @numbers = qw/1 2 3 4 5/; # '/' can be another symbol, e.g. qw@1 2 3 4 5@ ``` - Hashes (or associative arrays): An unordered collection of key-value pairs. We access hash values using their keys. ``` my %numbers = ( First => '1', Second => '2', Third => '3' ); ```
How can you access a hash value, add and delete a key/value pair, and modify a hash?
``` my %numbers = ( 'First' => '1', 'Second' => '2', 'Third' => '3' ); ``` - Access: ``` print($numbers{'First'}); ``` - Add: ``` $numbers{'Fourth'} = 4; ``` - Delete: ``` delete $numbers{'Third'}; ``` - Modify: ``` $numbers{'Fifth'} = 6; $numbers{'Fifth'} = 5; ```
How can you iterate over an array? And over a hash?
- Array: ``` my @numbers = qw/1 2 3 4 5/; # Using `$_`, which holds the current element of the loop. It iterates from index 0 to the last index. foreach (@numbers) { print($_); } # Output: 12345 # `$#numbers` returns the highest index of the array, so we can loop from index 0 to the max index. for my $i (0..$#numbers) { print($numbers[$i]); } # Output: 12345 # Using the `map` keyword: print map {$_} @numbers; # Output: 12345 # Using `while`. Be careful with this option: `shift` removes the first element of the array and assigns it to the `$element` variable, # so after this loop the `numbers` array will be empty. while (my $element = shift(@numbers)) { print($element); } # Output: 12345 ``` - Hashes: ``` my %capital_cities = ( 'Madrid' => 'Spain', 'Rome' => 'Italy', 'Berlin' => 'Germany' ); # Iterate and get the `keys`: foreach my $city (keys %capital_cities) { print($city . "\n"); } # Iterate and get the `values`: foreach my $country (values %capital_cities) { print($country . "\n"); } # Iterate and get the keys and values (first option): foreach my $city (keys %capital_cities) { print("City: $city - Country: $capital_cities{$city}" . "\n"); } # Iterate and get the keys and values (second option): while(my ($city, $country) = each %capital_cities) { print("City: $city - Country: $country" . "\n"); } ```
What is a Perl subroutine? How to define it?
It's Perl's construct for user-defined functions (what other programming languages simply call functions). We define a subroutine with the keyword `sub`. ``` sub hello { print "hello"; } ```
Describe the different ways to receive parameters in a subroutine
- List assignment: Using the `@_` array. It's the list of elements passed as parameters. ``` sub power { my ($b, $e) = @_; return $b ** $e; } &power(2, 3); ``` - Individual assignment: We access each element of the `@_` array, starting from index zero. ``` sub power { my $b = $_[0]; my $e = $_[1]; return $b ** $e; } &power(2, 3); ``` - Using the `shift` keyword: `shift` removes the first value of `@_` and returns it. ``` sub power { my $b = shift; my $e = shift; return $b ** $e; } &power(2, 3); ``` [Source](https://stackoverflow.com/a/21465275/12771230) The same Stack Overflow answer also discusses which approach is preferred.
What is lexical and dynamic scoping?
How to apply referencing and dereferencing?
Does Perl have conventions?
You can check [perlstyle](https://perldoc.perl.org/perlstyle)
What is Perl POD? Can you code an example?
From the official [docs](https://perldoc.perl.org/perlpod): "Pod is a simple-to-use markup language used for writing documentation for Perl, Perl programs, and Perl modules." ``` =item This function returns the factorial of a number. Input: $n (the number whose factorial you want). Output: the factorial of the number. =cut sub factorial { my ($i, $result, $n) = (1, 1, shift); $result = $result *= $i && $i++ while $i <= $n; return $result; } ```
### Perl Regex
Check if the word `electroencefalografista` exists in a string
``` my $string = "The longest accepted word by RAE is: electroencefalografista"; if ($string =~ /electroencefalografista/) { print "Match!"; } ```
Check if the word `electroencefalografista` does not exist in a string
``` my $string = "The longest not accepted word by RAE is: Ciclopentanoperhidrofenantreno"; if ($string !~ /electroencefalografista/) { print "Does not match!"; } ```
Replace the word `amazing`
``` my $string = "Perl is amazing!"; $string =~ s/amazing/incredible/; print $string; # Perl is incredible! ```
Extract `hh:mm:ss` with capturing group `()` in the following datetime
``` my $date = "Fri Nov 19 20:09:37 CET 2021"; my @matches = $date =~ /(.*)(\d{2}:\d{2}:\d{2})(.*)/; print $matches[1]; # Output: 20:09:37 ```
Extract all the elements that are numbers in an array
``` my @array = ('a', 1, 'b', 2, 'c', 3); my @numbers = grep (/\d/, @array); # Note: \d matches more than just 0-9 (it also matches other Unicode digits) map {print $_ . "\n" } @numbers; ```
Print all the Linux system users whose names start with d or D
- With a Perl one-liner :D ``` open(my $fh, '<', '/etc/passwd'); my @user_info = <$fh>; map { print $& . "\n" if $_ =~ /^d([^:]*)/i } @user_info; close $fh; ``` - Avoiding one-liners (reusing the `@user_info` array from above) ``` foreach my $user_line (@user_info) { if ($user_line =~ /^d([^:]*)/i) { print $& . "\n"; } } ```
### Perl Files Handle
Mention the different modes in File Handling
- Read only: `<` - Write mode. It creates the file if it doesn't exist: `>` - Append mode. It creates the file if it doesn't exist: `>>` - Read and write mode: `+<` - Read, clear and write mode. It creates the file if it doesn't exist: `+>` - Read and append mode. It creates the file if it doesn't exist: `+>>`
How to write into a file?
``` # We can use: # '>' Write (it clears a previous content if exists). # '>>' Append. open(my $fh, '>>', 'file_name.ext') or die "Error: file can't be opened"; print $fh "writing text...\n"; close($fh); ```
How can you read a file and print every line?
``` open(my $fh, '<', 'file_to_read.ext') or die "Error: file can't be opened"; my @file = <$fh>; foreach my $line (@file) { print $line; } ``` We can use the file handle without assigning it to an array: ``` open(my $fh, '<', 'file_to_read.ext') or die "Error: file can't be opened"; foreach my $line (<$fh>) { print $line; } ```
### Perl OOP
Does Perl have support for OOP?
From the official [docs](https://perldoc.perl.org/perlootut): "By default, Perl's built-in OO system is very minimal, leaving you to do most of the work."
What is the purpose of the bless function?
The `bless` function turns a plain data structure (usually a hash reference) into an object by associating it with a class (package).
How to create a Perl class? How can you call a method?
- Let's create the package: `Example.pm` ``` package Example; sub new { my $class = shift; my $self = {}; bless $self, $class; return $self; } sub is_working { print "Working!"; } 1; ``` - Now we can instantiate the `Example` class and call the `is_working` method: ``` my $e = Example->new(); $e->is_working(); # Output: Working! ```
Does Perl have inheritance? What is the `SUPER` keyword?
Yes, Perl supports inheritance. We can read about it in the official [docs](https://perldoc.perl.org/perlobj#Inheritance). The `SUPER` keyword is used to call a method of the parent class; the same page gives an example of how to apply inheritance.
Does Perl have polymorphism? What is method overriding?
Yes, it has polymorphism. In fact, method overriding is one way to apply it in Perl. Put simply, method overriding happens when a class defines a method that already exists in a parent class. Example: ``` package A; sub new { return bless {}, shift; }; sub printMethod { print "A\n"; }; package B; use parent -norequire, 'A'; sub new { return bless {}, shift; }; sub printMethod { print "B\n"; }; my $a = A->new(); my $b = B->new(); $a->printMethod(); $b->printMethod(); # Output: # A # B ```
How can you call a method of an inherited class?
``` # Class `A` with a `printA` method. package A; sub new { return bless {}, shift; }; sub printA { print "A"; }; # Class `B` extends (uses as parent) class `A`. package B; use parent -norequire, 'A'; sub new { return bless {}, shift; }; # Instantiating class `B` allows calling the inherited method my $b = B->new(); $b->printA(); ```
### Perl Exception Handling
How can we evaluate and capture an exception in Perl?
From the official [eval docs](https://perldoc.perl.org/functions/eval): "`eval` in all its forms is used to execute a little Perl program, trapping any errors encountered so they don't crash the calling program." e.g: ``` eval { die; }; if ($@) { print "Error. Details: $@"; } ``` If we execute this we get the following output: ``` Error. Details: Died at eval.pl line 2. ``` The `eval` (`try` in other programming languages) tries to execute a piece of code. The code fails (it calls `die`), and execution continues into the `if` condition, which checks whether the `$@` error variable has something stored. This acts like a `catch` in other programming languages. In this way we can handle errors.
### Perl OS
What is Perl Open3?
From the official [IPC::Open3 docs](https://perldoc.perl.org/IPC::Open3): "IPC::Open3 - open a process for reading, writing, and error handling using open3()". With `open3` we can have the full control of the STDIN, STDOUT, STDERR. It's usually used to execute commands.
Using Open3: Create a file with the size of 15MB and check it's created successfully
- Code: ``` use IPC::Open3; use Data::Dumper; sub execute_command { my @command_to_execute = @_; my ($stdin, $stdout, $stderr); eval { open3($stdin, $stdout, $stderr, @command_to_execute); }; if ($@) { print "Error. Details: $@"; } close($stdin); return $stdout; } my $file_name = 'perl_open3_test'; &execute_command('truncate', '-s', '15M', $file_name); my $result = &execute_command('stat', '-c', '%s', $file_name); print Dumper(<$result>); ``` - Result: ``` $ -> perl command.pl $VAR1 = '15728640 '; ```
### Perl Packages & Modules
What is a Perl package? And a module?
With a Perl package we define a namespace. A Perl module can, in simple terms, be thought of as a class: when we create a class in Perl we use the `package` keyword. A module can be used with the `use` keyword.
What is the difference between .pl and .pm extensions?
There's no real technical difference between the `.pm` and `.pl` extensions. By convention, `.pm` marks a Perl module (a class), while `.pl` is used for Perl scripts without OOP classes.
Why a Perl class (or module) should return something at the end of the file? Check the example.
If we want to `use` a Perl module (`import` a class), the file must end with a true value (conventionally `1;`). If it evaluates to a false value, loading the module will fail. ``` package A; sub new { return bless {}, shift; }; sub printMethod { print "A\n"; }; 1; ```
What is cpan? And cpanm?
CPAN is the Comprehensive Perl Archive Network, a large repository of Perl modules. cpanm (App::cpanminus) is a CPAN client; from the official [App::cpanminus docs](https://metacpan.org/pod/App::cpanminus): "App::cpanminus - get, unpack, build and install modules from CPAN". [Find CPAN modules](https://metacpan.org/)
How can you install cpanm and a Perl module?
There are some different alternatives to install Perl modules. We will use `cpanm`. - Install `cpanm`: ``` $ cpan App::cpanminus ``` - Install the `Test` module with `cpanm`: ``` cpanm Test ``` Now we can test the `Test` installed module: ``` $ perl -M'Test::Simple tests => 1' -e 'ok( 1 + 1 == 2 );' 1..1 ok 1 ``` ``` $ perl -M'Test::Simple tests => 1' -e 'ok( 1 + 1 == 3 );' 1..1 not ok 1 # Failed test at -e line 1. # Looks like you failed 1 test of 1. ```
================================================ FILE: topics/pipeline_deploy_image_to_k8.md ================================================ ## Build & Publish Docker Images to Kubernetes Cluster Write a pipeline, on any CI/CD system you prefer, that will build an image out of a given Dockerfile and will publish that image to a running Kubernetes cluster. ================================================ FILE: topics/programming/grep_berfore_and_after.md ================================================ Implement the following grep command in Python (numbers can be different): `grep error -A 2 -B 2 some_file` ================================================ FILE: topics/programming/web_scraper.md ================================================ ## Web Scraper 1. Pick a web site to scrape 2. Using any language you would like, write a web scraper to save some data from the site you chose 3. Save the results to a database (doesn't matter which database, just pick one) * Note: if you don't know which site to pick have a look [here](http://toscrape.com) ================================================ FILE: topics/python/advanced_data_types.md ================================================ ## (Advanced) Identify the data type For each of the following, identify what is the data type of the result variable 1. a = {'a', 'b', 'c'} 2. b = {'1': '2'} 3. c = ([1, 2, 3]) 4. d = (1, 2, 3) 5. e = True+True ================================================ FILE: topics/python/class_0x00.md ================================================ ## Class Write a simple class that has two attributes, of which one has a default value, and two methods ================================================ FILE: topics/python/compress_string.md ================================================ ## Compress String 1. Write a function that gets a string and compresses it - 'aaaabbccc' -> 'a4b2c3' - 'abbbc' -> 'a1b3c1' 2.
Write a function that decompresses a given string - 'a4b2c3' -> 'aaaabbccc' - 'a1b3c1' -> 'abbbc' ================================================ FILE: topics/python/data_types.md ================================================ ## Data Types For each of the following, identify what is the data type of the result variable 1. a = [1, 2, 3, 4, 5] 2. b = "Hello, is it me you looking for?" 3. e = 100 4. f = '100' 5. i = 0.100 6. i = True Bonus question: how to find out in Python what is the data type of a certain variable? ================================================ FILE: topics/python/reverse_string.md ================================================ ## Reverse a String Write a code that reverses a string ================================================ FILE: topics/python/solutions/advanced_data_types_solution.md ================================================ ## (Advanced) Identify the data type For each of the following, identify what is the data type of the result variable 1. a = {'a', 'b', 'c'} -> set 2. b = {'1': '2'} -> dict 3. c = ([1, 2, 3]) -> list 4. d = (1, 2, 3) -> tuple 5. e = True+True -> int ================================================ FILE: topics/python/solutions/class_0x00_solution.md ================================================ ## Class 0x00 - Solution 1. Write a simple class that has two attributes, of which one has a default value, and two methods ```python from typing import Optional """ Student Module """ class Student: def __init__(self, name: str, department: Optional[str] = None) -> None: """ Instance Initialization function Args: name (str): Name of student department (Optional[str], optional): Department. Defaults to None.
""" self.name = name self.department = department def getdetails(self) -> str: """ Gets the students details Returns: str: A formatted string """ return f"Name is {self.name}, I'm in department {self.department}" def change_department(self, new_deparment: str) -> None: """Changes the department of the student object Args: new_deparment (str): Assigns the new department value to dept attr """ self.department = new_deparment # student1 instantiation student1 = Student("Ayobami", "Statistics") print(student1.getdetails()) # Calling the change_department function to change the department of student student1.change_department("CS") print(student1.department) ``` Output ``` Name is Ayobami, I'm in department Statistics CS ``` ================================================ FILE: topics/python/solutions/compress_string_solution.md ================================================ ## Compress String Solution 1. Write a function that gets a string and compresses it - 'aaaabbccc' -> 'a4b2c3' - 'abbbc' -> 'a1b3c1' ``` def compress_str(mystr: str) -> str: result = '' if mystr: prevchar = mystr[0] else: return result count = 1 for nextchar in mystr[1:]: if nextchar == prevchar: count += 1 else: result += prevchar + str(count) count = 1 prevchar = nextchar result += prevchar + str(count) return result ``` 2. Write a function that decompresses a given string - 'a4b2c3' -> 'aaaabbccc' - 'a1b3c1' -> 'abbbc' ``` def decompress_str(mystr: str) -> str: result = '' for index in range(0, len(mystr), 2): result += mystr[index] * int(mystr[index + 1]) return result ``` ================================================ FILE: topics/python/solutions/data_types_solution.md ================================================ ## Data Types - Solution 1. a = [1, 2, 3, 4, 5] -> list 2. b = "Hello, is it me you looking for?" -> string 3. e = 100 -> int 4. f = '100' -> string 5. i = 0.100 -> float 6. 
i = True -> bool ### Bonus question - Answer `type(...)` ================================================ FILE: topics/python/solutions/reverse_string.md ================================================ ## Reverse a String - Solution ``` my_string[::-1] ``` A more visual way is:
Careful: this is very slow ``` def reverse_string(string): temp = "" for char in string: temp = char + temp return temp ``` ================================================ FILE: topics/python/solutions/sort_solution.md ================================================ ## Sort Descending - Solution 1. Write a function that sorts the following list of lists, without using the `sorted()` and `.sort()` functions, in descending order - mat_list = [[1, 2, 3], [2, 4, 4], [5, 5, 5]] -> [[5, 5, 5], [2, 4, 4], [1, 2, 3]] ```python def sort_desc(mat: list) -> list: """ Sorts a list in descending order Args: mat (list): the list to sort Returns: list: A new list """ new_list = [] while mat != []: maxx = max(mat) new_list.append(maxx) mat.remove(maxx) return new_list mat_list = [[1, 2, 3], [2, 4, 4], [5, 5, 5]] print(sort_desc(mat_list)) ``` ================================================ FILE: topics/python/sort.md ================================================ ## Sort Descending 1. Write a function that sorts the following list of lists, without using the `sorted()` and `.sort()` functions, in descending order - list = [[1, 2, 3], [2, 4, 4], [5, 5, 5]] -> [[5, 5, 5], [2, 4, 4], [1, 2, 3]] ================================================ FILE: topics/security/README.md ================================================ # Security
What is DevSecOps? What are its core principles?
A couple of quotations from chosen companies: [Snyk](https://snyk.io/series/devsecops): "DevSecOps refers to the integration of security practices into a DevOps software delivery model. Its foundation is a culture where development and operations are enabled through process and tooling to take part in a shared responsibility for delivering secure software." [Red Hat](https://www.redhat.com/en/topics/devops/what-is-devsecops): "DevSecOps stands for development, security, and operations. It's an approach to culture, automation, and platform design that integrates security as a shared responsibility throughout the entire IT lifecycle." [Jfrog](https://jfrog.com/devops-tools/what-is-devsecops): "DevSecOps principles and practices parallel those of traditional DevOps with integrated and multidisciplinary teams, working together to enable secure continuous software delivery. The DevSecOps development lifecycle is a repetitive process that starts with a developer writing code, a build being triggered, the software package deployed to a production environment and monitored for issues identified in the runtime but includes security at each of these stages."
What does the "Zero Trust" concept mean? How do organizations deal with it?
[Codefresh definition](https://codefresh.io/security-testing/codefresh-runner-overview): "Zero trust is a security concept that is centered around the idea that organizations should never trust anyone or anything that does not originate from their domains. Organizations seeking zero trust automatically assume that any external services it commissions have security breaches and may leak sensitive information"
Explain the principle of least privilege
The principle of least privilege refers to the practice of providing minimal permissions to users, roles, and service accounts that allow them to perform their functions. If an entity does not require an access right then it should not have that right.
What does it mean to be "FIPS compliant"?
What is a Certificate Authority?
[wikipedia](https://en.wikipedia.org/wiki/Certificate_authority): A Certificate Authority is an entity that stores, signs, and issues certificates. A certificate certifies the authenticity of the public key delivered by a website. It helps prevent [man-in-the-middle](https://en.wikipedia.org/wiki/Man-in-the-middle_attack) attacks by providing information that identifies the public key. Important fields inside an [X.509](https://www.ssl.com/faqs/what-is-an-x-509-certificate/) certificate include:

* Version Number
* Serial Number
* Signature Algorithm ID
* Issuer Name
* Validity period
* Subject name
* Subject Public Key info

Every certificate must be signed by a trusted authority. A certificate chain is a concatenation of certificates, each signed by a more trusted authority, from the one delivered by the website up to the root Certificate Authority (CA). The root Certificate Authority is the topmost trusted authority, and browsers ship with the root certificates natively.
Explain RBAC (Role-based Access Control)
Access control based on user roles (i.e., a collection of access authorizations a user receives based on an explicit or implicit assumption of a given role). Role permissions may be inherited through a role hierarchy and typically reflect the permissions needed to perform defined functions within an organization. A given role may apply to a single individual or to several individuals. RBAC is mapped to job function, and assumes that a person will take on different roles, over time, within an organization and different responsibilities in relation to IT systems.
#### Security - Authentication and Authorization
Explain Authentication and Authorization
Authentication is the process of identifying whether a service or a person is who they claim to be. Authorization is the process of identifying what level of access the service or the person has (after authentication is done).
What authentication methods are there?
Give an example of basic authentication process
A user uses the browser to authenticate to some server. It does so by using the authorization field which is constructed from the username and the password combined with a single colon. The result string is encoded using a certain character set which is compatible with US-ASCII. The authorization method + a space is prepended to the encoded string.
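The process above can be sketched in Python; the header value is just `username:password` Base64-encoded (note that Basic auth only encodes, it does not encrypt, so it must be sent over TLS):

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    # Username and password are joined with a single colon...
    token = f"{username}:{password}".encode("utf-8")
    # ...then Base64-encoded, and the scheme name plus a space is prepended.
    return "Basic " + base64.b64encode(token).decode("ascii")

print(basic_auth_header("Aladdin", "open sesame"))
# → Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

The example credentials are the ones used in RFC 7617's own illustration of the scheme.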
What are the three primary factors of authentication? Give three examples of each
Something you have:
- Smart card
- Physical authentication device
- Software token

Something you know:
- Password
- PIN
- Passphrase

Something you are:
- Fingerprint
- Iris or retina scan
- Gait analysis
Explain Token-based authentication
Explain Risk-based authentication
Explain what is Single Sign-On
SSO (Single Sign-on), is a method of access control that enables a user to log in once and gain access to the resources of multiple software systems without being prompted to log in again.
Explain how the Kerberos authentication protocol works as a SSO solution
Kerberos works as a SSO solution by only requiring the user to sign in using their credentials once within a specific validity time window. Kerberos authentication grants the user a Ticket Granting Ticket (TGT) from a trusted authentication server which can then be used to request service tickets for accessing various services and resources. By passing around this encrypted TGT instead of credentials, the user does not need to sign-in multiple times for each resource that has been integrated with Kerberos.
Does Kerberos make use of symmetric encryption, asymmetric encryption, both, or neither?
Symmetric Encryption - Kerberos uses exclusively symmetric encryption with pre-shared keys for transmitting encrypted information and authorizing users.
Explain MFA (Multi-Factor Authentication)
Multi-Factor Authentication requires the user to present two or more pieces of evidence (credentials) when logging into an account; two-factor authentication (2FA) is the most common form. The credentials fall into three categories: something you know (like a password or PIN), something you have (like a smart card), or something you are (like your fingerprint). Credentials must come from at least two different categories to enhance security.
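The "something you have" factor is commonly implemented as a software token generating one-time passwords. A minimal HOTP sketch following RFC 4226 (the key below is the RFC's own test value, not a real secret; TOTP simply derives the counter from the current time):

```python
import hashlib
import hmac
import struct

def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226), the basis of
    software tokens used as a 'something you have' factor."""
    msg = struct.pack(">Q", counter)              # 8-byte big-endian counter
    digest = hmac.new(key, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                    # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 4226 test vector: this key with counter 0 yields "755224"
print(hotp(b"12345678901234567890", 0))  # → 755224
```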
Explain OAuth
#### Security - Passwords
How do you manage sensitive information (like passwords) in different tools and platforms?
What password attacks are you familiar with?
* Dictionary
* Brute force
* Password spraying
* Social engineering
* Whaling
* Vishing
* Phishing
How to mitigate password attacks?
* Strong password policy
* Do not reuse passwords
* ReCaptcha
* Training personnel against social engineering
* Risk-based authentication
* Rate limiting
* MFA
What is password salting? What attack does it help to deter?
Password salting is the processing of prepending or appending a series of characters to a user's password before hashing this new combined value. This value should be different for every single user but the same salt should be applied to the same user password every time it is validated. This ensures that users that have the same password will still have very different hash values stored in the password database. This process specifically helps deter rainbow table attacks since a new rainbow table would need to be computed for every single user in the database.
#### Security - Cookies
What are cookies? Explain cookie-based authentication
True or False? Cookie-based authentication is stateful
True. In cookie-based authentication, the session must be kept on both the server and the client side.
Explain the flow of using cookies
1. User enters credentials
2. The server verifies the credentials -> a session is created and stored in the database
3. A cookie with the session ID is set in the browser of that user
4. On every request, the session ID is verified against the database
5. The session is destroyed (both on client-side and server-side) when the user logs out
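The flow above can be sketched with a hypothetical in-memory session store; the function names and the hard-coded credential check are illustrative only (a real server would check against a user database and keep sessions in persistent storage):

```python
import secrets

SESSIONS = {}  # stand-in for the server-side session database

def log_in(username: str, password: str):
    """Steps 1-3: verify credentials, create a session, and return the
    session ID to be set as a cookie in the user's browser."""
    if password != "s3cret":            # illustrative credential check
        return None
    session_id = secrets.token_hex(32)  # unguessable session ID
    SESSIONS[session_id] = username
    return session_id

def authenticate(session_id: str):
    """Step 4: on every request, look the session ID up server-side."""
    return SESSIONS.get(session_id)

def log_out(session_id: str) -> None:
    """Step 5: destroy the session server-side."""
    SESSIONS.pop(session_id, None)
```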
#### Security - SSH
What is SSH and how does it work?
[Wikipedia Definition](https://en.wikipedia.org/wiki/SSH_(Secure_Shell)): "SSH or Secure Shell is a cryptographic network protocol for operating network services securely over an unsecured network." [Hostinger.com Definition](https://www.hostinger.com/tutorials/ssh-tutorial-how-does-ssh-work): "SSH, or Secure Shell, is a remote administration protocol that allows users to control and modify their remote servers over the Internet." [This site](https://www.hostinger.com/tutorials/ssh-tutorial-how-does-ssh-work) explains it in a good way.
What is the role of an SSH key?
[Wikipedia definition](https://en.wikipedia.org/wiki/Secure_Shell): SSH uses public-key cryptography to authenticate the remote computer and allow it to authenticate the user. Two keys are created: the private key is stored on the user's computer and is used to decrypt the communication, while the public key is stored on the remote computer the user wants to connect to and is used to encrypt the communication.
#### Security - Cryptography
Explain Symmetrical encryption
Symmetric encryption is any technique where the same key is used to both encrypt and decrypt the data/entire communication.
Explain Asymmetrical encryption
Asymmetric encryption is any technique where two different keys are used for encryption and decryption; these keys are known as the public key and the private key.
What is "Key Exchange" (or "key establishment") in cryptography?
[Wikipedia](https://en.wikipedia.org/wiki/Key_exchange): "Key exchange (also key establishment) is a method in cryptography by which cryptographic keys are exchanged between two parties, allowing use of a cryptographic algorithm."
True or False? The symmetrical encryption is making use of public and private keys where the private key is used to decrypt the data encrypted with a public key
False. This description fits the asymmetrical encryption.
True or False? The private key can be mathematically computed from a public key
False.
True or False? In the case of SSH, asymmetrical encryption is not used for the entire SSH session
True. It is only used during the key exchange, to establish the symmetric encryption key used for the rest of the session.
What is Hashing?
Hashing is a mathematical function for mapping data of arbitrary sizes to fixed-size values. This function produces a "digest" of the data that can be used for verifying that the data has not been modified (amongst other uses)
How is hashing different from encryption?
Encrypted data can be decrypted to its original value. Hashed data cannot be reversed to view the original data - hashing is a one-way function.
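The difference is easy to demonstrate with Python's standard library: a hash is a fixed-size, one-way digest of its input, and any change to the input produces a completely different digest:

```python
import hashlib

data = b"The quick brown fox"
digest = hashlib.sha256(data).hexdigest()
print(digest)   # fixed-size (64 hex chars) regardless of input size

# A one-character change yields an entirely different digest, and
# there is no function to recover `data` from `digest`.
tampered = hashlib.sha256(b"The quick brown fax").hexdigest()
print(digest == tampered)  # → False
```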
How hashes are part of SSH?
Hashes are used in SSH to verify the authenticity of messages and to verify that nothing has tampered with the data received.
#### Security - Attacks, Threats, and Vulnerabilities
Explain the following: * Vulnerability * Exploits * Risk * Threat
Are you familiar with "OWASP top 10"?
Read about it [here](https://owasp.org/www-project-top-ten)
What is XSS?
Cross-Site Scripting (XSS) is a type of attack in which the attacker inserts browser-executable code into an HTTP response. In reflected XSS the injected payload is not stored in the web application; it only affects users who open a maliciously crafted link or third-party web page. A successful attack allows the attacker to access any cookies, session tokens, or other sensitive information retained by the browser and used with that site. You can test for it by identifying user-controlled inputs, including hidden or non-obvious ones such as HTTP parameters, POST data, hidden form field values, and predefined radio or selection values. You then analyze each input vector for potential vulnerabilities, craft input data for each vector, and test whether the crafted input executes.
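A common mitigation is output encoding: escaping untrusted input before interpolating it into HTML. A minimal sketch using Python's standard library (the payload URL is made up for illustration):

```python
import html

# Untrusted user input that tries to inject a script tag.
user_input = '<script>location="https://evil.example/?c="+document.cookie</script>'

# Escaping before interpolation renders the payload as inert text:
# the browser displays it instead of executing it.
safe = html.escape(user_input)
page = f"<p>Hello, {safe}</p>"
print(page)
```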
What is an SQL injection? How to manage it?
SQL injection is an attack that consists of inserting a partial or full SQL query through data input from the browser into the web application. A successful SQL injection allows the attacker to read sensitive information stored in the web application's database. One mitigation is to use stored procedures and parameterized queries; the application must also sanitize user input to eliminate the risk of code injection. Otherwise a user could enter malicious SQL that will then be executed within the query.
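The standard defense is parameterized queries. A minimal sketch with Python's built-in sqlite3 module, contrasting unsafe string concatenation with a bound parameter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'hunter2')")

malicious = "x' OR '1'='1"

# Unsafe: string concatenation lets the input rewrite the query.
unsafe = conn.execute(
    "SELECT secret FROM users WHERE name = '" + malicious + "'").fetchall()
print(unsafe)   # → [('hunter2',)] - the injection dumped every row

# Safe: a parameterized query treats the input purely as data.
safe = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (malicious,)).fetchall()
print(safe)     # → [] - no user is literally named "x' OR '1'='1"
```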
What is a Certification Authority?
How do you identify and manage vulnerabilities?
Explain "Privilege Restriction"
How HTTPS is different from HTTP?
The 'S' in HTTPS stands for 'secure'. HTTPS uses TLS to provide encryption of HTTP requests and responses, as well as providing verification by digitally signing requests and responses. As a result, HTTPS is far more secure than HTTP and is used by default for most modern websites.
What types of firewalls are there?
What is DDoS attack? How do you deal with it?
What is port scanning? When is it used?
What is the difference between asynchronous and synchronous encryption?
Explain Man-in-the-middle attack
Explain CVE and CVSS
[Red Hat](https://www.redhat.com/en/topics/security/what-is-cve#how-does-it-work): "When someone refers to a CVE (Common Vulnerabilities and Exposures), they mean a security flaw that's been assigned a CVE ID number. They don’t include technical data, or information about risks, impacts, and fixes." A CVE is identified by an ID with the format: CVE prefix + year + arbitrary digits. Anyone can submit a vulnerability; [Exploit Database](https://www.exploit-db.com/submit) explains how submission works. CVSS stands for Common Vulnerability Scoring System; it assigns severity scores to vulnerabilities, allowing responders to rank and prioritize responses and resources according to threat.
What is ARP Poisoning?
Describe how do you secure public repositories
What is DNS Spoofing? How to prevent it?
DNS spoofing occurs when a particular DNS server's records are "spoofed" or altered maliciously to redirect traffic to the attacker. This redirection of traffic allows the attacker to spread malware, steal data, etc.

**Prevention**

- Use encrypted data transfer protocols - using end-to-end encryption via SSL/TLS helps decrease the chance that a website or its visitors are compromised by DNS spoofing.
- Use DNSSEC - DNSSEC, or Domain Name System Security Extensions, uses digitally signed DNS records to help determine data authenticity.
- Implement DNS spoofing detection mechanisms - products such as XArp help protect against ARP cache poisoning by inspecting data before it is transmitted.
What can you tell me about Stuxnet?
Stuxnet is a computer worm that was originally aimed at Iran’s nuclear facilities and has since mutated and spread to other industrial and energy-producing facilities. The original Stuxnet malware attack targeted the programmable logic controllers (PLCs) used to automate machine processes. It generated a flurry of media attention after it was discovered in 2010 because it was the first known virus to be capable of crippling hardware and because it appeared to have been created by the U.S. National Security Agency, the CIA, and Israeli intelligence.
What can you tell me about the BootHole vulnerability?
What can you tell me about Spectre?
Spectre is an attack method which allows a hacker to “read over the shoulder” of a program it does not have access to. Using code, the hacker forces the program to pull up its encryption key allowing full access to the program
Explain "Format String Vulnerability"
Explain DMZ
Explain TLS
What is CSRF? How to handle CSRF?
Cross-Site Request Forgery (CSRF) is an attack that makes the end user initiate an unwanted action on a web application in which they have an authenticated session. The attacker may, for example, use an email to trick the end user into clicking a link that then executes malicious actions. A successful CSRF attack compromises the end user's data. Common mitigations include anti-CSRF tokens and SameSite cookies. You can use OWASP ZAP to analyze a request and check whether protection against cross-site request forgery is in place (for instance, when the security level is set to 0, the value of csrf-token is SecurityIsDisabled); data from such a request can then be used to prepare a CSRF attack with OWASP ZAP.
Explain HTTP Header Injection vulnerability
HTTP Header Injection vulnerabilities occur when user input is insecurely included within server response headers. If an attacker can inject newline characters into the header, then they can inject new HTTP headers and also, by injecting an empty line, break out of the headers into the message body and write arbitrary content into the application's response.
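A typical mitigation is to reject CR/LF characters before user input reaches a response header. A minimal illustrative sketch (`set_header` is a hypothetical helper, not a real framework API; real frameworks perform this check internally):

```python
def set_header(headers: dict, name: str, value: str) -> None:
    """Reject CR/LF before placing untrusted input in a response header."""
    if "\r" in value or "\n" in value:
        raise ValueError("illegal newline in header value")
    headers[name] = value

headers = {}
set_header(headers, "Location", "/home")  # legitimate value: accepted

try:
    # Injection attempt: the newline would start a new Set-Cookie header.
    set_header(headers, "Location", "/home\r\nSet-Cookie: session=attacker")
except ValueError as err:
    print(err)  # the injection attempt is blocked
```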
What security sources are you using to keep updated on latest news?
What TCP and UDP vulnerabilities are you familiar with?
Does using VLANs contribute to network security?
What are some examples of security architecture requirements?
What is an air-gapped network (or air-gapped environment)? What are its advantages and disadvantages?
Explain what is Buffer Overflow
A buffer overflow (or buffer overrun) occurs when the volume of data exceeds the storage capacity of the memory buffer. As a result, the program attempting to write the data to the buffer overwrites adjacent memory locations.
What is Nonce?
What is SSRF?
SSRF (Server-side request forgery) it's a vulnerability where you can make a server make arbitrary requests to anywhere you want. Read more about it at [portswigger.net](https://portswigger.net/web-security/ssrf)
Explain MAC flooding attack
MAC address flooding attack (CAM table flooding attack) is a type of network attack where an attacker connected to a switch port floods the switch interface with a very large number of Ethernet frames, each with a different fake source MAC address.
What is port flooding?
What is "Diffie-Hellman key exchange" and how does it work?
The Diffie-Hellman key exchange allows two parties to establish a shared secret over an insecure public channel. It works using public/private key pairs (asymmetric techniques). Two parties that wish to communicate securely will each generate a public/private key pair and distribute the public key to the other party (note that public keys are free to be exchanged over a public channel). From there, each party derives the same shared key using a combination of its own private key and the public key of the other party. This shared key can then be used as a symmetric encryption key for communications.
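The exchange can be sketched with toy parameters; a real deployment uses primes of 2048+ bits (e.g. an RFC 3526 group), the tiny values below are for readability only:

```python
import secrets

# Toy public parameters agreed by both parties: prime modulus and generator.
p, g = 23, 5

# Each party keeps a private exponent and publishes g^x mod p.
a = secrets.randbelow(p - 2) + 1   # Alice's private value
b = secrets.randbelow(p - 2) + 1   # Bob's private value
A = pow(g, a, p)                   # Alice's public value, sent to Bob
B = pow(g, b, p)                   # Bob's public value, sent to Alice

# Each side combines its private value with the other's public value
# and arrives at the same shared secret.
shared_alice = pow(B, a, p)
shared_bob = pow(A, b, p)
print(shared_alice == shared_bob)  # → True
```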
Explain "Forward Secrecy"
What is Cache Poisoned Denial of Service?
CPDoS or Cache Poisoned Denial of Service. It poisons the CDN cache. By manipulating certain header requests, the attacker forces the origin server to return a Bad Request error which is stored in the CDN’s cache. Thus, every request that comes after the attack will get an error page.
What is the difference if any between SSL and TLS?
What's SSL termination (or SSL offloading)?
SSL termination is the process of decrypting encrypted traffic. The advantage in SSL termination is that the server doesn't have to perform it, we can use SSL termination to reduce the load on the server, speed up some processes, and allow the server to focus on its core functionality (e.g. deliver content)
What is SNI (Server Name Indication)?
[Wikipedia](https://en.wikipedia.org/wiki/Server_Name_Indication): "an extension to the Transport Layer Security (TLS) computer networking protocol by which a client indicates which hostname it is attempting to connect to at the start of the handshaking process"
What benefits SNI introduces?
SNI allows a single server to serve multiple certificates using the same IP and port.
Practically this means that a single IP can serve multiple web services/pages, each using a different certificate.
Explain "Web Cache Deception Attack"
[This blog post](https://omergil.blogspot.com/2017/02/web-cache-deception-attack.html) explains it in detail.
#### Security - Threats
Explain "Advanced persistent threat (APT)"
What is a "Backdoor" in information security?
#### Software Supply Chain & Security
Briefly describe what a software supply chain is.
A company’s software supply chain consists of any third-party or open source component which could be used to compromise the final product. Such a component is usually an API provided by an actor, for instance Twilio, which offers mobile communication APIs to its customers. [WhiteSource](https://www.whitesourcesoftware.com/resources/blog/software-supply-chain-security-the-basics-and-four-critical-best-practices/): "Enterprise software projects increasingly depend on third-party and open source components. These components are created and maintained by individuals who are not employed by the organization developing the primary software, and who do not necessarily use the same security policies as the organization. This poses a security risk, because differences or inconsistencies between these policies can create overlooked areas of vulnerability that attackers seek to exploit."
What are some benefits of a software supply chain?
[Increment](https://increment.com/apis/apis-supply-chain-software/): Resource-saving. Using and paying for existing solutions to resource-heavy problems saves time as well as money, resulting in cheaper, more efficient development and greater opportunities to deploy software products for consumers.
Give three examples of three potential security threats related to the software supply chain and describe them.
[IEEE](https://ieeexplore.ieee.org/abstract/document/9203862):
* Sensitive data being exposed or lost - in a software supply chain, sensitive data may be passed throughout the chain. Security threats involve loss or exposure of this data, such as customer credit card details.
* Cloud technology - data sharing in the cloud might jeopardize the privacy of the data within the chain.
* Third-party vendors - third-party vendors’ code solutions might not provide sufficient cybersecurity and risk being a potential subject to data breaches.
#### Package management & Security
What is a package manager?
[Baudry et al.](https://arxiv.org/pdf/2001.07808.pdf): "A tool that allows you to easily download, add and thus reuse programming libraries in your project." E.g. npm or yarn.
What is a build tool?
[Baudry et al.](https://arxiv.org/pdf/2001.07808.pdf): "A tool that fetches the packages (dependencies) that are required to compile, test and deploy your application."
Describe bloated dependencies.
[Baudry et al.](https://arxiv.org/pdf/2001.07808.pdf): An application usually has different dependencies. Typically, not all of them are required for building and running the application. Bloated dependencies is the concept of including the unnecessary dependencies for building and running your application.
Explain a few cons of bloated dependencies.
[Baudry et al.](https://arxiv.org/pdf/2001.07808.pdf):
* Challenging to manage.
* Decreases performance of the application.
* Risk of malicious code that a threat actor can take advantage of.
What solutions are there for managing project dependencies?
[Npm.js documentation](https://docs.npmjs.com/cli/v8/commands/npm-prune): Use clean-up commands that are usually provided by the package manager authors. For instance, npm prune will remove any extraneous package. Another command is npm audit which will scan your repository and report any vulnerable dependencies found.
What is a threat actor and how can this actor take advantage of open source or third party vendors' packages/libraries?
[Wikipedia](https://en.wikipedia.org/wiki/Threat_actor): A threat actor is one or more people who target technical artifacts such as software, networks and/or devices with the purpose of harming them. [Aquasec](https://www.aquasec.com/cloud-native-academy/devsecops/supply-chain-security/): An attacking actor may identify, target and inject malicious software into a vulnerable part of an open source package or a third-party vendor’s code. The consumer of this code may consequently and unknowingly deploy the malicious code throughout their pipelines, thus infecting their own projects. An example of this happening is the hack of [SolarWinds](https://www.npr.org/2021/04/16/985439655/a-worst-nightmare-cyberattack-the-untold-story-of-the-solarwinds-hack).
How can you make sure that you use trustworthy packages for your project?
You can’t. You will always be exposed to security risk once you start using open source or vendor packages. The goal is to minimize the risk in order to avoid security breaches. This can be done by:
* Regularly updating the project's dependencies to apply the latest bug fixes and vulnerability clean-ups.
  * However, unless you trust the author, do not update your dependencies instantly, since package updates have recently been a common target for attackers.
* Checking for changes of the file content in previous versions.
Explain checksum.
[Fred Cohen (permission needed)](https://reader.elsevier.com/reader/sd/pii/0167404887900319?token=D5339ABC064AD9A2B50B74D8CE890B0E22A302A0BC461A50078D407BEA01052737DC6AAEF95A854E72A73B6D0C67E260&originRegion=eu-west-1&originCreation=20220502180611): Checksum is a way to verify the integrity of information in systems with no built-in protection. In other words, it provides a way of validating that the content of a file or a package / library is intact. This is useful since attacks or errors may occur during transmission of files. However, it requires that the package author has run a checksum function for the file / package which creates a specific hash for that version of the file. A minor change of the file content will result in a different checksum. If you have access to the original checksum of the file, you may run checksum on your own. In case the resulting checksum matches the original one, no changes have been made in the file. You can now conclude that no error or malicious injection was done during transmission of the file.
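Verifying a file against a published checksum can be sketched with Python's standard library (the package bytes and "published" value here are stand-ins for a real download and the hash its author would publish):

```python
import hashlib

def sha256_checksum(data: bytes) -> str:
    """Compute a SHA-256 checksum of a file's contents."""
    return hashlib.sha256(data).hexdigest()

package = b"pretend this is a downloaded package"
published = sha256_checksum(package)   # value the author would publish

# After downloading, recompute the checksum and compare.
print(sha256_checksum(package) == published)         # → True  (intact)
print(sha256_checksum(package + b"!") == published)  # → False (tampered)
```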
## Microsegmentation
What is Microsegmentation?
- Security method
- Managing network access between endpoints (processes, devices, instances)
- A method in which security policies are applied to limit traffic, based on concepts such as "Zero Trust" and "Least Privilege"
- The result of Microsegmentation should be:
  - Reduced attack surface
  - Better breach containment
Why do we need Microsegmentation solutions? Why using something such as firewalls isn't enough?
- Firewalls focus on north-south traffic: traffic that crosses the company perimeter
- Traffic that is considered east-west (internal workflows and communication) is usually left untreated
How Microsegmentation is applied?
There are different ways to apply Microsegmentation:

- Cloud Native: Using cloud embedded capabilities such as security groups, firewalls, etc.
- Agent: Agents running on the different endpoints (instances, services, etc.)
- Network: Modify network devices and their configuration to create microsegmentation
What are ephemeral environments in the context of Microsegmentation?
- These are short-lived resources like containers or serverless functions that start and stop quickly.
- Because they don’t last long, they need security rules that can change just as fast.
- Microsegmentation helps by giving each one exactly the network access it needs, nothing more.
How does Microsegmentation help prevent lateral movement?
- It sets tight rules for how services or systems can talk to each other.
- If one system gets hacked, the attacker can’t easily move to others.
- By dividing systems into smaller zones, it makes the whole network harder to break into.
What challenges arise when scaling Microsegmentation?
- As more systems get added, managing all the rules becomes harder.
- It’s tough to keep security rules consistent when everything’s changing all the time.
- You also have to be careful not to slow things down while keeping everything secure.
================================================
FILE: topics/shell/README.md
================================================
## Shell Scripting

### Shell Scripting Exercises

|Name|Topic|Objective & Instructions|Solution|Comments|
|--------|--------|------|----|----|
|Hello World|Variables|[Exercise](hello_world.md)|[Solution](solutions/hello_world.md)|Basic|
|Basic date|Variables|[Exercise](basic_date.md)|[Solution](solutions/basic_date.md)|Basic|
|Great Day|Variables|[Exercise](great_day.md)|[Solution](solutions/great_day.md)|Basic|
|Factors|Arithmetic|[Exercise](factors.md)|[Solution](solutions/factors.md)|Basic|
|Argument Check|Conditionals|[Exercise](argument_check.md)|[Solution](solutions/argument_check.md)|Basic|
|Files Size|For Loops|[Exercise](files_size.md)|[Solution](solutions/files_size.md)|Basic|
|Count Chars|Input + While Loops|[Exercise](count_chars.md)|[Solution](solutions/count_chars.md)|Basic|
|Sum|Functions|[Exercise](sum.md)|[Solution](solutions/sum.md)|Basic|
|Number of Arguments|Case Statement|[Exercise](num_of_args.md)|[Solution](solutions/num_of_args.md)|Basic|
|Empty Files|Misc|[Exercise](empty_files.md)|[Solution](solutions/empty_files.md)|Basic|
|Directories Comparison|Misc|[Exercise](directories_comparison.md)|[Solution](solutions/directories_comparison.md)|Basic|
|It's alive!|Misc|[Exercise](host_status.md)|[Solution](solutions/host_status.md)|Intermediate|

## Shell Scripting - Self Assessment
What does this line in shell scripts mean?: #!/bin/bash
`#!/bin/bash` is a shebang line; it tells the system which interpreter should run the script. /bin/bash is the most common shell and the default login shell on most Linux systems. The shell’s name is an acronym for Bourne-Again SHell. Bash can execute the vast majority of scripts and is widely used because it is feature-rich, well developed, and has better syntax.
True or False? When a certain command/line fails in a shell script, the shell script, by default, will exit and stop running
Depends on the language and settings used. For a Bash script, the statement is false by default: when a command fails, Bash keeps running and executes all the commands that come after the one which failed. Most of the time we actually want the opposite to happen. To make Bash exit when a specific command fails, use `set -e` in your script.
What do you tend to include in every script you write?
A few examples:
* Comments on how to run it and/or what it does
* If a shell script, adding `set -e` since I want the script to exit if a certain command fails

You can have an entirely different answer. It's based only on your experience and preferences.
Today we have tools and technologies like Ansible, Puppet, Chef, ... Why would someone still use shell scripting?
* Speed
* Flexibility
* The module we need doesn't exist (perhaps a weak point, because most CM technologies provide what is known as a "shell" module)
* We are delivering the scripts to customers who don't have access to the public network and don't necessarily have Ansible installed on their systems.
#### Shell Scripting - Variables
How to define a variable with the value "Hello World"?
`HW="Hello World"`
How to define a variable with the value of the current date?
`DATE=$(date)`
How to print the first argument passed to a script?
`echo $1`
Write a script to print "yay" unless an argument was passed and then print that argument
``` echo "${1:-yay}" ```
What would be the output of the following script?

```
#!/usr/bin/env bash
NINJA_TURTLE=Donatello

function the_best_ninja_turtle {
    local NINJA_TURTLE=Michelangelo
    echo $NINJA_TURTLE
}

NINJA_TURTLE=Raphael
the_best_ninja_turtle
```
Michelangelo
Explain what would be the result of each command: * echo $0 * echo $? * echo $$ * echo $#
What is $@?
What is difference between $@ and $*?
`$@` expands to all the arguments passed to the script; when quoted (`"$@"`), each argument is preserved as a separate word. `$*` also expands to all the arguments, but when quoted (`"$*"`) they are joined into a single string.
How do you get input from the user in shell scripts?
Using the `read` builtin; for example, `read x` will wait for user input and store it in the variable x.
How to compare variables length?
``` if [ ${#1} -ne ${#2} ]; then ... ```
#### Shell Scripting - Conditionals
Explain conditionals and demonstrate how to use them
In shell scripting, how to negate a conditional?
In shell scripting, how to check if a given argument is a number?
```
regex='^[0-9]+$'
if [[ $1 =~ $regex ]]; then
...
```
#### Shell Scripting - Arithmetic Operations
How to perform arithmetic operations on numbers?
One way: `$(( 1 + 2 ))` Another way: `expr 1 + 2`
How to check if a given number has 4 as a factor?
`if [ $(($1 % 4)) -eq 0 ]; then`
#### Shell Scripting - Loops
What is a loop? What types of loops are you familiar with?
Demonstrate how to use loops
#### Shell Scripting - Troubleshooting
How do you debug shell scripts?
Answer depends on the language you are using for writing your scripts. If Bash is used, for example, then:
* Adding `-x` when running the script (`bash -x script.sh`) or `set -x` inside it
* The good old way of adding echo statements

If Python, then using pdb is very useful.
Running the following bash script, we don't get 2 as a result, why?

```
x = 2
echo $x
```
Should be `x=2` (no spaces around `=`; with spaces, Bash interprets `x` as a command).
#### Shell Scripting - Substring
How to extract everything after the last dot in a string?
`${var##*.}`
How to extract everything before the last dot in a string?
`${var%.*}`
#### Shell Scripting - Misc
Generate 8 digit random number
`shuf -i 10000000-99999999 -n 1` (the range must start at 10000000 so the result always has 8 digits)
Can you give an example to some Bash best practices?
What is the ternary operator? How do you use it in bash?
A short way of writing if/else. An example: `[[ $a = 1 ]] && b="yes, equal" || b="nope"`. Note it is not a true ternary operator: if the command after `&&` fails, the `||` branch runs as well.
What does the following code do and when would you use it?

`diff <(ls /tmp) <(ls /var/tmp)`
It is called 'process substitution'. It provides a way to pass the output of a command to another command when using a pipe | is not possible. It can be used when a command does not support STDIN or you need the output of multiple commands. https://superuser.com/a/1060002/167769
What are you using for testing shell scripts?
`bats` (Bash Automated Testing System)
================================================
FILE: topics/shell/argument_check.md
================================================

## Argument Check

### Objectives

Note: assume the script is executed with an argument

1. Write a script that will check if a given argument is the string "pizza"
2. If it's the string "pizza" print "with pineapple?"
3. If it's not the string "pizza" print "I want pizza!"

================================================
FILE: topics/shell/basic_date.md
================================================

## Basic Date

### Objectives

1. Write a script that will put the current date in a file called "the_date.txt"

================================================
FILE: topics/shell/count_chars.md
================================================

## Count Chars

### Objectives

1. Read input from the user until you get an empty string
2. For each of the lines you read, count the number of characters and print it

### Constraints

1. You must use a while loop
2. Assume at least three lines of input

================================================
FILE: topics/shell/directories_comparison.md
================================================

## Directories Comparison

### Objectives

1. You are given two directories as arguments and the output should be any difference between the two directories

================================================
FILE: topics/shell/empty_files.md
================================================

## Empty Files

### Objectives

1. Write a script to remove all the empty files in a given directory (including nested directories)

================================================
FILE: topics/shell/factors.md
================================================

## Shell Scripting - Factors

### Objectives

Write a script that when given a number, will:

* Check if the number has 2 as a factor, if yes it will print "one factor"
* Check if the number has 3 as a factor, if yes it will print "one factor...actually two!"
* If none of them (2 and 3) is a factor, print the number itself

================================================
FILE: topics/shell/files_size.md
================================================

## Files Size

### Objectives

1. Print the name and size of every file and directory in the current path

Note: use at least one for loop!

================================================
FILE: topics/shell/great_day.md
================================================

## Great Day

### Objectives

1. Write a script that will print "Today is a great day!" unless it's given a day name and then it should print "Today is "

Note: no need to check whether the given argument is actually a valid day

================================================
FILE: topics/shell/hello_world.md
================================================

## Shell Scripting - Hello World

### Objectives

1. Define a variable with the string 'Hello World'
2. Print the value of the variable you've defined and redirect the output to the file "amazing_output.txt"

================================================
FILE: topics/shell/host_status.md
================================================

## It's Alive!

### Objectives

1. Write a script to determine whether a given host is down or up

================================================
FILE: topics/shell/num_of_args.md
================================================

## Number of Arguments

### Objectives

* Write a script that will print "Got it: " in case of one argument
* In case no arguments were provided, it will print "Usage: ./ "
* In case of more than one argument, print "hey hey...too many!"

================================================
FILE: topics/shell/print_arguments.md
================================================

## Shell Scripting - Print Arguments

### Objectives

You should include everything mentioned here in one shell script

1. Print the first argument passed to the script
2. Print the number of arguments passed to the script
3.

================================================
FILE: topics/shell/solutions/argument_check.md
================================================

## Argument Check

### Objectives

Note: assume the script is executed with an argument

1. Write a script that will check if a given argument is the string "pizza"
2. If it's the string "pizza" print "with pineapple?"
3. If it's not the string "pizza" print "I want pizza!"

### Solution

```
#!/usr/bin/env bash
[[ ${1} == "pizza" ]] && echo "with pineapple?" || echo "I want pizza!"
```

================================================
FILE: topics/shell/solutions/basic_date.md
================================================

## Basic Date

### Objectives

1. Write a script that will put the current date in a file called "the_date.txt"

### Solution

```
#!/usr/bin/env bash
date > the_date.txt
```

================================================
FILE: topics/shell/solutions/count_chars.md
================================================

## Count Chars

### Objectives

1. Read input from the user until you get an empty string
2. For each of the lines you read, count the number of characters and print it

### Constraints

1. You must use a while loop
2. Assume at least three lines of input

### Solution

```
#!/usr/bin/env bash
echo -n "Please insert your input: "
while read line && [ -n "$line" ]; do
    echo -n "$line" | wc -c
    echo -n "Please insert your input: "
done
```

================================================
FILE: topics/shell/solutions/directories_comparison.md
================================================

## Directories Comparison

### Objectives

1. You are given two directories as arguments and the output should be any difference between the two directories

### Solution 1

Suppose the name of the bash script is `dirdiff.sh`

```
#!/bin/bash
if test $# -ne 2
then
    echo -e "USAGE: ./dirdiff.sh directory1 directory2"
    exit 1
fi

# Compare checksums of the sorted file listings.
# If both checksums are the same, the directories contain the same file names
if test "$(ls -1 "$1" | sort | md5sum | awk '{print $1}')" == "$(ls -1 "$2" | sort | md5sum | awk '{print $1}')"
then
    echo -e "No difference between the 2 directories"
    exit 0
fi
diff -q "$1" "$2"
```

### Solution 2

With GNU diffutils, you can use diff to compare directories recursively:

```shell
diff --recursive directory1 directory2
```

================================================
FILE: topics/shell/solutions/empty_files.md
================================================

## Empty Files

### Objectives

1. Write a script to remove all the empty files in a given directory (including nested directories)

### Solution

```
#!/usr/bin/env bash
# find handles nested directories, which a plain for loop over * would miss
find "${1:-.}" -type f -empty -delete
```

================================================
FILE: topics/shell/solutions/factors.md
================================================

## Shell Scripting - Factors

### Objectives

Write a script that when given a number, will:

* Check if the number has 2 as a factor, if yes it will print "one factor"
* Check if the number has 3 as a factor, if yes it will print "one factor...actually two!"
* If none of them (2 and 3) is a factor, print the number itself

### Solution

```
#!/usr/bin/env bash
(( $1 % 2 )) || res="one factor"
(( $1 % 3 )) || res+="...actually two!"
echo ${res:-$1}
```

================================================
FILE: topics/shell/solutions/files_size.md
================================================

## Files Size

### Objectives

1. Print the name and size of every file and directory in the current path

Note: use at least one for loop!

### Solution

```
#!/usr/bin/env bash
for i in *; do
    echo "$i: $(du -sh "$i" | cut -f1)"
done
```

================================================
FILE: topics/shell/solutions/great_day.md
================================================

## Great Day

### Objectives

1. Write a script that will print "Today is a great day!" unless it's given a day name and then it should print "Today is "

Note: no need to check whether the given argument is actually a valid day

### Solution

```
#!/usr/bin/env bash
echo "Today is ${1:-a great day!}"
```

================================================
FILE: topics/shell/solutions/hello_world.md
================================================

## Shell Scripting - Hello World

### Objectives

1. Define a variable with the string 'Hello World'
2. Print the value of the variable you've defined and redirect the output to the file "amazing_output.txt"

### Solution

```
#!/usr/bin/env bash
HW_STR="Hello World"
echo "$HW_STR" > amazing_output.txt
```

================================================
FILE: topics/shell/solutions/host_status.md
================================================

## It's Alive!

### Objectives

1. Write a script to determine whether a given host is down or up

### Solution

```
#!/usr/bin/env bash
SERVERIP=
NOTIFYEMAIL=test@example.com

if ! ping -c 3 "$SERVERIP" > /dev/null 2>&1
then
    # Use mailer here:
    mailx -s "Server $SERVERIP is down" -t "$NOTIFYEMAIL" < /dev/null
fi
```

================================================
FILE: topics/shell/solutions/num_of_args.md
================================================

## Number of Arguments

### Objectives

* Write a script that will print "Got it: " in case of one argument
* In case no arguments were provided, it will print "Usage: ./ "
* In case of more than one argument, print "hey hey...too many!"

### Solution

```
#!/usr/bin/env bash
set -eu

main() {
    case $# in
        0) printf "%s" "Usage: ./ "; return 1 ;;
        1) printf "%s" "Got it: $1"; return 0 ;;
        *) printf "%s" "hey hey...too many!"; return 1 ;;
    esac
}

main "$@"
```

================================================
FILE: topics/shell/solutions/sum.md
================================================

## Sum

### Objectives

1. Write a script that gets two numbers and prints their sum
2. Make sure the input is valid (= you got two numbers from the user)
3. Test the script by running and passing it two numbers as arguments

### Constraints

1. Use functions

### Solution

```
#!/usr/bin/env bash
re='^[0-9]+$'

if ! [[ $1 =~ $re && $2 =~ $re ]]; then
    echo "Oh no...I need two numbers"
    exit 2
fi

function sum {
    echo $(( $1 + $2 ))
}

sum "$1" "$2"
```

================================================
FILE: topics/shell/sum.md
================================================

## Sum

### Objectives

1. Write a script that gets two numbers and prints their sum
2. Make sure the input is valid (= you got two numbers from the user)
3. Test the script by running and passing it two numbers as arguments

### Constraints

1. Use functions

================================================
FILE: topics/soft_skills/README.md
================================================

## HR & Soft Skills

These are not DevOps related questions as you probably noticed, but since they are part of the DevOps interview process I've decided it might be good to keep them
There are no answers for these questions for obvious reasons :)
Tell us a little bit about yourself
Tell me about the best type of environment you've worked in (team, solo, pairs, ...)
Tell me about your last big project/task you worked on
What was the most challenging part of the project you worked on?
How did you hear about us?
How would you describe good leadership?
Describe yourself in one word
Tell me about a time where you didn't agree on an implementation
How do you deal with a situation where key stakeholders are not around and a big decision needs to be made?
Where do you see yourself 5 years down the line?
Give an example of a time when you were able to change the view of a team about a particular tool/project/technology
Have you ever caused a service outage? (or broke a working project, tool, ...?)
Rank the following in order 1 to 5, where 1 is most important: salary, benefits, career, team/people, work life balance
You have three important tasks scheduled for today. One is for your boss, second for a colleague who is also a friend, third is for a customer. All tasks are equally important. What do you do first?
You have a colleague you don't get along with. Tell us some strategies for building a good work relationship with them anyway.
What do you love about your work?
What are your responsibilities in your current position?
Why should we hire you for the role?
#### Pointless Questions
Why do you want to work here?
Why are you looking to leave your current place?
What are your strengths and weaknesses?
Where do you see yourself in five years?
When you are faced with a problem, what do you do?
When was the last time you had to learn a new technology and what was your approach in doing so?
#### Team Lead
How would you improve productivity in your team?
================================================
FILE: topics/software_development/README.md
================================================

## Software Development

### Agile Software Development
What is Agile in regards to software development?
[Atlassian](https://www.atlassian.com/agile/kanban/kanban-vs-scrum): Agile "is a structured and iterative approach to project management and product development. It recognizes the volatility of product development, and provides a methodology for self-organizing teams to respond to change without going off the rails."
What is Kanban in regards to software development?
* Kanban is an agile software development framework
* It focuses on a flexible and fluid process: no deadlines, fewer meetings, fewer formal roles
* While arguable, Kanban seems to fit small teams better, whereas big teams might benefit more from a structured process
What is Scrum in regards to software development?
* Scrum is an agile software development framework * Fixed length iterations * Requires the team to have roles like scrum master and product owner
Can you compare between Kanban and Scrum?
* Kanban is a continuous, fluid and visual process, whereas Scrum is short and structured, with work shipped during fixed intervals known as sprints
* Kanban is less structured compared to other frameworks like Scrum
* Kanban is a more visual way of managing the development process
* Kanban has fewer meetings and formal roles compared to other frameworks like Scrum
### Programming
What programming language do you prefer to use for DevOps related tasks? Why specifically this one?
For example, Python. It's multipurpose, easy-to-learn, continuously-evolving, and open-source. And it's very popular today
What are statically typed languages?
In statically typed languages the variable type is known at compile time instead of at run time. Examples of such languages: C, C++ and Java
Explain expressions and statements
An expression is anything that evaluates to a value (even if that value is None): literals such as strings, integers and lists are all expressions. Statements are instructions executed by the interpreter, like variable assignments, for loops and conditionals (if-else).
What is Object Oriented Programming? Why is it important?
[educative.io](https://www.educative.io/blog/object-oriented-programming): "Object-Oriented Programming (OOP) is a programming paradigm in computer science that relies on the concept of classes and objects. It is used to structure a software program into simple, reusable pieces of code blueprints (usually called classes), which are used to create individual instances of objects." OOP is the mainstream paradigm today; most large services are written using OOP
Explain Composition
Composition - ability to build a complex object from other objects
What are a compiler and an interpreter?
[bzfar.org](https://www.bzfar.org/publ/algorithms_programming/programming_languages/translators_compiler_vs_interpetator/42-1-0-50) Compiler: "A compiler is a translator used to convert high-level programming language to low-level programming language. It converts the whole program in one session and reports errors detected after the conversion. Compiler takes time to do its work as it translates high-level code to lower-level code all at once and then saves it to memory." Interpreter: "Just like a compiler, is a translator used to convert high-level programming language to low-level programming language. It converts the program line by line and reports errors detected at once, while doing the conversion. With this, it is easier to detect errors than in a compiler."
Are you familiar with SOLID design principles?
SOLID design principles aim to:

* Make it easier to extend the functionality of the system
* Make the code more readable and easier to maintain

SOLID stands for:

* Single Responsibility - a class* should have one ~responsibility~ reason to change. Robert Martin rephrased the principle this way because "one responsibility" was widely misunderstood
* Open-Closed - a class should be open for extension, but closed for modification. In practice this means you should extend functionality by adding new code rather than modifying existing code. Your system should be separated into components so it can be easily extended without breaking everything
* Liskov Substitution - any derived class should be able to substitute its parent without altering its correctness. Practically, every part of the code will get the expected result no matter which class is being used
* Interface Segregation - a client should never depend on anything it doesn't use. Big interfaces should be split into smaller interfaces if needed
* Dependency Inversion - high-level modules should depend on abstractions, not on low-level modules

*it can also be a module, component, entity, etc., depending on the project structure and programming language
What is YAGNI? What is your opinion on it?
YAGNI - "You aren't gonna need it". Only add functionality when it is actually needed; don't add functionality that isn't directly required
What is DRY? What is your opinion on it?
DRY - "Don't repeat yourself". It means you shouldn't duplicate logic; use functions/classes instead. But this must be done thoughtfully, paying attention to the domain logic: identical lines of code don't always mean duplication
What are the four pillars of object oriented programming?
* Abstraction - you don't need to know how a class is implemented, only what functionality it provides (its interface) and how to use it
* Encapsulation - keep fields meant for the class's internal use private (or protected) and provide public methods where needed. We keep the data and code safe within the class itself
* Inheritance - the ability to create a class that shares some of the attributes of existing classes
* Polymorphism - the same method can do different things in different contexts. Method overloading and overriding are some forms of polymorphism
Explain recursion
Recursion is a process (or strategy) where a function calls itself. It has a recursive case and an exit (base) case: in the recursive case the function calls itself again; in the exit case it returns without another call. Without an exit case, the function would run indefinitely, until memory is exhausted or the call stack limit is reached
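A tiny sketch in Bash, with the exit case and recursive case marked:

```shell
#!/usr/bin/env bash
factorial() {
  if (( $1 <= 1 )); then
    echo 1                                        # exit case: stop recursing
  else
    echo $(( $1 * $(factorial $(( $1 - 1 )) ) ))  # recursive case
  fi
}

factorial 5   # 120
```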
Explain Inversion of Control (IoC)
Inversion of Control is a design principle used to achieve loose coupling: you access functionality through an abstraction layer (similar to SOLID's Dependency Inversion)
Explain Dependency Injection (DI)
Dependency Injection is a design pattern used to implement IoC: an object's dependencies (its fields) are configured by external objects rather than created by the object itself
True or False? In Dynamically typed languages the variable type is known at run-time instead of at compile-time
True
Explain what are design patterns
[refactoring.guru](https://refactoring.guru/): "Design patterns are typical solutions to commonly occurring problems in software design. They are like pre-made blueprints that you can customize to solve a recurring design problem in your code."
Explain big O notation
[habr.com](https://habr.com/ru/post/559518/) "We can use Big O notation to compare and search different solutions to find which solution is best. The best solution is one that consumes less amount of time and space. Generally, time and space are two parameters that determine the efficiency of the algorithm. Big O Notation tells accurately how long an algorithm takes to run. It is a basic analysis of algorithm efficiency. It describes the execution time required. It depends on the size of input data that essentially passes in. Big O notation gives us algorithm complexity in terms of input size. For the large size of input data, the execution time will be slow as compared to the small size of input data. Big O notation is used to analyze space and time."
What is "Duck Typing"?
"When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck." It is an approach in programming where code relies on an object's properties and behavior rather than on its type
Explain string interpolation
String interpolation - process of evaluating of string literal. For example (JS): ```js const messages = 5; console.log(`You have ${messages} new messages`); // You have 5 new messages ```
##### Common algorithms
Binary search:

* How does it work?
* Can you implement it? (in any language you prefer)
* What is the average performance of the algorithm you wrote?
It's a search algorithm used with sorted arrays/lists to find a target value by halving the search range each iteration and comparing the middle value to the target. If the middle value is smaller than the target, the search continues in the right half of the range, otherwise in the left half. This continues until the value is found (or the range is empty). [python implementation](coding/python/binary_search.py) The average performance of the above algorithm is O(log n). Best-case performance is O(1) and worst-case is O(log n).
##### Code Review
What are your code-review best practices?
Do you agree/disagree with each of the following statements and why?: * The commit message is not important. When reviewing a change/patch one should focus on the actual change * You shouldn't test your code before submitting it. This is what CI/CD exists for.
#### Strings
In any language you want, write a function to determine if a given string is a palindrome
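One possible sketch in Bash (assuming `rev` from util-linux is available; the function name is made up):

```shell
#!/usr/bin/env bash
# A string is a palindrome if it equals its own reverse
is_palindrome() {
  [ "$1" = "$(rev <<< "$1")" ]
}

is_palindrome "racecar" && echo "palindrome"
is_palindrome "banana" || echo "not a palindrome"
```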
In any language you want, write a function to determine if two strings are Anagrams
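One possible sketch in Bash: two strings are anagrams if their sorted characters are equal (function names are made up):

```shell
#!/usr/bin/env bash
# Split a string into one character per line, sort, and rejoin
sorted_chars() {
  fold -w1 <<< "$1" | sort | tr -d '\n'
}

are_anagrams() {
  [ "$(sorted_chars "$1")" = "$(sorted_chars "$2")" ]
}

are_anagrams "listen" "silent" && echo "anagrams"
```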
#### Integers
In any language you would like, print the numbers from 1 to a given integer. For example for input: 5, the output is: 12345
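A possible sketch in Bash (the function name is made up):

```shell
#!/usr/bin/env bash
# Print 1..N with no separator, then a trailing newline
print_sequence() {
  local i
  for (( i = 1; i <= $1; i++ )); do
    printf '%d' "$i"
  done
  printf '\n'
}

print_sequence 5   # 12345
```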
#### Time Complexity
Describe what would be the time complexity of the operations access, search, insert and remove for the following data structures:
* Stack * Queue * Linked List * Binary Search Tree
What is the complexity for the best, worst and average cases of each of the following algorithms?: * Quick sort * Merge sort * Bucket Sort * Radix Sort
#### Data Structures & Types
Implement Stack in any language you would like
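A sketch of a stack backed by a Bash array (requires Bash 4.3+ for negative array indices; function names are made up):

```shell
#!/usr/bin/env bash
stack=()

push() {
  stack+=("$1")              # append to the end of the array
}

pop() {
  local top=${stack[-1]}     # read the last element
  unset 'stack[-1]'          # remove it
  echo "$top"
}

peek() {
  echo "${stack[-1]}"        # read without removing
}

push 1; push 2; push 3
pop    # 3
peek   # 2
```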
Tell me everything you know about Linked Lists
* A linked list is a data structure * It consists of a collection of nodes. Together these nodes represent a sequence * Useful for use cases where you need to insert or remove an element from any position of the linked list * Some programming languages don't have linked lists as a built-in data type (like Python for example) but they can be easily implemented
Describe (no need to implement) how to detect a loop in a Linked List
There are multiple ways to detect a loop in a linked list. I'll mention three here: Worst solution:
Two pointers, where one points to the head and the other traverses the list. Each time you advance the second pointer by one, you check whether the distance from the head pointer to the moved pointer is bigger than the last time you measured it (if not, you have a loop).
The reason it's probably the worst solution is that the time complexity here is O(n^2). Decent solution:
Create a hash table and start traversing the linked list. Every time you move, check whether the node you moved to is in the hash table; if it isn't, insert it. If at any point you do find the node in the hash table, you have a loop. When you reach None/Null, it's the end and you can return a "no loop" value. This is very easy to implement (just create a hash table, update it and check membership every time you move to the next node), but since the auxiliary space is O(n) because of the hash table, it's not the best solution. Good solution:
Instead of creating a hash table to document which nodes in the linked list you have visited, as in the previous solution, you can modify the Linked List (or the Node to be precise) to have a "visited" attribute. Every time you visit a node, you set "visited" to True.
Time complexity is O(n) and auxiliary space is O(1), so it's a good solution; the only problem is that you have to modify the Linked List. Best solution:
You set two pointers to traverse the linked list from the beginning. You move one pointer by one each time and the other pointer by two. If at any point they meet, you have a loop. This solution is also called "Floyd's Cycle-Finding"
Time complexity is O(n) and auxiliary space is O(1). Perfect :)
Implement Hash table in any language you would like
What is Integer Overflow? How is it handled?
Name 3 design patterns. Do you know how to implement (= provide an example of) these design patterns in any language you choose?
Given an array/list of integers, find 3 integers which are adding up to 0 (in any language you would like)
```
def find_triplets_sum_to_zero(li):
    li = sorted(li)
    for i, val in enumerate(li):
        low, up = i + 1, len(li) - 1
        while low < up:
            tmp = val + li[low] + li[up]
            if tmp > 0:
                up -= 1
            elif tmp < 0:
                low += 1
            else:
                yield li[low], val, li[up]
                low += 1
                up -= 1
```
================================================
FILE: topics/sql/improve_query.md
================================================

## Comparisons vs. Functions

1. Improve the following query

```
SELECT COUNT(purchased_at)
FROM shawarma_purchases
WHERE purchased_at BETWEEN '2017-01-01' AND '2017-12-31';
```

================================================
FILE: topics/sql/solutions/improve_query.md
================================================

## Comparisons vs. Functions - Solution

```
SELECT COUNT(*)
FROM shawarma_purchases
WHERE purchased_at >= '2017-01-01'
  AND purchased_at <= '2017-12-31';
```

================================================
FILE: topics/sre/README.md
================================================

# Site Reliability Engineering

## SRE Questions
What is an SLI (Service-Level Indicator)?

An SLI is a measurement used to assess the actual performance or reliability of a service. It serves as the basis for defining SLOs. Examples:

- Request latency
- Processing throughput
- Request failures per unit of time

Read more: [Google SRE Handbook](https://sre.google/sre-book/table-of-contents/)

What is an SLO (Service-Level Objective)?

An SLO is a target value or range of values for a service level that is measured by an SLI. Example: 99% across 30 days for a specific collection of SLIs. It's also worth noting that the SLO serves as a lower bound, indicating that there is no requirement to be more reliable than necessary, because doing so can delay the rollout of new features. Read more: [Google SRE Handbook](https://sre.google/sre-book/table-of-contents/)

What is an SLA (Service-Level Agreement)?

An SLA is a formal agreement between a service provider and customers, specifying the expected service quality and the consequences for not meeting it. SRE doesn't typically get involved in constructing SLAs, because SLAs are closely tied to business and product decisions. Read more: [Google SRE Handbook](https://sre.google/sre-book/table-of-contents/)

What is an Error Budget?

An Error Budget represents the acceptable amount of downtime or errors a service can experience while still meeting its SLO. An error budget is 1 minus the SLO of the service: a 99.9% SLO service has a 0.1% error budget. If our service receives 1,000,000 requests in four weeks, a 99.9% availability SLO gives us a budget of 1,000 errors over that period. The error budget is a mechanism for balancing innovation and stability. If the SRE team cannot enforce the error budget, the whole system breaks down. Read more: [Google SRE Handbook](https://sre.google/sre-book/table-of-contents/)

What is Toil?

Toil is the kind of work that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows. If you can automate a task, you should probably automate it: automation significantly reduces toil, and investing in automation results in valuable work with lasting impact, offering scalability with minimal adjustments as your system expands. Read more: [Google SRE Handbook](https://sre.google/sre-book/table-of-contents/)
================================================
FILE: topics/terraform/README.md
================================================

# Terraform

- [Terraform](#terraform)
  - [Exercises](#exercises)
    - [Terraform 101](#terraform-101)
    - [AWS](#aws)
  - [Questions](#questions)
    - [Terraform 101](#terraform-101-1)
    - [Terraform Hands-On Basics](#terraform-hands-on-basics)
    - [Dependencies](#dependencies)
    - [Providers](#providers)
    - [Variables](#variables)
      - [Input Variables](#input-variables)
      - [Output Variables](#output-variables)
      - [Locals](#locals)
      - [Variables Hands-On](#variables-hands-on)
    - [Data Sources](#data-sources)
    - [Lifecycle](#lifecycle)
    - [Provisioners](#provisioners)
    - [State](#state)
      - [Terraform Backend](#terraform-backend)
      - [Workspaces](#workspaces)
      - [State Hands-On](#state-hands-on)
    - [Terraform Structures and Syntax](#terraform-structures-and-syntax)
      - [Lists](#lists)
      - [Loops](#loops)
      - [Maps](#maps)
      - [Conditionals](#conditionals)
      - [Misc](#misc)
    - [Modules](#modules)
      - [Modules Hands-On](#modules-hands-on)
  - [Interview Cheatsheet](#interview-cheatsheet)
    - [Import](#import)
    - [Version Control](#version-control)
    - [AWS](#aws-1)
    - [Validations](#validations)
    - [Secrets](#secrets)
    - [Production](#production)

## Exercises

### Terraform 101

| Name | Topic | Objective & Instructions | Solution | Comments |
| -------------- | ------ | ---------------------------------------------------------- | ---------------------------------------------------------- | -------- |
| Local Provider | Basics | [Exercise](exercises/terraform_local_provider/exercise.md) | [Solution](exercises/terraform_local_provider/solution.md) | |

### AWS

The following exercises require an account in AWS and might cost you $$$

| Name | Topic | Objective & Instructions | Solution | Comments |
| ----------------------------- | ----- | ----------------------------------------------------- | ----------------------------------------------------- | -------- |
| Launch EC2 instance | EC2 | [Exercise](exercises/launch_ec2_instance/exercise.md) | [Solution](exercises/launch_ec2_instance/solution.md) | |
| Rename S3 bucket | S3 | [Exercise](exercises/s3_bucket_rename/exercise.md) | [Solution](exercises/s3_bucket_rename/solution.md) | |
| Create Custom VPC and Subnets | VPC | [Exercise](exercises/vpc_subnet_creation/exercise.md) | [Solution](exercises/vpc_subnet_creation/solution.md) | |

## Questions

### Terraform 101
What is Terraform?
[Terraform](https://www.terraform.io/intro): "HashiCorp Terraform is an infrastructure as code tool that lets you define both cloud and on-prem resources in human-readable configuration files that you can version, reuse, and share. You can then use a consistent workflow to provision and manage all of your infrastructure throughout its lifecycle. Terraform can manage low-level components like compute, storage, and networking resources, as well as high-level components like DNS entries and SaaS features."
What are the advantages of using Terraform or IaC in general?
- Full automation: In the past, resource creation, modification and removal were handled manually or by using a set of tooling. With Terraform or other IaC technologies, you manage the full lifecycle in an automated fashion.
- Modular and Reusable: Code that you write for certain purposes can be used and assembled in different ways. You can write code to create resources on a public cloud and it can be shared with other teams who can also use it in their account on the same (or a different) cloud.
- Improved testing: Concepts like CI can be easily applied to IaC-based projects and code snippets. This allows you to test and verify operations beforehand.
What is one reason why manual processes can be helpful?
For learning a platform when first starting out
Why is it advisable to avoid using manual processes when creating infrastructure at scale? Manual processes for creating infrastructure are slow because they require human intervention for each step, which delays deployment. They are error-prone since manual configuration increases the risk of mistakes and inconsistencies. Additionally, these processes are not easily repeatable, making it difficult to ensure the same infrastructure setup across different environments—unlike Infrastructure as Code (IaC), which automates and standardizes deployments.
What are some of Terraform features?
- Declarative: Terraform uses the declarative approach (rather than the procedural one) to define the end state of the resources
- No agents: as opposed to other technologies (e.g. Puppet) where you use an agent-and-server model, with Terraform you use the different APIs (of clouds, services, etc.) to perform the operations
- Community: Terraform has a strong community that constantly publishes modules and fixes when needed. This ensures good module maintenance, and users can get support quite quickly at any point
What language does Terraform use?
A DSL called "HCL" (HashiCorp Configuration Language). A declarative language for defining infrastructure.
What's a typical Terraform workflow?
1. Write Terraform definitions: `.tf` files written in HCL that describe the desired infrastructure state (and run `terraform init` at the very beginning)
2. Review: with a command such as `terraform plan` you can get a glance at what Terraform will perform with the written definitions
3. Apply definitions: with the command `terraform apply`, Terraform will apply the given definitions by adding, modifying or removing the resources

This is a manual process. Most of the time it is automated: a user submits a PR/MR to propose Terraform changes, there is a process to test these changes and, once merged, they are applied (`terraform apply`).
What are some use cases for using Terraform?
- Infra provisioning and management: you need to automate or codify your infra so you are able to test it easily, apply it and make any changes necessary
- Multi-cloud environment: you manage infrastructure on different clouds, but are looking for a consistent way to do it across the clouds
- Consistent environments: you manage environments such as test, production, staging, ... and are looking for a way to keep them consistent, so any modification in one of them applies to the other environments as well
What's the difference between Terraform and technologies such as Ansible, Puppet, Chef, etc.
Terraform is considered to be an IaC technology. It's used for provisioning resources and managing infrastructure on different platforms. Ansible, Puppet and Chef are Configuration Management technologies. They are used once there is an instance running and you would like to apply some configuration on it, like installing an application, applying a security policy, etc.

To be clear, CM tools can be used to provision resources, so for the end goal of having infrastructure, both Terraform and something like Ansible can achieve the same result. The difference is in the how. Ansible doesn't save the state of resources; it doesn't know how many instances there are in your environment, as opposed to Terraform. At the same time, while Terraform can perform configuration management tasks, it has less module support for that specific goal and it doesn't track task execution state the way Ansible does. The differences are there, and most of the time it's recommended to mix the technologies: Terraform for managing infrastructure and CM technologies for configuration on top of that infrastructure.
### Terraform Hands-On Basics
Explain the following block of Terraform code

```
resource "aws_instance" "some-instance" {
  ami           = "ami-201720221991yay"
  instance_type = "t2.micro"
}
```
It's a resource of type "aws_instance" used to provision an instance. The name of the resource (NOT INSTANCE) is "some-instance". The instance itself will be provisioned with type "t2.micro" and using an image of the AMI "ami-201720221991yay".
What do you do next after writing the following in a main.tf file?

```
resource "aws_instance" "some-instance" {
  ami           = "ami-201720221991yay"
  instance_type = "t2.micro"
}
```
Run `terraform init`. This will scan the code in the directory to figure out which providers are used (in this case AWS provider) and will download them.
You've executed `terraform init` and now you would like to move forward to creating the resources, but you have concerns and would like to be 100% sure about what you are going to execute. What should you be doing?
Execute `terraform plan`. That will provide detailed information on what Terraform will do once you apply the changes.
You've downloaded the providers, seen what Terraform will do (with `terraform plan`) and you are ready to actually apply the changes. What should you do next?
Run `terraform apply`. That will apply the changes described in your .tf files.
Explain the meaning of the following symbols seen at the beginning of each line when you run `terraform apply`:
- '+'
- '-'
- '-/+'
- '+' - the resource or attribute is going to be added
- '-' - the resource or attribute is going to be removed
- '-/+' - the resource or attribute is going to be replaced
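An illustrative, trimmed excerpt of what `terraform plan`/`terraform apply` prints (resource names and AMI IDs here are made up):

```
  # aws_instance.web must be replaced
-/+ resource "aws_instance" "web" {
      ~ ami           = "ami-0aaa111" -> "ami-0bbb222" # forces replacement
        instance_type = "t2.micro"
    }
```

The `~` marker additionally denotes an attribute that will be updated in place.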
How to clean up Terraform resources? Why should the user be careful doing so?
`terraform destroy` will clean up all the resources tracked by Terraform. A user should be careful with this command because there is no way to revert it. Sure, you can always run `terraform apply` again, but that can take time, generates completely new resources, etc.
### Dependencies
Sometimes you need to reference some resources in the same or a separate .tf file. Why, and how is it done?
Why: because resources are sometimes connected or need to be connected. For example, you create an AWS instance with the "aws_instance" resource, but at the same time you would also like to allow some traffic to it (because by default traffic is not allowed). For that you'll create an "aws_security_group" resource and then reference it in your aws_instance resource.

How: using the syntax `RESOURCE_TYPE.RESOURCE_NAME.ATTRIBUTE`. In your AWS instance it would look like this:

```
resource "aws_instance" "some-instance" {
  ami                    = "some-ami"
  instance_type          = "t2.micro"
  vpc_security_group_ids = [aws_security_group.instance.id]
}
```
Does it matter in which order Terraform creates resources?
Yes. When there is a dependency between different Terraform resources, you want the resources to be created in the right order, and this is exactly what Terraform does. To make it even more clear: if you have a resource X that references the ID of resource Y, it doesn't make sense to create resource X first, because it won't have any ID to get from a resource that wasn't created yet.
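To sketch both kinds of ordering (all names here are illustrative): Terraform infers the order from attribute references, and `depends_on` forces an order when no reference exists.

```hcl
# Implicit dependency: the security group is created first because
# the instance references its ID.
resource "aws_security_group" "app" {
  name = "app-sg"
}

resource "aws_instance" "app" {
  ami                    = "some-ami"
  instance_type          = "t2.micro"
  vpc_security_group_ids = [aws_security_group.app.id]
}

# Explicit dependency: no attribute is referenced, so depends_on
# tells Terraform to create the bucket before the instance.
resource "aws_s3_bucket" "artifacts" {
  bucket = "my-artifacts-bucket"
}

resource "aws_instance" "worker" {
  ami           = "some-ami"
  instance_type = "t2.micro"
  depends_on    = [aws_s3_bucket.artifacts]
}
```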
Is there a way to print/see the dependencies between the different resources?
Yes, with `terraform graph`. The output is in DOT, a graph description language.
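A simplified sketch of the DOT output for a configuration where an instance references a security group; exact node labels vary between Terraform versions:

```
digraph {
  "aws_instance.some-instance" -> "aws_security_group.instance"
  "aws_security_group.instance" -> "provider[\"registry.terraform.io/hashicorp/aws\"]"
}
```

You can render it with Graphviz: `terraform graph | dot -Tsvg > graph.svg`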
### Providers
Explain what is a "provider"
[terraform.io](https://www.terraform.io/docs/language/providers/index.html): "Terraform relies on plugins called "providers" to interact with cloud providers, SaaS providers, and other APIs...Each provider adds a set of resource types and/or data sources that Terraform can manage. Every resource type is implemented by a provider; without providers, Terraform can't manage any kind of infrastructure."
Where can you find publicly available providers?
In the [Terraform Registry](https://registry.terraform.io/browse/providers)
What are the names of the providers in this case?

```
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0.2"
    }
  }
}
```
azurerm and aws
How to install a provider?
You write a provider block like the following one and run `terraform init`

```
provider "aws" {
  region = "us-west-1"
}
```
True or False? Applying the following Terraform configuration will fail since no source or version is specified for the 'aws' provider

```
terraform {
  required_providers {
    aws = {}
  }
}
```
False. It will look for "aws" provider in the public Terraform registry and will take the latest version.
Write a configuration of a Terraform provider (any type you would like)
AWS is one of the most popular providers in Terraform. Here is an example of how to configure it to use one specific region and pin a specific version of the provider:

```
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}

# Configure the AWS Provider
provider "aws" {
  region = "us-west-2"
}
```
Where does Terraform install providers from by default?
By default, Terraform providers are installed from the Terraform Registry.
What is the Terraform Registry?
The Terraform Registry provides a centralized location for official and community-managed providers and modules.
Where are providers downloaded to (when, for example, you run `terraform init`)?
`.terraform` directory.
Describe at a high level what happens behind the scenes when you run `terraform init` on the following Terraform configuration

```
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}
```
1. Terraform checks if there is an aws provider at this address: `registry.terraform.io/hashicorp/aws`
2. Installs the latest version of the aws provider matching the `~> 3.0` constraint (assuming the address exists and is valid)
True or False? You can install providers only from HashiCorp
False. You can specify a provider from any source address, not only those in the hashicorp namespace.
### Variables #### Input Variables
What are input variables good for in Terraform?
Variables allow you to define a piece of data in one location instead of repeating its hardcoded value in multiple different locations. Then, when you need to modify the variable's value, you do it in one location instead of changing each one of the hardcoded values.
What type of input variables are supported in Terraform?
```
string
number
bool
list(<TYPE>)
set(<TYPE>)
map(<TYPE>)
object({<ATTR_NAME> = <TYPE>, ... })
tuple([<TYPE>, ...])
```
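As a sketch, one declaration per type (all names and defaults here are arbitrary):

```hcl
variable "region"   { type = string }
variable "replicas" { type = number }
variable "enabled"  { type = bool }

variable "zones" {
  type    = list(string)
  default = ["us-east-1a", "us-east-1b"]
}

variable "unique_tags" {
  type = set(string)
}

variable "labels" {
  type    = map(string)
  default = { team = "devops" }
}

variable "car" {
  type = object({
    model = string
    year  = number
  })
}

variable "pair" {
  type = tuple([string, number])
}
```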
What's the default input variable type in Terraform?
`any`
What ways are there to pass values for input variables?
- Using the `-var` option in the CLI
- Using a file, with the `-var-file` option in the CLI
- An environment variable that starts with `TF_VAR_`

If no value is given, the user will be prompted to provide one.
How to reference a variable?
Using the syntax `var.<VARIABLE_NAME>`
[Question] You are configuring a variable for your Terraform configuration. Which arguments are required when configuring a `variable` block?

1. `type` and `description`
2. There are no required arguments
3. `type`
4. `type`, `description` and `default`

[Answer] 2. There are no required arguments

In Terraform, when declaring a variable block, there are no mandatory arguments. You can create a variable with an empty block like: `variable "example" {}`

While `type`, `description`, and `default` are commonly used, they're all optional. The `type` argument helps with validation, `description` documents the variable's purpose, and `default` provides a fallback value if none is specified.
What is the effect of setting variable as "sensitive"?
It doesn't show its value when you run `terraform apply` or `terraform plan` but eventually it's still recorded in the state file.
True or False? If an expression's result depends on a sensitive variable, it will be treated as sensitive as well
True
The same variable is defined in the following places:
- The file `terraform.tfvars`
- Environment variable
- Using `-var` or `-var-file`

According to variable precedence, which source will be used first?
Terraform loads variables in the following order, with later sources taking precedence over earlier ones: - Environment variable - The file `terraform.tfvars` - Using `-var` or `-var-file`
Whenever you run terraform apply, it prompts to enter a value for a given variable. How to avoid being prompted?
While removing the variable is theoretically a correct answer, it will probably fail the execution. You can use the `-var` option to provide the value and avoid being prompted. Another option is to run `export TF_VAR_<VAR_NAME>=<VALUE>`.
#### Output Variables
What are output variables? Why do we need them?
Output variables allow you to display/print a certain piece of data as part of Terraform execution. The most common use case is probably printing the IP address of an instance. Imagine you provision an instance and would like to know its IP address so you can connect to it. Instead of looking for it in the console/OS, you can use an output variable and print that piece of information to the screen.
Explain the "sensitive" parameter of output variable
When set to "true", Terraform will avoid logging the output variable's data. The use case for it is sensitive data such as passwords or private keys.
Explain the `depends_on` parameter of an output variable
It is used to explicitly set a dependency between the output variable and any other resource. Use case: some piece of information is available only once another resource is ready.
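Both output parameters together in a short sketch (resource and variable names are made up):

```hcl
output "instance_ip" {
  description = "Public IP to connect to the instance"
  value       = aws_instance.app.public_ip
  # Only report the IP once the EIP association is in place
  depends_on  = [aws_eip.app]
}

output "db_password" {
  value     = var.db_password
  sensitive = true # redacted in plan/apply output, still stored in state
}
```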
#### Locals
What are locals?
Similarly to variables they serve as placeholders for data and values. Differently from variables, users can't override them by passing different values.
What's the use case for using locals?
You have multiple hardcoded values that repeat themselves in different sections, but at the same time you don't want to expose them (as in, allow users to override their values).
#### Variables Hands-On
Demonstrate input variable definition with type, description and default parameters
```
variable "app_id" {
  type        = string
  description = "The id of application"
  default     = "some_value"
}
```

Unrelated note: variables are usually defined in their own file (vars.tf for example).
How to define an input variable which is an object with attributes "model" (string), "color" (string), year (number)?
```
variable "car_model" {
  description = "Car model object"
  type = object({
    model = string
    color = string
    year  = number
  })
}
```

Note: you can also define a default for it.
How to reference variables?
Variables are referenced with the `var.VARIABLE_NAME` syntax. Let's have a look at an example:

vars.tf:

```
variable "memory" {
  type    = string
  default = "8192"
}

variable "cpu" {
  type    = string
  default = "4"
}
```

main.tf:

```
resource "libvirt_domain" "vm1" {
  name   = "vm1"
  memory = var.memory
  cpu    = var.cpu
}
```
How to reference a variable from inside of a string literal? (bonus question: what is that type of expression called?)
Using the syntax `"${var.VAR_NAME}"`. It's called "interpolation". Very common to see it used in the user_data attribute related to instances:

```
user_data = <<-EOF
  This is some fabulous string
  It demonstrates how to use interpolation
  Yes, it's truly ${var.awesome_or_meh}
EOF
```
How can you list all outputs without applying Terraform changes?
`terraform output` will list all outputs without applying any changes
Can you see the output of a specific variable without applying terraform changes?
Yes, with `terraform output <OUTPUT_NAME>`. Very useful for scripts :)
Demonstrate how to define locals
```
locals {
  x = 2
  y = "o"
  z = 2.2
}
```
Demonstrate how to use a local
If we defined something like this:

```
locals {
  x = 2
}
```

then to use it, you reference it like this: `local.x`
### Data Sources
Explain data sources in Terraform
- Data sources are used to get data from providers or, in general, from resources external to Terraform (e.g. public clouds like AWS, GCP, Azure)
- Data sources are used for reading; they don't modify or create anything
- Many providers expose multiple data sources
Demonstrate how to use data sources
```
data "aws_vpc" "default" {
  default = true
}
```
How to get data out of a data source?
The general syntax is `data.<DATA_SOURCE_TYPE>.<NAME>.<ATTRIBUTE>`. So if you defined the following data source

```
data "aws_vpc" "default" {
  default = true
}
```

you can retrieve the ID attribute this way: `data.aws_vpc.default.id`
Is there such a thing as combining data sources? What would be the use case?
Yes, you can define a data source while using another data source as a filter, for example. Let's say we want to get AWS subnets, but only from our default VPC:

```
data "aws_subnets" "default" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.default.id]
  }
}
```
### Lifecycle
When you update a resource, how does it work?
When a change requires replacement, by default the current resource is deleted, a new one is created, and any references pointing to the old resource are updated to point to the new resource
Is it possible to modify the default lifecycle? How? Why?
Yes, it's possible. There are different lifecycle settings one can choose from. For example, "create_before_destroy", which inverts the order: it first creates the new resource, updates all the references from the old resource to the new resource, and then removes the old resource.

How to use it:

```
lifecycle {
  create_before_destroy = true
}
```

Why use it in the first place: you might have resources with a dependency where the dependency itself is immutable (= you can't modify it, hence you have to create a new one). In such a case the default lifecycle won't work, because you won't be able to remove the resource that has the dependency while it still references an old resource. AWS ASG + launch configurations is a good example of such a use case.
You've deployed a virtual machine with Terraform and you would like to pass data to it (or execute some commands). Which concept of Terraform would you use?
[Provisioners](https://www.terraform.io/docs/language/resources/provisioners)
### Provisioners
What are "Provisioners"? What they are used for?
Provisioners can be described as plugins used with Terraform, usually focusing on the aspect of configuring a service and making it operational. A few examples of provisioners:

- Running configuration management on a provisioned instance using a technology like Ansible, Chef or Puppet
- Copying files
- Executing remote scripts
Why is it often recommended to use provisioners as last resort?
Since a provisioner can run a variety of actions, it's not always feasible to plan and understand what will happen when running a certain provisioner. For this reason, it's usually recommended to use Terraform's built-in options whenever possible.
What is local-exec and remote-exec in the context of provisioners?
local-exec provisioners run commands on the machine where Terraform is executed, while remote-exec provisioners run commands on the remote resource.
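A sketch showing both kinds, with made-up names, AMI and key path; the `connection` block is how remote-exec reaches the instance:

```hcl
resource "aws_instance" "web" {
  ami           = "some-ami"
  instance_type = "t2.micro"

  # Runs on the machine where `terraform apply` is executed
  provisioner "local-exec" {
    command = "echo ${self.private_ip} >> private_ips.txt"
  }

  # Runs on the newly created instance over SSH
  provisioner "remote-exec" {
    inline = ["sudo yum install -y nginx"]
  }

  connection {
    type        = "ssh"
    host        = self.public_ip
    user        = "ec2-user"
    private_key = file("~/.ssh/id_rsa")
  }
}
```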
What is a "tainted resource"?
It's a resource which was successfully created but failed during provisioning. Terraform will fail and mark this resource as "tainted".
What does terraform taint do?
`terraform taint RESOURCE_ADDRESS` manually marks the resource as tainted in the state file, so the next time you run `terraform apply`, the resource will be destroyed and recreated. Note that in recent Terraform versions `terraform taint` is deprecated in favor of `terraform apply -replace=RESOURCE_ADDRESS`.
What is a data source? In what scenarios, for example, would you need to use it?
Data sources look up or compute values that can be used elsewhere in Terraform configuration. There are quite a few cases where you might need to use them:
- you want to reference resources not managed through Terraform
- you want to reference resources managed by a different Terraform module
- you want to cleanly compute a value with typechecking, such as with `aws_iam_policy_document`
What are output variables and what does terraform output do?
Output variables are named values that are sourced from the attributes of a module. They are stored in Terraform state and can be used by other modules through remote state. `terraform output` prints the current values of the root module's outputs.
Explain "Remote State". When would you use it and how?
Terraform generates a `terraform.tfstate` JSON file that describes the components/services provisioned on the specified provider. Remote state stores this file in remote storage to enable collaboration among a team.
Explain "State Locking"
State locking is a mechanism that blocks operations against a specific state file from multiple callers, so as to avoid conflicting operations from different team members. Once the first caller's lock is released, another team member may go ahead and carry out their own operation. Terraform will also check the state file to see if the desired resource already exists and, if not, go ahead and create it.
Aside from .tfvars files or CLI arguments, how can you inject dependencies from other modules?
The built-in Terraform way would be to use remote state to look up the outputs from other modules. It is also common in the community to use a tool called Terragrunt to explicitly inject variables between modules.
How do you import existing resource using Terraform import?
1. Identify which resource you want to import.
2. Write Terraform code matching the configuration of that resource.
3. Run the command `terraform import RESOURCE_ADDRESS ID`
E.g. let's say you want to import an AWS instance. Then you'll perform the following:

1. Identify that AWS instance in the console
2. Refer to its configuration and write Terraform code that will look something like:

```
resource "aws_instance" "tf_aws_instance" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"

  tags = {
    Name = "import-me"
  }
}
```

3. Run the command `terraform import aws_instance.tf_aws_instance i-12345678`
### State
Can you name three different things included in the state file?
- The representation of resources - the JSON format of the resources, their attributes, IDs, ... everything required to identify the resource and also anything that was included in the .tf files for these resources
- Terraform version
- Outputs
Why does Terraform keep state and how do local and remote state differ?
Terraform stores state to map real infrastructure objects to the resources declared in code, keep track of dependencies, detect drift, and speed up planning by caching attribute data.

- Local state is stored in a `terraform.tfstate` file on disk and is suitable for quick experiments or single-operator workflows.
- Remote state lives in a backend (such as S3, GCS, Terraform Cloud) that can enforce locking, access controls, and versioning so teams share a single source of truth.
Always choose a remote backend once there is more than one operator or automation pipeline touching the configuration.
Why does it matter where you store the tfstate file? In your answer make sure to address the following: - Public vs. Private - Git repository vs. Other locations
* tfstate may contain credentials and other sensitive data in plain text. You don't want to put it in a publicly shared location.
* tfstate shouldn't be modified concurrently, so putting it in a shared location available for everyone with "write" permissions might lead to issues. (Terraform remote state with locking doesn't have this problem.)
* tfstate is an important file. As such, it might be better to put it in a location that has regular backups and good security.

As such, tfstate shouldn't be stored in git repositories. Secured storage, such as a secured bucket, is a better option.
True or False? It's common to edit the Terraform state file directly by hand, and even recommended for many different use cases
False. You should avoid as much possible to edit Terraform state files directly by hand. Prefer supported commands such as `terraform state mv`, `terraform state rm`, or `terraform state replace-provider` so the CLI keeps metadata consistent.
Why storing state file locally on your computer may be problematic?
In general, storing the state file on your computer isn't a problem. It starts to be a problem when you are part of a team that uses Terraform: then you would like to make sure it's shared. In addition to being shared, you want to be able to handle the fact that different team members can edit the file, possibly at the same time, so locking is quite an important aspect as well.
Mention some best practices related to tfstate
- Don't edit it manually. tfstate was designed to be manipulated by terraform and not by users directly.
- Store it in secured location (since it can include credentials and sensitive data in general).
- Back it up regularly so you can roll back easily when needed.
- Store it in remote shared storage. This is especially needed when working in a team and the state can be updated by any of the team members.
- Enable versioning if the storage where you keep the state file supports it. Versioning is great for backups and roll-backs in case of an issue.
- Designate "state owners" who review access, rotate credentials, and execute migrations.
- Keep `.tfstate` files and the `.terraform/` directory out of version control (`.gitignore`) and encrypt any ad-hoc backups.
How and why concurrent edits of the state file should be avoided?
If two users or processes concurrently edit the state file, it can result in an invalid state file that doesn't actually represent the state of the resources.

To avoid that, Terraform can apply state locking if the backend supports it. For example, AWS S3 supports state locking and consistency via DynamoDB. Often, if the backend supports it, Terraform will use state locking automatically, so nothing is required from the user to activate it. Automation pipelines should wait for locks to clear instead of forcing unlocks.
Describe how you manage state file(s) when you have multiple environments (e.g. development, staging and production)
Probably no right or wrong answer here, but it seems, based on different sources, that the overall preferred way is to have a dedicated state file per environment.

Common patterns:
- Separate backend configurations per environment (different S3 prefixes or even different buckets and DynamoDB tables).
- Distinct workspaces or directories when you need isolated state plus different credentials.
- Terraform Cloud organizations/workspaces mapped to environments with role-based access control.
Why storing the state in versioned control repo is not a good idea?
- Sensitive data: some resources may specify sensitive data (like passwords and tokens) and everything in a state file is stored in plain text.
- Prone to errors: when working with Git repos, you may often find yourself switching branches, checking out specific commits, performing rebases, ... all these operations may end up with you performing `terraform apply` on a non-latest version of your Terraform code. Keeping state outside Git eliminates that risk.
- Lack of locking and audit trails compared to managed backends.
#### Terraform Backend
What's a Terraform backend? What is the default backend?
Terraform backend determines how the Terraform state is stored and loaded. By default the state is local, but it's possible to set a remote backend.
How do you configure an AWS S3 backend with DynamoDB state locking?
Use a remote backend when you need a shared, durable source of truth. A minimal configuration looks like:

```hcl
terraform {
  required_version = ">= 1.6.0"

  backend "s3" {
    bucket         = "my-tfstate-bucket"
    key            = "prod/network/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "tf-locks"
    encrypt        = true
  }
}
```

- Create the S3 bucket with versioning, default encryption, and block public access before enabling the backend.
- Provision a DynamoDB table with the primary key `LockID` so Terraform can acquire locks.
- Use IAM least privilege policies that allow only state operations on the bucket and table to reduce blast radius.
Which commands help you inspect, migrate, and safely manipulate Terraform state?
```bash
terraform init -migrate-state
terraform plan -refresh-only
terraform state list
terraform show -json | jq '.values.root_module.resources[].address' | head
terraform state mv module.vpc.aws_subnet.public[0] module.vpc.aws_subnet.public_a
```

- `terraform init -migrate-state`: moves an existing local state file into the configured backend with locking.
- `terraform plan -refresh-only`: detects drift by refreshing remote object attributes without proposing changes.
- `terraform state list`: confirms which objects are tracked in the state after migrations or imports.
- `terraform show -json`: enables scripted inspections (paired here with `jq`) so you can audit addresses or metadata.
- `terraform state mv`: renames or relocates resources/modules without recreation, useful during refactors.
What best practices keep Terraform state secure and reliable?
- Encrypt and version your remote backend (for example `aws_s3_bucket_versioning` plus SSE-KMS on S3).
- Enforce locking (DynamoDB, GCS locking, or Terraform Cloud workspaces) and monitor stuck locks.
- Grant IAM minimum privileges and rotate access keys; automation should assume roles with short-lived credentials.
- Schedule automated backups of the backend (S3 replication, DynamoDB PITR) and periodically test restores.
- Document "state owner" responsibility, incident response, and break-glass steps for unlocks or manual edits (should be last resort).
How do you migrate from local state to a new remote backend?
1. Provision backend resources (for example an S3 bucket with versioning and a DynamoDB table) from a separate bootstrap configuration.
2. Add the backend block to your Terraform configuration and run `terraform init -migrate-state`.
3. Verify the migration with `terraform state list` or `terraform state pull` and keep a secure backup of the previous file.
4. Remove or archive the local `terraform.tfstate` only after confirming new plans operate against the remote backend.
How terraform apply workflow is different when a remote backend is used?
It starts with acquiring a state lock so others can't modify the state at the same time. The apply will also download the latest state snapshot before planning and upload the updated state atomically after the run completes.
What would be the process of switching back from remote backend to local?
1. You remove the backend code and perform `terraform init` to switch back to `local` backend.
2. You remove the resources that are the remote backend itself.
3. Archive the latest remote state file so you can recover if the local copy gets corrupted.
True or False? It's NOT possible to use a variable in a backend configuration
That's true, and quite a limitation, as it means you'll have to go to the resources of the remote backend and copy some values into the backend configuration.

One way to deal with it is using partial configurations kept in a completely separate file and then loading them with `terraform init -backend-config=some_backend_partial_conf.hcl`.
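A sketch of that partial-configuration setup (file names and values are made up):

```hcl
# backend.tf - only the settings that never change
terraform {
  backend "s3" {
    key = "prod/terraform.tfstate"
  }
}
```

```hcl
# prod.s3.tfbackend - environment-specific backend values
bucket         = "my-tfstate-bucket"
region         = "us-east-1"
dynamodb_table = "tf-locks"
```

Then initialize with `terraform init -backend-config=prod.s3.tfbackend`.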
Is there a way to obtain information from a remote backend/state using Terraform?
Yes, using the concept of data sources. There is a data source for a remote state called "terraform_remote_state".

You can use it with the following syntax: `data.terraform_remote_state.<NAME>.outputs.<OUTPUT_NAME>`
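For example, assuming a "network" configuration whose state lives in S3 and which exports a `subnet_id` output (all names here are illustrative):

```hcl
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "my-tfstate-bucket"
    key    = "prod/network/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "app" {
  ami           = "some-ami"
  instance_type = "t2.micro"
  # Consume an output exported by the network configuration
  subnet_id     = data.terraform_remote_state.network.outputs.subnet_id
}
```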
How does a remote state backend improve collaboration for a Terraform project?
By storing the state file in a shared location, multiple people or processes can work with the same state. A remote state backend improves collaboration on Terraform projects by addressing the core challenge of sharing infrastructure state: when a team works on infrastructure, everyone needs access to the current state to safely make changes, and locking prevents clashing applies.
#### Workspaces
Explain what is a Terraform workspace
[developer.hashicorp.com](https://developer.hashicorp.com/terraform/language/state/workspaces): "The persistent data stored in the backend belongs to a workspace. The backend initially has only one workspace containing one Terraform state associated with that configuration. Some backends support multiple named workspaces, allowing multiple states to be associated with a single configuration."
True or False? Each workspace has its own state file
True
Why might workspaces not be the best solution for managing states of different environments, like staging and production?
One reason is that all the workspaces are stored in one location (as in one backend), and usually you don't want to use the same access control and authentication for both staging and production, for obvious reasons. Also, working with workspaces is quite prone to human error, as you might think you are in one workspace while actually working in a completely different one.
#### State Hands-On
Which command will produce a state file?
`terraform apply`
How to inspect current state?
`terraform show`
How to list resources created with Terraform?
`terraform state list`
How do you rename an existing resource?
`terraform state mv`
How to create a new workspace?
`terraform workspace new <WORKSPACE_NAME>`
How to identify which workspace you are using?
`terraform workspace show`
### Terraform Structures and Syntax

#### Lists
How to define an input variable which is a list of numbers?
```
variable "list_of_nums" {
  type        = list(number)
  description = "An example of list of numbers"
  default     = [2, 0, 1, 7]
}
```
How to create a number of resources based on the length of a list?
```
resource "some_resource" "some_name" {
  count = length(var.some_list)
}
```
You have a list variable called "users" with an object containing a name attribute like this:
```
variable "users" {
  type = list(object({
    name = string
    age  = number
  }))
}
```
How to access the name attribute of the second item in that list?

`var.users[1].name`
Given the same list, how to access attribute "name" of all items?
`var.users[*].name`
#### Loops
What loops are useful for in Terraform?
The most common use case is when you need to create multiple resources with only a slight difference (like a different name). Instead of defining multiple separate resources, you can define the resource once and create multiple instances of it using loops.
Demonstrate how to define a simple Terraform loop
```
resource "aws_instance" "server" {
  count = 15
}
```
The above configuration will create 15 AWS instances.
How to create multiple AWS instances but each with a different name?
```
resource "aws_instance" "server" {
  count = 6

  tags = {
    Name = "instance-${count.index}"
  }
}
```
The above configuration will create 6 instances, each with a different name.
You have the following variable defined in Terraform:
```
variable "users" {
  type    = list(string)
  default = ["mario", "luigi", "peach"]
}
```
How to use it to create users on one of the public clouds (or any other platform, infra)?
```
resource "aws_iam_user" "user" {
  count = length(var.users)
  name  = var.users[count.index]
}
```
Is there any limitation to "count" meta-argument?
- `count` isn't supported within an inline block
- It's quite limited when it comes to lists. You'll notice that modifying items in lists, or even operations like removal, are sometimes interpreted in a way you didn't expect. For example, removing an item from a list may shift other items to new positions, and since each position represents a resource with count, that may lead to a result where the wrong resources are modified or removed. There are ways to deal with it, but using count with lists is not always straightforward
What's a for_each loop? How is it different from "count"?
- for_each can be applied only on collections like maps or sets (as opposed to count, which can be applied on lists)
- for_each helps to deal with the limitation of `count`, which isn't optimal for use cases of modifying lists
- for_each supports inline blocks, as opposed to `count`
Demonstrate how to use the for_each loop
```
resource "google_compute_instance" "instances" {
  for_each = var.names_map
  name     = each.value
}
```
The following resource tries to use a for_each loop on a list of strings but it fails. Why?
```
resource "google_compute_instance" "instances" {
  for_each = var.names
  name     = each.value
}
```
for_each can be applied only on collections like maps or sets, so the list should be converted to a set like this: `for_each = toset(var.names)`
How to use for_each loop for inline blocks?
```
resource "some_instance" "instance" {
  dynamic "tag" {
    for_each = var.tags

    content {
      key   = tag.key
      value = tag.value
    }
  }
}
```
There is a list variable called "users". You would like to define an output variable with a value of all users in uppercase. How to achieve that?
```
output "users" {
  value = [for name in var.users : upper(name)]
}
```
What's the result of the following code?
```
resource "random_integer" "num" {
  min = 20
  max = 17
}

resource "aws_instance" "instances" {
  count = random_integer.num.result
}
```
The above code will fail as it's not possible to reference resource outputs with count, because Terraform has to compute count before any resources are created (or modified).
There is a variable called "values" with the following value: ["mario", "luigi", "peach"]. How to create an output variable with the string value of the items in the list: "mario, luigi, peach," ?
```
output "users" {
  value = "%{ for name in var.values }${name}, %{ endfor }"
}
```
There is a list variable called "users". You would like to define an output variable with a value of all users in uppercase but only if the name is longer than 3 characters. How to achieve that?
```
output "users" {
  value = [for name in var.users : upper(name) if length(name) > 3]
}
```
#### Maps
There is a map called "instances"
- How to extract only the values of that map?
- How to extract only the attribute "name" from each value?
- Using the values built-in function: `values(instances)`
- `values(instances)[*].name`
You have a map variable, called "users", with the keys "name" and "age". Define an output list variable with the following "my name is {name} and my age is {age}"
```
output "name_and_age" {
  value = [for name, age in var.users : "my name is ${name} and my age is ${age}"]
}
```
You have a map variable, called "users", with the keys "name" (string) and "age" (number). Define an output map variable with the key being name in uppercase and value being age in the closest whole number
```
output "name_and_age" {
  value = { for name, age in var.users : upper(name) => floor(age) }
}
```
#### Conditionals
How to use conditional expressions in Terraform?
`some_condition ? "value_if_true" : "value_if_false"`
Explain the following condition: var.x ? 1 : 0
If `x` evaluated to true, the result is 1, otherwise (if false) the result is 0.
Explain the following condition: var.x != "" ? var.x : "yay"
If `x` is an empty string the result is "yay", otherwise it's the value of `x` variable
Can conditionals be used with meta-arguments?
Yes, for example the "count" meta-argument:
```
resource "aws_instance" "server" {
  count = var.amount ? 1 : 0
  ...
}
```
Is it possible to combine conditionals and loop?
Yes, for example:
```
dynamic "tag" {
  for_each = { for key, value in var.tags : key => value if key != "" }
}
```
#### Misc
What are meta-arguments in Terraform?
Arguments that affect the lifecycle of a resource (its creation, modification, ...) and are supported by Terraform regardless of the type of resource in which they are used. Some examples:
- count: how many resources to create out of one definition of a resource
- lifecycle: how to treat resource creation or removal
What meta-arguments are you familiar with?
- count: how many resources to create out of one definition of a resource
- lifecycle: how to treat resource creation or removal
- depends_on: create a dependency between resources
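For example, the `lifecycle` meta-argument can protect a critical resource from accidental removal (resource and bucket names below are illustrative):

```hcl
resource "aws_s3_bucket" "state" {
  bucket = "important-state-bucket" # illustrative name

  lifecycle {
    # Any plan that would destroy this resource fails with an error
    prevent_destroy = true
  }
}
```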
What does the templatefile function do?
Renders a template file and returns the result as a string.
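An illustrative call (the template path and the `username` variable are made up for the example; the template itself would reference `${username}`):

```hcl
locals {
  # Render templates/user_data.sh.tftpl, substituting the given variables
  rendered_script = templatefile("${path.module}/templates/user_data.sh.tftpl", {
    username = "mario"
  })
}
```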
You are trying to use templatefile as part of a module and you use a relative path to load a file but sometimes it fails, especially when others try to reuse the module. How can you deal with that?
Replace the relative paths with what is known as path references. These are fixed paths, like the module root path, the path of the module where the expression is placed, etc.
How do you test terraform syntax?
A valid answer could be "I write Terraform configuration and try to execute it", but this makes testing cumbersome and quite complex in general. There is the `terraform console` command, which allows you to easily execute Terraform functions and experiment with general syntax.
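A sample `terraform console` session might look like this:

```
$ terraform console
> upper("devops")
"DEVOPS"
> length([1, 2, 3])
3
> cidrsubnet("10.0.0.0/16", 4, 2)
"10.0.32.0/20"
```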
True or False? Terraform console should be used carefully as it may modify your resources
False. terraform console is read-only.
You need to render a template and get it as string. Which function would you use?
`templatefile` function.
Explain what depends_on is used for and give an example
`depends_on` is used to create an explicit dependency between resources in Terraform. For example, there is an application you would like to deploy in a cluster. If the cluster isn't ready (and also managed by Terraform, of course) then you can't deploy the app. In this case, you will define "depends_on" in the app configuration and its value will be the cluster resource.
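A sketch of that scenario (the resource types and names are illustrative and most required arguments are omitted):

```hcl
resource "aws_eks_cluster" "cluster" {
  name = "apps-cluster" # illustrative; other required arguments omitted
  # ...
}

resource "helm_release" "app" {
  name  = "some-app"    # illustrative
  chart = "some-chart"  # illustrative
  # ...

  # Do not install the app before the cluster exists
  depends_on = [aws_eks_cluster.cluster]
}
```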
### Modules
Explain Modules
[Terraform.io](https://www.terraform.io/language/modules/develop): "A module is a container for multiple resources that are used together. Modules can be used to create lightweight abstractions, so that you can describe your infrastructure in terms of its architecture, rather than directly in terms of physical objects." In addition, modules are great for creating reusable Terraform code that can be shared and used not only between different repositories but even within the same repo, between different environments (like staging and production). Well-designed modules also expose a small, opinionated surface and hide implementation details.
What makes Terraform code a module? In other words, what is a module from a practical perspective?
Basically any file or files in a directory is a module in Terraform. There is no special syntax to use in order to define a module. The root configuration is itself a module, and any module that is called from it is considered a child module.
When should you use a module instead of inline resources?
- You need to reuse infrastructure patterns across environments, teams, or regions without copy-paste.
- You want to codify best practices (naming, tagging, security controls) behind a stable interface.
- You need to version and promote infrastructure changes using semantic releases and CI pipelines.
- You want to limit blast radius by updating one module and rolling it out gradually.
What is a recommended layout for a reusable module?
```text
modules/vpc/
  main.tf
  variables.tf
  outputs.tf
  versions.tf
  README.md
  examples/
    basic/
      main.tf
```
- `main.tf` defines resources, data sources, and locals.
- `variables.tf` contains typed inputs with descriptions and validations.
- `outputs.tf` exposes the minimal set of outputs downstream stacks need.
- `versions.tf` pins compatible Terraform and provider versions.
- `examples/` hosts runnable samples for documentation and testing, usually referenced by CI.
How do you consume a versioned module from a VCS or the Terraform Registry?
```hcl
module "vpc" {
  source = "git::https://github.com/org/infra-modules.git//vpc?ref=v1.2.3"
  cidr   = "10.0.0.0/16"
  azs    = ["us-east-1a", "us-east-1b"]
}
```
- For registry modules, use `version` constraints (for example `version = ">= 1.2.0, < 2.0.0"`).
- Always pin module versions to keep plans reproducible and review upstream releases before upgrading.
- Prefer tags or immutable SHAs for VCS sources to guarantee repeatable builds.
How do modules handle inputs, outputs, locals and validation effectively?
- Define typed variables with defaults and `validation` blocks to guard against invalid CIDRs, AZ counts, or empty strings.
- Use `locals` to transform data or derive names, and combine with `for_each`/`count` for deterministic resource creation.
- Document outputs in `outputs.tf`, mark sensitive ones, and keep outputs minimal to reduce coupling.
- Surface module metadata (like version or tags) through outputs for downstream modules or automation.
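A small sketch combining a typed, validated input with a derived local (the variable and naming scheme are illustrative):

```hcl
variable "cidr" {
  type        = string
  description = "VPC CIDR block"

  validation {
    # cidrnetmask() fails on anything that isn't a valid IPv4 CIDR
    condition     = can(cidrnetmask(var.cidr))
    error_message = "cidr must be a valid IPv4 CIDR block."
  }
}

locals {
  # Derive a deterministic name from the input, e.g. "vpc-10.0.0.0-16"
  name_prefix = "vpc-${replace(var.cidr, "/", "-")}"
}
```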
Which module anti-patterns should you avoid?
- Embedding provider blocks inside the module which makes reuse across accounts harder.
- Accepting overly generic `map(any)` inputs that hide required structure instead of typed objects.
- Outputting secrets or credentials without `sensitive = true` or secret stores.
- Creating "god" root modules that mix networking, compute, and application concerns instead of composing smaller modules.
- Copying and pasting modules without versioning, documentation, or automated tests.
How do you test and lint modules during development?
```bash
terraform -chdir=examples/vpc init
terraform -chdir=examples/vpc validate
terraform -chdir=examples/vpc plan
tflint --module
```
- `terraform validate` catches syntax errors and missing providers.
- Running `plan` against an example stack surfaces integration issues before promotion.
- `tflint --module` adds provider-specific linting; tools like Terratest can provision ephemeral infra for assertions.
- Include these commands in CI so every module change is exercised automatically.
How do you test a Terraform module?
There are multiple answers, but the most common one would likely be to use the tool Terratest, and to test that a module can be initialized, can create resources, and can destroy those resources cleanly. You can also rely on `terraform validate`, `terraform plan` with sample configurations, and linting tools such as `tflint`.
When creating a module, do you prefer to use inline blocks, separate resources or both? why?
No right or wrong here.

Personally, I prefer to use only separate resources in modules as it makes modules more flexible. So if a resource includes inline blocks, that may limit you at some point.
True or False? Module source can be only local path
False. It can be a Git URL, HTTP URL, ... for example:
```
module "some_module" {
  source = "github.com/foo/modules/bar?ref=v0.1"
}
```
Where can you obtain Terraform modules?
Terraform modules can be found in the [Terraform registry](https://registry.terraform.io/browse/modules)
You noticed there are relative paths in some of your modules and you would like to change that. What can you do and why is that a problem in the first place?
Relative paths usually work fine in your own environment as you are familiar with the layout and paths used, but when sharing a module and making it reusable, you may bump into issues as it runs on different environments where the relative paths may no longer be relevant. A better approach would be to use `path reference` like one of the following: - `path.module`: the path of the module where the expression is used - `path.cwd`: the path of the current working directory - `path.root`: the path of the root module
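As an illustration, the three references can be inspected via an output (purely for demonstration):

```hcl
output "paths_demo" {
  value = {
    module_dir = path.module # directory of the module where this expression lives
    root_dir   = path.root   # directory of the root module of the configuration
    cwd        = path.cwd    # directory from which terraform was run
  }
}
```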
#### Modules Hands-On
How to use a module?
The general syntax is:
```hcl
module "<MODULE_NAME>" {
  source  = "<MODULE_SOURCE>"
  version = ">= 1.2.0, < 2.0.0"

  # module inputs
}
```
The critical part is the source, which tells Terraform where the module can be found. When the module comes from a registry, you can pin the `version`; for local paths, omit that attribute.
Demonstrate using a module called "amazing_module" in the path "../modules/amazing-module"
```
module "amazing_module" {
  source = "../modules/amazing-module"
}
```
What should be done every time you modify the source parameter of a module?
Run `terraform init -upgrade` (or `terraform get -update` on older versions) so Terraform downloads the new version of the module and updates the `.terraform/modules` directory.
How to access module output variables?
The general syntax is `module.<MODULE_NAME>.<OUTPUT_NAME>`
You would like to load and render a file from module directory. How to do that?
```
script = templatefile("${path.module}/user-data.sh", { ... })
```
There is a module to create a compute instance. How would you use the module to create three separate instances?
Starting with Terraform 0.13, the `count` meta-argument can be used with modules. So you could use something like this:
```
module "instances" {
  source = "/some/module/path"
  count  = 3
}
```
You can also use it in output vars: `value = module.instances[*]`
How to use a module with for_each to create networks per availability zone?
```hcl
module "subnet" {
  source   = "../modules/subnet"
  for_each = toset(var.azs)

  name = "subnet-${each.key}"
  az   = each.value
}
```
`for_each` keeps addresses stable and lets you add or remove AZs without re-creating the entire module.
#### Interview Cheatsheet

- Remote state belongs in an encrypted, versioned S3 bucket with a DynamoDB table for locking.
- Run `terraform init -migrate-state` when enabling a backend so existing state moves safely.
- Detect drift quickly with `terraform plan -refresh-only` before every change.
- Script state reviews with `terraform show -json | jq '.values.root_module.resources[].address'`.
- Keep `.tfstate*` out of Git; grant IAM least privilege and rotate state access credentials.
- Structure reusable modules with `main.tf`, `variables.tf`, `outputs.tf`, `versions.tf`, `README.md`, and an `examples/` folder.
- Pin module versions (`?ref=` or `version` constraints) to make plans reproducible and review upgrades deliberately.
- Validate modules in CI using `terraform validate`, `terraform plan` on examples, and `tflint --module`.
- Avoid module anti-patterns like embedding provider blocks, exposing secrets, or hyper-generic inputs.
- Use `locals` plus `for_each` inside modules for predictable names and per-environment customization.

### Import
Explain Terraform's import functionality
`terraform import` is a CLI command used for importing existing infrastructure into Terraform's state. It does NOT create the definitions/configuration for creating such infrastructure.
State two use cases where you would use terraform import
1. You have existing resources in one of the providers and they are not managed by Terraform (as in, not included in the state)
2. You lost your tfstate file and need to rebuild it
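A typical import flow for the first use case might look like this (the resource address and instance ID are illustrative):

```sh
# 1. Write a matching (even minimal) resource block first, e.g.:
#    resource "aws_instance" "web" {}
# 2. Import the real resource into the state:
terraform import aws_instance.web i-0abcd1234efgh5678  # illustrative instance ID
# 3. Iterate on the configuration until the plan is clean:
terraform plan
```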
### Version Control
You have a Git repository with Terraform files but no .gitignore. What would you add to a .gitignore file in Terraform repository?
```
**/.terraform/*
*.tfstate
*.tfstate.*
*.tfvars
*.tfvars.json
```
You don't want to store the state file nor any downloaded providers in the .terraform directory. It also doesn't make sense to share/store the state backup files.
### AWS
What happens if you update user_data in the following case and apply the changes?
```
resource "aws_instance" "example" {
  ami           = "..."
  instance_type = "t2.micro"

  user_data = <<-EOF
              #!/bin/bash
              echo "Hello, World" > index.xhtml
              EOF
}
```
Nothing, because user_data is executed on boot, so if an instance is already running, it won't change anything. To make the change effective you'll have to use `user_data_replace_on_change = true`.
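Applied to the snippet above, that would look like:

```hcl
resource "aws_instance" "example" {
  ami           = "..." # as in the original snippet
  instance_type = "t2.micro"

  user_data = <<-EOF
              #!/bin/bash
              echo "Hello, World" > index.xhtml
              EOF

  # Recreate the instance whenever user_data changes
  user_data_replace_on_change = true
}
```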
You manage an ASG with Terraform, which means you also have "aws_launch_configuration" resources. The problem is that launch configurations are immutable and sometimes you need to change them. This creates a problem where Terraform isn't able to delete the ASG because it references the old launch configuration. How to deal with it?
Add the following to the "aws_launch_configuration" resource:
```
lifecycle {
  create_before_destroy = true
}
```
This changes the order in which Terraform works: first it will create the new resource (launch configuration), then it will update other resources to reference the new launch configuration, and finally it will remove the old resources.
How to manage multiple regions in AWS provider configuration?
```
provider "aws" {
  region = "us-west-1"
  alias  = "west_region"
}

provider "aws" {
  region = "us-east-1"
  alias  = "east_region"
}

data "aws_region" "west_region" {
  provider = aws.west_region
}

data "aws_region" "east_region" {
  provider = aws.east_region
}
```
To use it:
```
resource "aws_instance" "west_region_instance" {
  provider      = aws.west_region
  instance_type = "t2.micro"
  ...
}
```
Assuming you have multiple regions configured and you would like to use a module in one of them. How to achieve that?
```
module "some_module" {
  source = "..."

  providers = {
    aws = aws.some_region
  }
  ...
}
```
How to manage multiple AWS accounts?
One way is to define multiple different provider blocks, each with its own "assume_role":
```
provider "aws" {
  region = "us-west-1"
  alias  = "some-region"

  assume_role {
    role_arn = "arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_NAME>"
  }
}
```
### Validations
How would you enforce that users of your variables provide values that meet certain constraints? For example, a number greater than 1
Using a `validation` block:
```
variable "some_var" {
  type = number

  validation {
    condition     = var.some_var > 1
    error_message = "you have to specify a number greater than 1"
  }
}
```
### Secrets
What's the issue with the following provider configuration?
```
provider "aws" {
  region     = "us-west-1"
  access_key = "blipblopblap"
  secret_key = "bipbopbipbop"
}
```
It's not secure! You should never store credentials in plain text this way.
What can you do to NOT store provider credentials in Terraform configuration files in plain text?
1. Use environment variables
2. Use password CLIs (like 1Password, which is generic, but there are also provider-specific options like aws-vault)
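For the AWS provider, for example, credentials can come from the environment (the values below are placeholders, never commit real keys):

```sh
# Picked up automatically by the AWS provider
export AWS_ACCESS_KEY_ID="..."      # placeholder
export AWS_SECRET_ACCESS_KEY="..."  # placeholder
terraform plan
```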
How can you manage secrets/credentials in CI/CD?
That very much depends on the CI/CD system/platform you are using.

- GitHub Actions: Use OpenID Connect (OIDC) to establish a connection with your provider. You can then specify the following in your GitHub Actions workflow:
  ```
  - uses: aws-actions/configure-aws-credentials@v1
    with:
      role-to-assume: arn:aws:iam::someIamRole
      aws-region: ...
  ```
- Jenkins: If Jenkins runs on the provider, you can use the provider's access entities (like roles, policies, ...) to grant access control to the instance on which Jenkins is running
- CircleCI: you can use `CircleCI Context` and then specify it in your CircleCI config file:
  ```
  context:
    - some-context
  ```
What are the pros and cons of using environment variables for managing secrets in Terraform configurations?
Pros:
- You avoid using secrets directly in configurations in plain text
- Free (no need to pay for secret management platforms/solutions)
- Straightforward to use

Cons:
- Configurations might not be usable without the environment variables, which may impact the user experience, as the user has to know which environment variables to pass for everything to work properly
- Mostly managed outside of Terraform's mechanisms, which makes it hard to enforce or track anything related to secrets when it depends on the user passing environment variables
True or False? If you pass secrets with environment variables, they are not visible in your state file
False. State files include sensitive data as-is. That means it's very important that wherever you store your state file, it's encrypted and accessible only to those who should be able to access it.
True or False? If you pass secrets from a centralized secrets store (like Hashicorp Vault) they are not visible in plan files (terraform plan)
False. It doesn't matter where you store your secrets (file, environment variables, centralized secrets store); they will be visible in both the state file and the plan output.
### Production

This section is about how Terraform is actually used in real-life scenarios and organizations.
What structure layout do you use for your projects?
There is no right or wrong answer, just what you or your team personally adopted, and being able to explain why. One common approach is to have a separate directory for each environment:
```
terraform_project/
  staging/
  production/
```
Each environment has its own backend (as you don't want to use the same authentication and access controls for all environments).

Going further, under each environment you'll separate between components, applications and services:
```
terraform_project/
  staging/
    applications/
      some-app-service-1/
      some-app-service-2/
    databases/
      mongo/
      postgres/
    networking/
```
What files do you have in your Terraform projects?
Again, no right or wrong answer. Just your personal experience.
```
main.tf
providers.tf
outputs.tf
variables.tf
dependencies.tf
```
Each one of these files can be divided into smaller parts if needed (no reason to maintain VERY long files).
An engineer in your team complains about having to copy-paste quite a lot of code between different folders and files of Terraform. What would you do?
Suggest to use Terraform modules.
When working with a nested layout of many directories, it can be cumbersome to run terraform commands in many different folders. How to deal with it?
There are multiple ways to deal with it:
1. Write scripts that perform some commands recursively with different conditions
2. Use tools like Terragrunt, which has commands like "run-all" that can run in parallel on multiple different paths
One of the engineers in your team complains the inline shell scripts are quite big and maintaining them in Terraform files seems like a bad idea. What would you do?
A good solution for not including shell scripts inline (as in, inside Terraform configuration files) is to keep them in a separate file and then use the Terraform `templatefile` function to render and get them as a string.
You noticed a lot of your Terraform code/configuration is duplicated, between repositories and also within the same repository between different directories. What is one approach you may adopt that will help handle that?
Using Terraform modules can help greatly with duplicated code, so that different environments, for example staging and production, can reuse the same code by using the same modules.
You noticed your Terraform code includes quite a lot of hardcoded values (like ports, subnets, ...) and they are duplicated in many locations. How'd you deal with it?
Using variables might not be a good solution because some values shouldn't be exposed or accidentally overridden. In such a case you might want to use the concept of `locals`.
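A sketch of replacing hardcoded, duplicated values with `locals` (names and values are illustrative):

```hcl
locals {
  http_port   = 8080
  app_subnet  = "10.0.1.0/24"
}

resource "aws_security_group_rule" "http" {
  type              = "ingress"
  from_port         = local.http_port
  to_port           = local.http_port
  protocol          = "tcp"
  cidr_blocks       = [local.app_subnet]
  security_group_id = "..." # illustrative
}
```

Unlike variables, locals can't be overridden from outside the configuration, so the values stay consistent everywhere they are referenced.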
Every time there is a change in tags standards (for example your team decided to change one of the tags' name) you find yourself changing tags in multiple files and you find the process quite tedious. What can be done about it?
Instead of defining tags at resource level, consider using `default_tags` as part of the provider configuration.
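For example, with the AWS provider (tag names and values are illustrative), the tags below are applied to every resource the provider creates:

```hcl
provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = {
      Environment = "staging"
      Team        = "platform"
    }
  }
}
```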
You would like to change the name of a resource but afraid to cause downtime. What can be done?
If it's a matter of changing a resource name, you could use `terraform state mv <SOURCE> <DESTINATION>`
You try to deploy a cluster and an app on that cluster, but the app resource was created before the cluster. How to manage such situation?
Use the meta-argument `depends_on` in the app resource definition. This way the app will depend on the cluster resource and order will be maintained in creation of the resources.
================================================ FILE: topics/terraform/exercises/launch_ec2_instance/exercise.md ================================================ # Launch EC2 instance ## Requirements * AWS account ## Objectives 1. Write Terraform configuration for launching an EC2 instance 2. Run the commands to apply the configuration and create the EC2 instance 3. What happens if you run again `terraform apply`? 4. Destroy the instance you've created with Terraform ================================================ FILE: topics/terraform/exercises/launch_ec2_instance/solution.md ================================================ # Launch EC2 instance ## Requirements * AWS account ## Objectives 1. Write Terraform configuration for launching an EC2 instance 2. Run the commands to apply the configuration and create the EC2 instance 3. What happens if you run again `terraform apply`? 4. Destroy the instance you've created with Terraform ## Solution ``` mkdir exercise cat << EOT >> main.tf terraform { required_providers { aws = { source = "hashicorp/aws" version = "~> 4.16" } } required_version = ">= 1.2.0" } provider "aws" { region = "us-west-2" } resource "aws_instance" "app_server" { ami = "ami-830c94e3" instance_type = "t2.micro" tags = { Name = "ExampleAppServerInstance" } } EOT terraform init terraform validate terraform plan # You should see this line at the end: Plan: 1 to add, 0 to change, 0 to destroy terraform apply -auto-approve # You should see the following output: # aws_instance.app_server: Creation complete after 49s [id=i-004651a9d4427d236 # Running 'terraform apply' again won't change anything as # Terraform will compare actual infrastructure to your # configuration and won't find any difference. You should see the following line: # Apply complete! Resources: 0 added, 0 changed, 0 destroyed. # Remove instance terraform destroy -auto-approve # Destroy complete! Resources: 1 destroyed. 
``` ================================================ FILE: topics/terraform/exercises/launch_ec2_web_instance/exercise.md ================================================ ================================================ FILE: topics/terraform/exercises/s3_bucket_rename/exercise.md ================================================ # Rename S3 Bucket ## Requirements * An existing S3 bucket tracked by Terraform. If you don't have it, you can use the following block and run `terraform apply`: ```terraform resource "aws_s3_bucket" "some_bucket" { bucket = "some-old-bucket" } ``` Attention: Since S3 buckets are globally unique, you will likely have to rename the bucket as someone else might have named it that way already. ## Objectives 1. Rename an existing S3 bucket and make sure it's still tracked by Terraform ## Solution Click [here to view the solution](solution.md) ================================================ FILE: topics/terraform/exercises/s3_bucket_rename/solution.md ================================================ # Rename S3 Bucket ## Requirements * An existing S3 bucket tracked by Terraform. If you don't have it, you can use the following block and run `terraform apply`: ```terraform resource "aws_s3_bucket" "some_bucket" { bucket = "some-old-bucket" } ``` Attention: Since S3 buckets are globally unique, you will likely have to rename the bucket as someone else might have named it that way already. ## Objectives 1. 
Rename an existing S3 bucket and make sure it's still tracked by Terraform ## Solution ```sh # A bucket name is immutable in AWS so we'll have to create a new bucket aws s3 mb s3://some-new-bucket-123 # Sync old bucket to new bucket aws s3 sync s3://some-old-bucket s3://some-new-bucket-123 # Option 1 (remove and import) ## Remove the old bucket from Terraform's state terraform state rm aws_s3_bucket.some_bucket ## Import new bucket to Terraform's state terraform import aws_s3_bucket.some_bucket some-new-bucket-123 : ' aws_s3_bucket.some_bucket: Refreshing state... [id=some-new-bucket-123] Import successful! The resources that were imported are shown above. These resources are now in your Terraform state and will henceforth be managed by Terraform. ' # Option 2 (move) ## Move the old bucket from Terraform's state to the new one terraform state mv aws_s3_bucket.some_bucket some-new-bucket-123 : ' Move "aws_s3_bucket.some_bucket" to "aws_s3_bucket.some-new-bucket-123" Successfully moved 1 object(s). ' # Modify Terraform file # Modify the Terraform definition to include the new name # resource "aws_s3_bucket" "some_bucket" { # bucket = "some-new-bucket-123" # } # Remove old bucket aws s3 rm s3://some-old-bucket --recursive aws s3 rb s3://some-old-bucket ``` ================================================ FILE: topics/terraform/exercises/terraform_local_provider/exercise.md ================================================ # Local Provider ## Objectives Learn how to use and run Terraform basic commands 1. Create a directory called "my_first_run" 2. Inside the directory create a file called "main.tf" with the following content ```terraform resource "local_file" "mario_local_file" { content = "It's a me, Mario!" filename = "/tmp/who_is_it.txt" } ``` 3. Run `terraform init`. What did it do? 4. Run `terraform plan`. What Terraform is going to perform? 5. 
Finally, run `terraform apply` and verify the file was created ## Solution Click [here to view the solution](solution.md) ================================================ FILE: topics/terraform/exercises/terraform_local_provider/solution.md ================================================ # Local Provider ## Objectives Learn how to use and run Terraform basic commands 1. Create a directory called "my_first_run" 2. Inside the directory create a file called "main.tf" with the following content ```terraform resource "local_file" "mario_local_file" { content = "It's a me, Mario!" filename = "/tmp/who_is_it.txt" } ``` 3. Run `terraform init`. What did it do? 4. Run `terraform plan`. What Terraform is going to perform? 5. Finally, run 'terraform apply' and verify the file was created ## Solution ```sh # Create a directory mkdir my_first_run && cd my_first_run # Create the file 'main.tf' cat << EOT >> main.tf resource "local_file" "mario_local_file" { content = "It's a me, Mario!" filename = "/tmp/who_is_it.txt" } EOT # Run 'terraform init' terraform init # Running 'ls -la' you'll it created '.terraform' and '.terraform.lock.hcl' # In addition, it initialized (downloaded and installed) the relevant provider plugins. In this case, the "hashicorp/local" # Run 'terraform plan' terraform plan # It shows what Terraform is going to perform once you'll run 'terraform apply' << terraform_plan_output Terraform will perform the following actions: # local_file.mario_local_file will be created + resource "local_file" "mario_local_file" { + content = "It's a me, Mario!" + directory_permission = "0777" + file_permission = "0777" + filename = "/tmp/who_is_it.txt" + id = (known after apply) } Plan: 1 to add, 0 to change, 0 to destroy. 
terraform_plan_output

# Apply main.tf (it's better to run without -auto-approve if you are new to Terraform)
terraform apply -auto-approve

ls /tmp/who_is_it.txt
# /tmp/who_is_it.txt
```


================================================
FILE: topics/terraform/exercises/vpc_subnet_creation/exercise.md
================================================

# Creating Custom VPC and Subnets with Terraform

## Requirements

* An existing AWS account with permissions to create VPCs and subnets.
* Terraform installed on your local machine.
* AWS CLI configured with your credentials.

## Objectives

1. Create a custom VPC with a specified CIDR block. For example, you can use `10.0.0.0/16`.
2. Create two subnets within the VPC, each with a different CIDR block. For example, you can use `10.0.0.0/20` for the first subnet and `10.0.16.0/20` for the second subnet. Both subnets should be in different availability zones to ensure high availability.
3. Ensure that the VPC and subnets are tracked by Terraform.

## Solution

Click [here to view the solution](solution.md)


================================================
FILE: topics/terraform/exercises/vpc_subnet_creation/solution.md
================================================

# Creating Custom VPC and Subnets with Terraform

## Objectives

1. Create a custom VPC with a specified CIDR block. For example, you can use `10.0.0.0/16`.
2. Create two subnets within the VPC, each with a different CIDR block. For example, you can use `10.0.0.0/20` for the first subnet and `10.0.16.0/20` for the second subnet. Both subnets should be in different availability zones to ensure high availability.
3. Ensure that the VPC and subnets are tracked by Terraform.
## Solution

```sh
# Create a directory for the Terraform configuration
mkdir vpc_subnet_creation && cd vpc_subnet_creation
```

```sh
# Create the main.tf file with the VPC and subnets configuration
touch main.tf
```

```terraform
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "your-region" # e.g., ap-south-1
}

resource "aws_vpc" "my_custom_vpc" {
  cidr_block = "10.0.0.0/16"

  tags = {
    Name = "my_custom_vpc_made_with_terraform"
  }
}

resource "aws_subnet" "Subnet_A" {
  cidr_block        = "10.0.0.0/20"
  vpc_id            = aws_vpc.my_custom_vpc.id
  availability_zone = "your-availability-zone-a" # e.g., ap-south-1a

  tags = {
    "Name" = "Subnet A"
  }
}

resource "aws_subnet" "Subnet_B" {
  cidr_block        = "10.0.16.0/20"
  vpc_id            = aws_vpc.my_custom_vpc.id
  availability_zone = "your-availability-zone-b" # e.g., ap-south-1b

  tags = {
    "Name" = "Subnet B"
  }
}
```

```sh
# Initialize Terraform to download the AWS provider
terraform init
```

```sh
# Apply the Terraform configuration to create the VPC and subnets
terraform apply -auto-approve
```


================================================
FILE: topics/zuul/README.md
================================================

# Zuul

## Questions

### Basics
Describe briefly what Zuul is
From [Zuul's docs](https://zuul-ci.org/docs/zuul/about.html): "Zuul is a Project Gating System. That’s like a CI or CD system, but the focus is on testing the future state of code repositories... Zuul itself is a service which listens to events from various code review systems, executes jobs based on those events, and reports the results back to the code review system."
What is Nodepool and how is it related to Zuul?
"Nodepool is a system for managing test node resources. It supports launching single-use test nodes from cloud providers as well as managing access to pre-defined pre-existing nodes." "Zuul uses a separate component called Nodepool to provide the resources to run jobs. Nodepool works with several cloud providers as well as statically defined nodes (again, simultaneously)."
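The quoted description can be made concrete with a small configuration sketch for Nodepool's static driver, which hands pre-existing machines to Zuul by label. This is illustrative only: the provider name, node address, username, and label are hypothetical, and the exact field names should be checked against the Nodepool documentation for your version.

```yaml
# Illustrative Nodepool sketch (hypothetical names throughout): one reusable
# label backed by a statically defined node instead of a cloud provider
labels:
  - name: ubuntu-jammy

providers:
  - name: static-provider
    driver: static
    pools:
      - name: main
        nodes:
          - name: node01.example.com
            labels: ubuntu-jammy
            username: zuul
```

Zuul jobs then request nodes by label (here `ubuntu-jammy`), and Nodepool supplies them from whichever provider can satisfy the request.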
What is a Pipeline in Zuul?
A pipeline in Zuul is a workflow. This workflow can be executed based on different events - when a change is submitted to a project, when it's merged, etc.
The pipeline itself can be applied to one or more projects (a project corresponds to a repository in a hosted or private source control system)
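As a minimal sketch, a pipeline triggered by newly uploaded changes might be defined in YAML like this. The connection name `gerrit` and the `Verified` label are assumptions about the deployment, not fixed Zuul names:

```yaml
# Sketch of a "check"-style pipeline: run jobs on every new patchset and
# report the result back to code review ('gerrit' connection is assumed)
- pipeline:
    name: check
    manager: independent
    trigger:
      gerrit:
        - event: patchset-created
    success:
      gerrit:
        Verified: 1
    failure:
      gerrit:
        Verified: -1
```

Individual projects then attach their jobs to this pipeline in their own configuration, which is how one pipeline definition serves many repositories.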
What is a project in Zuul?