Repository: huggingface/sam2-studio Branch: main Commit: 0dd708d0162e Files: 38 Total size: 107.3 KB Directory structure: gitextract_6ppky8xg/ ├── .gitignore ├── LICENSE ├── README.md ├── SAM2-Demo/ │ ├── Assets.xcassets/ │ │ ├── AccentColor.colorset/ │ │ │ └── Contents.json │ │ ├── AppIcon.appiconset/ │ │ │ └── Contents.json │ │ └── Contents.json │ ├── Common/ │ │ ├── CGImage+Extension.swift │ │ ├── CGImage+RawBytes.swift │ │ ├── Color+Extension.swift │ │ ├── CoreImageExtensions.swift │ │ ├── DirectoryDocument.swift │ │ ├── MLMultiArray+Image.swift │ │ ├── Models.swift │ │ ├── NSImage+Extension.swift │ │ └── SAM2.swift │ ├── ContentView.swift │ ├── Preview Content/ │ │ └── Preview Assets.xcassets/ │ │ └── Contents.json │ ├── Ripple/ │ │ ├── Ripple.metal │ │ ├── RippleModifier.swift │ │ └── RippleViewModifier.swift │ ├── SAM2_1SmallImageEncoderFLOAT16.mlpackage/ │ │ ├── Data/ │ │ │ └── com.apple.CoreML/ │ │ │ └── model.mlmodel │ │ └── Manifest.json │ ├── SAM2_1SmallMaskDecoderFLOAT16.mlpackage/ │ │ ├── Data/ │ │ │ └── com.apple.CoreML/ │ │ │ └── model.mlmodel │ │ └── Manifest.json │ ├── SAM2_1SmallPromptEncoderFLOAT16.mlpackage/ │ │ ├── Data/ │ │ │ └── com.apple.CoreML/ │ │ │ └── model.mlmodel │ │ └── Manifest.json │ ├── SAM2_Demo.entitlements │ ├── SAM2_DemoApp.swift │ └── Views/ │ ├── AnnotationListView.swift │ ├── ImageView.swift │ ├── LayerListView.swift │ ├── MaskEditor.swift │ ├── SubtoolbarView.swift │ └── ZoomableScrollView.swift ├── SAM2-Demo.xcodeproj/ │ ├── project.pbxproj │ └── project.xcworkspace/ │ ├── contents.xcworkspacedata │ └── xcshareddata/ │ └── swiftpm/ │ └── Package.resolved └── sam2-cli/ └── MainCommand.swift ================================================ FILE CONTENTS ================================================ ================================================ FILE: .gitignore ================================================ .DS_Store xcuserdata/ ================================================ FILE: LICENSE 
================================================ Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. 
For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. 
Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the 
Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. 
Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. 
To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. Copyright 2022 Hugging Face SAS. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. ================================================ FILE: README.md ================================================ # SAM2 Studio This is a Swift demo app for SAM 2 Core ML models. ![UI Screenshot](screenshot.png) SAM 2 (Segment Anything in Images and Videos) is a collection of foundation models from FAIR that aims to solve promptable visual segmentation in images and videos. See the [SAM 2 paper](https://arxiv.org/abs/2408.00714) for more information. ## Quick Start ⚡️ Download the compiled version [here!](https://huggingface.co/coreml-projects/sam-2-studio). ## How to Use If you prefer to compile it yourself or want to use a larger model, simply download the repo, compile with Xcode, and run.
The app comes with the Small version of the model, but you can replace it with one of the supported models: - [SAM 2.1 Tiny](https://huggingface.co/apple/coreml-sam2.1-tiny) - [SAM 2.1 Small](https://huggingface.co/apple/coreml-sam2.1-small) - [SAM 2.1 Base](https://huggingface.co/apple/coreml-sam2.1-baseplus) - [SAM 2.1 Large](https://huggingface.co/apple/coreml-sam2.1-large) For the older models, please check out the [Apple](https://huggingface.co/apple) organization on Hugging Face. This demo currently supports images; video support is coming later. ### Selecting Objects - You can select one or more _foreground_ points to choose objects in the image. Each additional point is interpreted as a _refinement_ of the previous mask. - Use a _background_ point to indicate an area to be removed from the current mask. - You can use a _box_ to select an approximate area that contains the object you're interested in. ## Converting Models If you want to use a fine-tuned model, you can convert it using [this fork of the SAM 2 repo](https://github.com/huggingface/segment-anything-2/tree/coreml-conversion). Please let us know what you use it for! ## Feedback and Contributions Feedback, issues, and PRs are welcome! Please feel free to [get in touch](https://github.com/huggingface/sam2-swiftui/issues/new).
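The point and box prompts described under "Selecting Objects" are ultimately passed to the prompt encoder as (x, y, label) pairs; the label values correspond to `SAMCategoryType` in `Models.swift` (0 = background, 1 = foreground, 2 = box origin, 3 = box end). A minimal sketch of that mapping, assuming normalized coordinates (the `PromptPoint` type and `normalizedPrompts` helper are illustrative names, not part of the app's API):

```swift
// Sketch: turning UI clicks into the (x, y, label) prompt triples a
// SAM 2 prompt encoder consumes. Label values mirror SAMCategoryType in
// Models.swift: 0 = background, 1 = foreground, 2 = box origin, 3 = box end.
// PromptPoint and normalizedPrompts are illustrative, not repo API.

struct PromptPoint {
    var x: Double   // normalized to [0, 1]
    var y: Double
    var label: Int
}

func normalizedPrompts(clicks: [(x: Double, y: Double, label: Int)],
                       imageWidth: Double,
                       imageHeight: Double) -> [PromptPoint] {
    // Divide pixel coordinates by the image dimensions so prompts are
    // resolution-independent.
    clicks.map { PromptPoint(x: $0.x / imageWidth,
                             y: $0.y / imageHeight,
                             label: $0.label) }
}

// A foreground click in the middle of a 640x480 image, plus a
// background click to carve an area out of the mask:
let prompts = normalizedPrompts(clicks: [(320, 240, 1), (64, 432, 0)],
                                imageWidth: 640, imageHeight: 480)
```

Each refinement click simply appends another triple to the list before the prompts are re-encoded and a new mask is decoded.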
## Citation To cite the SAM 2 paper, model, or software, please use the below: ``` @article{ravi2024sam2, title={SAM 2: Segment Anything in Images and Videos}, author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and Mintun, Eric and Pan, Junting and Alwala, Kalyan Vasudev and Carion, Nicolas and Wu, Chao-Yuan and Girshick, Ross and Doll{\'a}r, Piotr and Feichtenhofer, Christoph}, journal={arXiv preprint arXiv:2408.00714}, url={https://arxiv.org/abs/2408.00714}, year={2024} } ``` ================================================ FILE: SAM2-Demo/Assets.xcassets/AccentColor.colorset/Contents.json ================================================ { "colors" : [ { "idiom" : "universal" } ], "info" : { "author" : "xcode", "version" : 1 } } ================================================ FILE: SAM2-Demo/Assets.xcassets/AppIcon.appiconset/Contents.json ================================================ { "images" : [ { "idiom" : "mac", "scale" : "1x", "size" : "16x16" }, { "idiom" : "mac", "scale" : "2x", "size" : "16x16" }, { "idiom" : "mac", "scale" : "1x", "size" : "32x32" }, { "idiom" : "mac", "scale" : "2x", "size" : "32x32" }, { "idiom" : "mac", "scale" : "1x", "size" : "128x128" }, { "idiom" : "mac", "scale" : "2x", "size" : "128x128" }, { "idiom" : "mac", "scale" : "1x", "size" : "256x256" }, { "idiom" : "mac", "scale" : "2x", "size" : "256x256" }, { "idiom" : "mac", "scale" : "1x", "size" : "512x512" }, { "idiom" : "mac", "scale" : "2x", "size" : "512x512" } ], "info" : { "author" : "xcode", "version" : 1 } } ================================================ FILE: SAM2-Demo/Assets.xcassets/Contents.json ================================================ { "info" : { "author" : "xcode", "version" : 1 } } ================================================ FILE: SAM2-Demo/Common/CGImage+Extension.swift 
================================================ // // CGImage+Extension.swift // SAM2-Demo // // Created by Cyril Zakka on 8/20/24. // import ImageIO extension CGImage { func resized(to size: CGSize) -> CGImage? { let width: Int = Int(size.width) let height: Int = Int(size.height) let bytesPerPixel = self.bitsPerPixel / 8 let destBytesPerRow = width * bytesPerPixel guard let colorSpace = self.colorSpace else { return nil } guard let context = CGContext(data: nil, width: width, height: height, bitsPerComponent: self.bitsPerComponent, bytesPerRow: destBytesPerRow, space: colorSpace, bitmapInfo: self.bitmapInfo.rawValue) else { return nil } context.interpolationQuality = .high context.draw(self, in: CGRect(x: 0, y: 0, width: width, height: height)) return context.makeImage() } } ================================================ FILE: SAM2-Demo/Common/CGImage+RawBytes.swift ================================================ /* Copyright (c) 2017-2019 M.I. Hollemans Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ import CoreGraphics extension CGImage { /** Converts the image into an array of RGBA bytes. */ @nonobjc public func toByteArrayRGBA() -> [UInt8] { var bytes = [UInt8](repeating: 0, count: width * height * 4) bytes.withUnsafeMutableBytes { ptr in if let colorSpace = colorSpace, let context = CGContext( data: ptr.baseAddress, width: width, height: height, bitsPerComponent: 8, bytesPerRow: width * 4, space: colorSpace, bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue) { let rect = CGRect(x: 0, y: 0, width: width, height: height) context.draw(self, in: rect) } } return bytes } /** Creates a new CGImage from an array of RGBA bytes. */ @nonobjc public class func fromByteArrayRGBA(_ bytes: [UInt8], width: Int, height: Int) -> CGImage? { return fromByteArray(bytes, width: width, height: height, bytesPerRow: width * 4, colorSpace: CGColorSpaceCreateDeviceRGB(), alphaInfo: .premultipliedLast) } /** Creates a new CGImage from an array of grayscale bytes. */ @nonobjc public class func fromByteArrayGray(_ bytes: [UInt8], width: Int, height: Int) -> CGImage? { return fromByteArray(bytes, width: width, height: height, bytesPerRow: width, colorSpace: CGColorSpaceCreateDeviceGray(), alphaInfo: .none) } @nonobjc class func fromByteArray(_ bytes: [UInt8], width: Int, height: Int, bytesPerRow: Int, colorSpace: CGColorSpace, alphaInfo: CGImageAlphaInfo) -> CGImage? { return bytes.withUnsafeBytes { ptr in let context = CGContext(data: UnsafeMutableRawPointer(mutating: ptr.baseAddress!), width: width, height: height, bitsPerComponent: 8, bytesPerRow: bytesPerRow, space: colorSpace, bitmapInfo: alphaInfo.rawValue) return context?.makeImage() } } } ================================================ FILE: SAM2-Demo/Common/Color+Extension.swift ================================================ // // Color+Extension.swift // SAM2-Demo // // Created by Fleetwood on 01/10/2024.
// import SwiftUI #if canImport(UIKit) import UIKit #elseif canImport(AppKit) import AppKit #endif extension Color { #if canImport(UIKit) var asNative: UIColor { UIColor(self) } #elseif canImport(AppKit) var asNative: NSColor { NSColor(self) } #endif var rgba: (red: CGFloat, green: CGFloat, blue: CGFloat, alpha: CGFloat) { let color = asNative.usingColorSpace(.deviceRGB)! var t = (CGFloat(), CGFloat(), CGFloat(), CGFloat()) color.getRed(&t.0, green: &t.1, blue: &t.2, alpha: &t.3) return t } } func colorDistance(_ color1: Color, _ color2: Color) -> Double { let rgb1 = color1.rgba; let rgb2 = color2.rgba; let rDiff = rgb1.red - rgb2.red let gDiff = rgb1.green - rgb2.green let bDiff = rgb1.blue - rgb2.blue return sqrt(rDiff*rDiff + gDiff*gDiff + bDiff*bDiff) } // Determine the Euclidean distance of all candidates from current set of colors. // Find the **maximum min-distance** from all current colors. func furthestColor(from existingColors: [Color], among candidateColors: [Color]) -> Color { var maxMinDistance: Double = 0 var furthestColor: Color = SAMSegmentation.randomCandidateColor() ?? SAMSegmentation.defaultColor for candidate in candidateColors { let minDistance = existingColors.map { colorDistance(candidate, $0) }.min() ?? 0 if minDistance > maxMinDistance { maxMinDistance = minDistance furthestColor = candidate } } return furthestColor } ================================================ FILE: SAM2-Demo/Common/CoreImageExtensions.swift ================================================ import CoreImage import CoreImage.CIFilterBuiltins import ImageIO import UniformTypeIdentifiers extension CIImage { /// Returns a resized image. 
func resized(to size: CGSize) -> CIImage { let outputScaleX = size.width / extent.width let outputScaleY = size.height / extent.height var outputImage = self.transformed(by: CGAffineTransform(scaleX: outputScaleX, y: outputScaleY)) outputImage = outputImage.transformed( by: CGAffineTransform(translationX: -outputImage.extent.origin.x, y: -outputImage.extent.origin.y) ) return outputImage } public func withAlpha<T: BinaryFloatingPoint>(_ alpha: T) -> CIImage? { guard alpha != 1 else { return self } let filter = CIFilter.colorMatrix() filter.inputImage = self filter.aVector = CIVector(x: 0, y: 0, z: 0, w: CGFloat(alpha)) return filter.outputImage } public func applyingThreshold(_ threshold: Float) -> CIImage? { let filter = CIFilter.colorThreshold() filter.inputImage = self filter.threshold = threshold return filter.outputImage } } extension CIContext { /// Renders an image to a new pixel buffer. func render(_ image: CIImage, pixelFormat: OSType) -> CVPixelBuffer? { var output: CVPixelBuffer! let status = CVPixelBufferCreate( kCFAllocatorDefault, Int(image.extent.width), Int(image.extent.height), pixelFormat, nil, &output ) guard status == kCVReturnSuccess else { return nil } render(image, to: output, bounds: image.extent, colorSpace: nil) return output } /// Writes the image as a PNG. func writePNG(_ image: CIImage, to url: URL) { let outputCGImage = createCGImage(image, from: image.extent, format: .BGRA8, colorSpace: nil)! guard let destination = CGImageDestinationCreateWithURL(url as CFURL, UTType.png.identifier as CFString, 1, nil) else { fatalError("Failed to create an image destination.") } CGImageDestinationAddImage(destination, outputCGImage, nil) CGImageDestinationFinalize(destination) } } ================================================ FILE: SAM2-Demo/Common/DirectoryDocument.swift ================================================ // // DirectoryDocument.swift // SAM2-Demo // // Created by Cyril Zakka on 9/10/24.
// import SwiftUI import UniformTypeIdentifiers struct DirectoryDocument: FileDocument { static var readableContentTypes: [UTType] { [.folder] } init(initialContentType: UTType = .folder) { // Initialize if needed } init(configuration: ReadConfiguration) throws { // Initialize if needed } func fileWrapper(configuration: WriteConfiguration) throws -> FileWrapper { return FileWrapper(directoryWithFileWrappers: [:]) } } ================================================ FILE: SAM2-Demo/Common/MLMultiArray+Image.swift ================================================ /* Copyright (c) 2017-2020 M.I. Hollemans Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ import Accelerate import CoreML public func clamp<T: Comparable>(_ x: T, min: T, max: T) -> T { if x < min { return min } if x > max { return max } return x } public protocol MultiArrayType: Comparable { static var multiArrayDataType: MLMultiArrayDataType { get } static func +(lhs: Self, rhs: Self) -> Self static func -(lhs: Self, rhs: Self) -> Self static func *(lhs: Self, rhs: Self) -> Self static func /(lhs: Self, rhs: Self) -> Self init(_: Int) var toUInt8: UInt8 { get } } extension Double: MultiArrayType { public static var multiArrayDataType: MLMultiArrayDataType { return .double } public var toUInt8: UInt8 { return UInt8(self) } } extension Float: MultiArrayType { public static var multiArrayDataType: MLMultiArrayDataType { return .float32 } public var toUInt8: UInt8 { return UInt8(self) } } extension Int32: MultiArrayType { public static var multiArrayDataType: MLMultiArrayDataType { return .int32 } public var toUInt8: UInt8 { return UInt8(self) } } extension Float16: MultiArrayType { public static var multiArrayDataType: MLMultiArrayDataType { return .float16 } public var toUInt8: UInt8 { return UInt8(self) } } extension MLMultiArray { /** Converts the multi-array to a CGImage. The multi-array must have at least 2 dimensions for a grayscale image, or at least 3 dimensions for a color image. The default expected shape is (height, width) or (channels, height, width). However, you can change this using the `axes` parameter. For example, if the array shape is (1, height, width, channels), use `axes: (3, 1, 2)`. If `channel` is not nil, only converts that channel to a grayscale image. This lets you visualize individual channels from a multi-array with more than 4 channels. Otherwise, converts all channels. In this case, the number of channels in the multi-array must be 1 for grayscale, 3 for RGB, or 4 for RGBA.
Use the `min` and `max` parameters to put the values from the array into the range [0, 255], if not already: - `min`: should be the smallest value in the data; this will be mapped to 0. - `max`: should be the largest value in the data; will be mapped to 255. For example, if the range of the data in the multi-array is [-1, 1], use `min: -1, max: 1`. If the range is already [0, 255], then use the defaults. */ public func cgImage(min: Double = 0, max: Double = 255, channel: Int? = nil, axes: (Int, Int, Int)? = nil) -> CGImage? { switch self.dataType { case .double: return _image(min: min, max: max, channel: channel, axes: axes) case .float32: return _image(min: Float(min), max: Float(max), channel: channel, axes: axes) case .int32: return _image(min: Int32(min), max: Int32(max), channel: channel, axes: axes) case .float16: return _image(min: Float16(min), max: Float16(max), channel: channel, axes: axes) @unknown default: fatalError("Unsupported data type \(dataType.rawValue)") } } /** Helper function that allows us to use generics. The type of `min` and `max` is also the dataType of the MLMultiArray. */ private func _image<T: MultiArrayType>(min: T, max: T, channel: Int?, axes: (Int, Int, Int)?) -> CGImage? { if let (b, w, h, c) = toRawBytes(min: min, max: max, channel: channel, axes: axes) { if c == 1 { return CGImage.fromByteArrayGray(b, width: w, height: h) } else { return CGImage.fromByteArrayRGBA(b, width: w, height: h) } } return nil } /** Converts the multi-array into an array of RGBA or grayscale pixels. - Note: This is not particularly fast, but it is flexible. You can change the loops to convert the multi-array whichever way you please. - Note: The type of `min` and `max` must match the dataType of the MLMultiArray object. - Returns: tuple containing the RGBA bytes, the dimensions of the image, and the number of channels in the image (1, 3, or 4). */ public func toRawBytes<T: MultiArrayType>(min: T, max: T, channel: Int? = nil, axes: (Int, Int, Int)?
= nil) -> (bytes: [UInt8], width: Int, height: Int, channels: Int)? { // MLMultiArray with unsupported shape? if shape.count < 2 { print("Cannot convert MLMultiArray of shape \(shape) to image") return nil } // Figure out which dimensions to use for the channels, height, and width. let channelAxis: Int let heightAxis: Int let widthAxis: Int if let axes = axes { channelAxis = axes.0 heightAxis = axes.1 widthAxis = axes.2 guard channelAxis >= 0 && channelAxis < shape.count && heightAxis >= 0 && heightAxis < shape.count && widthAxis >= 0 && widthAxis < shape.count else { print("Invalid axes \(axes) for shape \(shape)") return nil } } else if shape.count == 2 { // Expected shape for grayscale is (height, width) heightAxis = 0 widthAxis = 1 channelAxis = -1 // Never be used } else { // Expected shape for color is (channels, height, width) channelAxis = 0 heightAxis = 1 widthAxis = 2 } let height = self.shape[heightAxis].intValue let width = self.shape[widthAxis].intValue let yStride = self.strides[heightAxis].intValue let xStride = self.strides[widthAxis].intValue let channels: Int let cStride: Int let bytesPerPixel: Int let channelOffset: Int // MLMultiArray with just two dimensions is always grayscale. (We ignore // the value of channelAxis here.) if shape.count == 2 { channels = 1 cStride = 0 bytesPerPixel = 1 channelOffset = 0 // MLMultiArray with more than two dimensions can be color or grayscale. 
} else { let channelDim = self.shape[channelAxis].intValue if let channel = channel { if channel < 0 || channel >= channelDim { print("Channel must be -1, or between 0 and \(channelDim - 1)") return nil } channels = 1 bytesPerPixel = 1 channelOffset = channel } else if channelDim == 1 { channels = 1 bytesPerPixel = 1 channelOffset = 0 } else { if channelDim != 3 && channelDim != 4 { print("Expected channel dimension to have 1, 3, or 4 channels, got \(channelDim)") return nil } channels = channelDim bytesPerPixel = 4 channelOffset = 0 } cStride = self.strides[channelAxis].intValue } // Allocate storage for the RGBA or grayscale pixels. Set everything to // 255 so that alpha channel is filled in if only 3 channels. let count = height * width * bytesPerPixel var pixels = [UInt8](repeating: 255, count: count) // Grab the pointer to MLMultiArray's memory. var ptr = UnsafeMutablePointer<T>(OpaquePointer(self.dataPointer)) ptr = ptr.advanced(by: channelOffset * cStride) // Loop through all the pixels and all the channels and copy them over. for c in 0..<channels { var ptrY = ptr for y in 0..<height { for x in 0..<width { let value = ptrY[x * xStride] let scaled = (value - min) * T(255) / (max - min) let pixel = clamp(scaled, min: T(0), max: T(255)).toUInt8 pixels[(y * width + x) * bytesPerPixel + c] = pixel } ptrY = ptrY.advanced(by: yStride) } ptr = ptr.advanced(by: cStride) } return (pixels, width, height, channels) } } public func createCGImage(fromFloatArray features: MLMultiArray, min: Float = 0, max: Float = 255) -> CGImage?
{ assert(features.dataType == .float32) assert(features.shape.count == 3) let ptr = UnsafeMutablePointer<Float>(OpaquePointer(features.dataPointer)) let height = features.shape[1].intValue let width = features.shape[2].intValue let channelStride = features.strides[0].intValue let rowStride = features.strides[1].intValue let srcRowBytes = rowStride * MemoryLayout<Float>.stride var blueBuffer = vImage_Buffer(data: ptr, height: vImagePixelCount(height), width: vImagePixelCount(width), rowBytes: srcRowBytes) var greenBuffer = vImage_Buffer(data: ptr.advanced(by: channelStride), height: vImagePixelCount(height), width: vImagePixelCount(width), rowBytes: srcRowBytes) var redBuffer = vImage_Buffer(data: ptr.advanced(by: channelStride * 2), height: vImagePixelCount(height), width: vImagePixelCount(width), rowBytes: srcRowBytes) let destRowBytes = width * 4 var error: vImage_Error = 0 var pixels = [UInt8](repeating: 0, count: height * destRowBytes) pixels.withUnsafeMutableBufferPointer { ptr in var destBuffer = vImage_Buffer(data: ptr.baseAddress!, height: vImagePixelCount(height), width: vImagePixelCount(width), rowBytes: destRowBytes) error = vImageConvert_PlanarFToBGRX8888(&blueBuffer, &greenBuffer, &redBuffer, Pixel_8(255), &destBuffer, [max, max, max], [min, min, min], vImage_Flags(0)) } if error == kvImageNoError { return CGImage.fromByteArrayRGBA(pixels, width: width, height: height) } else { return nil } } ================================================ FILE: SAM2-Demo/Common/Models.swift ================================================ // // Models.swift // SAM2-Demo // // Created by Cyril Zakka on 8/19/24.
// import Foundation import SwiftUI enum SAMCategoryType: Int { case background = 0 case foreground = 1 case boxOrigin = 2 case boxEnd = 3 var description: String { switch self { case .foreground: return "Foreground" case .background: return "Background" case .boxOrigin: return "Box Origin" case .boxEnd: return "Box End" } } } struct SAMCategory: Hashable { let id: UUID = UUID() let type: SAMCategoryType let name: String let iconName: String let color: Color var typeDescription: String { type.description } static let foreground = SAMCategory( type: .foreground, name: "Foreground", iconName: "square.on.square.dashed", color: .pink ) static let background = SAMCategory( type: .background, name: "Background", iconName: "square.on.square.intersection.dashed", color: .purple ) static let boxOrigin = SAMCategory( type: .boxOrigin, name: "Box Origin", iconName: "", color: .white ) static let boxEnd = SAMCategory( type: .boxEnd, name: "Box End", iconName: "", color: .white ) } struct SAMPoint: Hashable { let id = UUID() let coordinates: CGPoint let category: SAMCategory let dateAdded = Date() } struct SAMBox: Hashable, Identifiable { let id = UUID() var startPoint: CGPoint var endPoint: CGPoint let category: SAMCategory let dateAdded = Date() var midpoint: CGPoint { return CGPoint( x: (startPoint.x + endPoint.x) / 2, y: (startPoint.y + endPoint.y) / 2 ) } } extension SAMBox { var points: [SAMPoint] { [SAMPoint(coordinates: startPoint, category: .boxOrigin), SAMPoint(coordinates: endPoint, category: .boxEnd)] } } struct SAMSegmentation: Hashable, Identifiable { let id = UUID() var image: CIImage var tintColor: Color { didSet { updateTintedImage() } } var title: String = "" var firstAppearance: Int? var isHidden: Bool = false private var tintedImage: CIImage? 
static let defaultColor: Color = Color(.sRGB, red: 30/255, green: 144/255, blue: 1) static let candidateColors: [Color] = [ defaultColor, Color.red, Color.green, Color.brown, Color.indigo, Color.cyan, Color.yellow, Color.purple, Color.orange, Color.teal, Color.indigo, Color.mint, Color.pink, ] init(image: CIImage, tintColor: Color = Color(.sRGB, red: 30/255, green: 144/255, blue: 1), title: String = "", firstAppearance: Int? = nil, isHidden: Bool = false) { self.image = image self.tintColor = tintColor self.title = title self.firstAppearance = firstAppearance self.isHidden = isHidden updateTintedImage() } private mutating func updateTintedImage() { let ciColor = CIColor(color: NSColor(tintColor)) let monochromeFilter = CIFilter.colorMonochrome() monochromeFilter.inputImage = image monochromeFilter.color = ciColor! monochromeFilter.intensity = 1.0 tintedImage = monochromeFilter.outputImage } static func randomCandidateColor() -> Color? { Self.candidateColors.randomElement() } var cgImage: CGImage { let context = CIContext() return context.createCGImage(tintedImage ?? image, from: (tintedImage ?? image).extent)! } } struct SAMTool: Hashable { let id: UUID = UUID() let name: String let iconName: String } // Tools let pointTool: SAMTool = SAMTool(name: "Point", iconName: "hand.point.up.left") let boundingBoxTool: SAMTool = SAMTool(name: "Bounding Box", iconName: "rectangle.dashed") ================================================ FILE: SAM2-Demo/Common/NSImage+Extension.swift ================================================ // // NSImage+Extension.swift // SAM2-Demo // // Created by Cyril Zakka on 8/20/24. // import AppKit import VideoToolbox extension NSImage { /** Converts the image to an ARGB `CVPixelBuffer`. */ public func pixelBuffer() -> CVPixelBuffer? { return pixelBuffer(width: Int(size.width), height: Int(size.height)) } /** Resizes the image to `width` x `height` and converts it to an ARGB `CVPixelBuffer`. 
*/ public func pixelBuffer(width: Int, height: Int) -> CVPixelBuffer? { return pixelBuffer(width: width, height: height, pixelFormatType: kCVPixelFormatType_32ARGB, colorSpace: CGColorSpaceCreateDeviceRGB(), alphaInfo: .noneSkipFirst) } /** Resizes the image to `width` x `height` and converts it to a `CVPixelBuffer` with the specified pixel format, color space, and alpha channel. */ public func pixelBuffer(width: Int, height: Int, pixelFormatType: OSType, colorSpace: CGColorSpace, alphaInfo: CGImageAlphaInfo) -> CVPixelBuffer? { var maybePixelBuffer: CVPixelBuffer? let attrs = [kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue, kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue] let status = CVPixelBufferCreate(kCFAllocatorDefault, width, height, pixelFormatType, attrs as CFDictionary, &maybePixelBuffer) guard status == kCVReturnSuccess, let pixelBuffer = maybePixelBuffer else { return nil } let flags = CVPixelBufferLockFlags(rawValue: 0) guard kCVReturnSuccess == CVPixelBufferLockBaseAddress(pixelBuffer, flags) else { return nil } defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, flags) } guard let context = CGContext(data: CVPixelBufferGetBaseAddress(pixelBuffer), width: width, height: height, bitsPerComponent: 8, bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer), space: colorSpace, bitmapInfo: alphaInfo.rawValue) else { return nil } NSGraphicsContext.saveGraphicsState() let nscg = NSGraphicsContext(cgContext: context, flipped: true) NSGraphicsContext.current = nscg context.translateBy(x: 0, y: CGFloat(height)) context.scaleBy(x: 1, y: -1) self.draw(in: CGRect(x: 0, y: 0, width: width, height: height)) NSGraphicsContext.restoreGraphicsState() return pixelBuffer } } ================================================ FILE: SAM2-Demo/Common/SAM2.swift ================================================ // // SAM2.swift // SAM2-Demo // // Created by Cyril Zakka on 8/20/24. 
// import SwiftUI import CoreML import CoreImage import CoreImage.CIFilterBuiltins import Combine import UniformTypeIdentifiers @MainActor class SAM2: ObservableObject { @Published var imageEncodings: SAM2_1SmallImageEncoderFLOAT16Output? @Published var promptEncodings: SAM2_1SmallPromptEncoderFLOAT16Output? @Published private(set) var initializationTime: TimeInterval? @Published private(set) var initialized: Bool? private var imageEncoderModel: SAM2_1SmallImageEncoderFLOAT16? private var promptEncoderModel: SAM2_1SmallPromptEncoderFLOAT16? private var maskDecoderModel: SAM2_1SmallMaskDecoderFLOAT16? // TODO: examine model inputs instead var inputSize: CGSize { CGSize(width: 1024, height: 1024) } var width: CGFloat { inputSize.width } var height: CGFloat { inputSize.height } init() { Task { await loadModels() } } private func loadModels() async { let startTime = CFAbsoluteTimeGetCurrent() do { let configuration = MLModelConfiguration() configuration.computeUnits = .cpuAndGPU let (imageEncoder, promptEncoder, maskDecoder) = try await Task.detached(priority: .userInitiated) { let imageEncoder = try SAM2_1SmallImageEncoderFLOAT16(configuration: configuration) let promptEncoder = try SAM2_1SmallPromptEncoderFLOAT16(configuration: configuration) let maskDecoder = try SAM2_1SmallMaskDecoderFLOAT16(configuration: configuration) return (imageEncoder, promptEncoder, maskDecoder) }.value let endTime = CFAbsoluteTimeGetCurrent() self.initializationTime = endTime - startTime self.initialized = true self.imageEncoderModel = imageEncoder self.promptEncoderModel = promptEncoder self.maskDecoderModel = maskDecoder print("Initialized models in \(String(format: "%.4f", self.initializationTime!)) seconds") } catch { print("Failed to initialize models: \(error)") self.initializationTime = nil self.initialized = false } } // Convenience for use in the CLI private var modelLoading: AnyCancellable? 
func ensureModelsAreLoaded() async throws -> SAM2 { let _ = try await withCheckedThrowingContinuation { continuation in modelLoading = self.$initialized.sink { newValue in if let initialized = newValue { if initialized { continuation.resume(returning: self) } else { continuation.resume(throwing: SAM2Error.modelNotLoaded) } } } } return self } static func load() async throws -> SAM2 { try await SAM2().ensureModelsAreLoaded() } func getImageEncoding(from pixelBuffer: CVPixelBuffer) async throws { guard let model = imageEncoderModel else { throw SAM2Error.modelNotLoaded } let encoding = try model.prediction(image: pixelBuffer) self.imageEncodings = encoding } func getImageEncoding(from url: URL) async throws { guard let model = imageEncoderModel else { throw SAM2Error.modelNotLoaded } let inputs = try SAM2_1SmallImageEncoderFLOAT16Input(imageAt: url) let encoding = try await model.prediction(input: inputs) self.imageEncodings = encoding } func getPromptEncoding(from allPoints: [SAMPoint], with size: CGSize) async throws { guard let model = promptEncoderModel else { throw SAM2Error.modelNotLoaded } let transformedCoords = try transformCoords(allPoints.map { $0.coordinates }, normalize: false, origHW: size) // Create MLFeatureProvider with the required input format let pointsMultiArray = try MLMultiArray(shape: [1, NSNumber(value: allPoints.count), 2], dataType: .float32) let labelsMultiArray = try MLMultiArray(shape: [1, NSNumber(value: allPoints.count)], dataType: .int32) for (index, point) in transformedCoords.enumerated() { pointsMultiArray[[0, index, 0] as [NSNumber]] = NSNumber(value: Float(point.x)) pointsMultiArray[[0, index, 1] as [NSNumber]] = NSNumber(value: Float(point.y)) labelsMultiArray[[0, index] as [NSNumber]] = NSNumber(value: allPoints[index].category.type.rawValue) } let encoding = try model.prediction(points: pointsMultiArray, labels: labelsMultiArray) self.promptEncodings = encoding } func bestMask(for output: SAM2_1SmallMaskDecoderFLOAT16Output) 
-> MLMultiArray { if #available(macOS 15.0, *) { let scores = output.scoresShapedArray.scalars let argmax = scores.firstIndex(of: scores.max() ?? 0) ?? 0 return MLMultiArray(output.low_res_masksShapedArray[0, argmax]) } else { // Convert scores to float32 for compatibility with macOS < 15, // plus ugly loop copy (could do some memcpys) let scores = output.scores let floatScores = (0..<scores.count).map { Float(truncating: scores[$0]) } let argmax = floatScores.firstIndex(of: floatScores.max() ?? 0) ?? 0 let masks = output.low_res_masks let maskHeight = masks.shape[2].intValue let maskWidth = masks.shape[3].intValue let best = try! MLMultiArray(shape: [NSNumber(value: maskHeight), NSNumber(value: maskWidth)], dataType: .float32) for y in 0..<maskHeight { for x in 0..<maskWidth { best[y * maskWidth + x] = masks[[0, argmax, y, x] as [NSNumber]] } } return best } } func getMask(for original_size: CGSize) async throws -> CIImage? { guard let model = maskDecoderModel else { throw SAM2Error.modelNotLoaded } if let image_embedding = self.imageEncodings?.image_embedding, let feats0 = self.imageEncodings?.feats_s0, let feats1 = self.imageEncodings?.feats_s1, let sparse_embedding = self.promptEncodings?.sparse_embeddings, let dense_embedding = self.promptEncodings?.dense_embeddings { let output = try model.prediction(image_embedding: image_embedding, sparse_embedding: sparse_embedding, dense_embedding: dense_embedding, feats_s0: feats0, feats_s1: feats1) // Extract best mask and ignore the others let lowFeatureMask = bestMask(for: output) // TODO: optimization // Preserve range for upsampling var minValue: Double = 9999 var maxValue: Double = -9999 for i in 0..<lowFeatureMask.count { let v = lowFeatureMask[i].doubleValue if v > maxValue { maxValue = v } if v < minValue { minValue = v } } let threshold = -minValue / (maxValue - minValue) // Resize first, then threshold if let maskcgImage = lowFeatureMask.cgImage(min: minValue, max: maxValue) { let ciImage = CIImage(cgImage: maskcgImage, options: [.colorSpace: NSNull()]) let resizedImage = try resizeImage(ciImage, to: original_size, applyingThreshold: Float(threshold)) return resizedImage?.maskedToAlpha()?.samTinted() } } return nil } private func transformCoords(_ coords: [CGPoint], normalize: Bool = false, origHW: CGSize) throws -> [CGPoint] { guard normalize else { return coords.map { CGPoint(x: $0.x * width, y: $0.y * height) } } let w = origHW.width let h = origHW.height return coords.map { coord in let normalizedX = coord.x / w let normalizedY = coord.y / h return CGPoint(x: normalizedX * width, y: normalizedY * height) } } private func resizeImage(_ image: CIImage, to size: CGSize, applyingThreshold threshold: Float = 1) throws -> CIImage? { let scale = CGAffineTransform(scaleX: size.width / image.extent.width, y: size.height / image.extent.height) return image.transformed(by: scale).applyingThreshold(threshold) } } extension CIImage { /// This is only appropriate for grayscale mask images (our case). CIColorMatrix can be used more generally. func maskedToAlpha() -> CIImage? { let filter = CIFilter.maskToAlpha() filter.inputImage = self return filter.outputImage } func samTinted() -> CIImage?
{ let filter = CIFilter.colorMatrix() filter.rVector = CIVector(x: 30/255, y: 0, z: 0, w: 1) filter.gVector = CIVector(x: 0, y: 144/255, z: 0, w: 1) filter.bVector = CIVector(x: 0, y: 0, z: 1, w: 1) filter.biasVector = CIVector(x: -1, y: -1, z: -1, w: 0) filter.inputImage = self return filter.outputImage?.cropped(to: self.extent) } } enum SAM2Error: Error { case modelNotLoaded case pixelBufferCreationFailed case imageResizingFailed } @discardableResult func writeCGImage(_ image: CGImage, to destinationURL: URL) -> Bool { guard let destination = CGImageDestinationCreateWithURL(destinationURL as CFURL, UTType.png.identifier as CFString, 1, nil) else { return false } CGImageDestinationAddImage(destination, image, nil) return CGImageDestinationFinalize(destination) } ================================================ FILE: SAM2-Demo/ContentView.swift ================================================ import SwiftUI import PhotosUI import UniformTypeIdentifiers import CoreML import os // TODO: Add reset, bounding box, and eraser let logger = Logger( subsystem: "com.cyrilzakka.SAM2-Demo.ContentView", category: "ContentView") struct PointsOverlay: View { @Binding var selectedPoints: [SAMPoint] @Binding var selectedTool: SAMTool? let imageSize: CGSize var body: some View { ForEach(selectedPoints, id: \.self) { point in Circle() .frame(width: 10, height: 10) .foregroundStyle(point.category.color) .position(point.coordinates.toSize(imageSize)) } } } struct BoundingBoxesOverlay: View { let boundingBoxes: [SAMBox] let currentBox: SAMBox? 
let imageSize: CGSize var body: some View { ForEach(boundingBoxes) { box in BoundingBoxPath(box: box, imageSize: imageSize) } if let currentBox = currentBox { BoundingBoxPath(box: currentBox, imageSize: imageSize) } } } struct BoundingBoxPath: View { let box: SAMBox let imageSize: CGSize var body: some View { Path { path in path.move(to: box.startPoint.toSize(imageSize)) path.addLine(to: CGPoint(x: box.endPoint.x, y: box.startPoint.y).toSize(imageSize)) path.addLine(to: box.endPoint.toSize(imageSize)) path.addLine(to: CGPoint(x: box.startPoint.x, y: box.endPoint.y).toSize(imageSize)) path.closeSubpath() } .stroke( box.category.color, style: StrokeStyle(lineWidth: 2, dash: [5, 5]) ) } } struct SegmentationOverlay: View { @Binding var segmentationImage: SAMSegmentation let imageSize: CGSize @State var counter: Int = 0 var origin: CGPoint = .zero var shouldAnimate: Bool = false var body: some View { let nsImage = NSImage(cgImage: segmentationImage.cgImage, size: imageSize) Image(nsImage: nsImage) .resizable() .scaledToFit() .allowsHitTesting(false) .frame(width: imageSize.width, height: imageSize.height) .opacity(segmentationImage.isHidden ? 0:0.6) .modifier(RippleEffect(at: CGPoint(x: segmentationImage.cgImage.width/2, y: segmentationImage.cgImage.height/2), trigger: counter)) .onAppear { if shouldAnimate { counter += 1 } } } } struct ContentView: View { // ML Models @StateObject private var sam2 = SAM2() @State private var currentSegmentation: SAMSegmentation? @State private var segmentationImages: [SAMSegmentation] = [] @State private var imageSize: CGSize = .zero // File importer @State private var imageURL: URL? @State private var isImportingFromFiles: Bool = false @State private var displayImage: NSImage? // Mask exporter @State private var exportURL: URL? 
@State private var exportMaskToPNG: Bool = false @State private var showInspector: Bool = true @State private var selectedSegmentations = Set<UUID>() // Photos Picker @State private var isImportingFromPhotos: Bool = false @State private var selectedItem: PhotosPickerItem? @State private var error: Error? // ML Model Properties var tools: [SAMTool] = [pointTool, boundingBoxTool] var categories: [SAMCategory] = [.foreground, .background] @State private var selectedTool: SAMTool? @State private var selectedCategory: SAMCategory? @State private var selectedPoints: [SAMPoint] = [] @State private var boundingBoxes: [SAMBox] = [] @State private var currentBox: SAMBox? @State private var originalSize: NSSize? @State private var currentScale: CGFloat = 1.0 @State private var visibleRect: CGRect = .zero var body: some View { NavigationSplitView(sidebar: { VStack { LayerListView(segmentationImages: $segmentationImages, selectedSegmentations: $selectedSegmentations, currentSegmentation: $currentSegmentation) Spacer() Button(action: { if let currentSegmentation = self.currentSegmentation { self.segmentationImages.append(currentSegmentation) self.reset() } }, label: { Text("New Mask") }).padding() } }, detail: { ZStack { ZoomableScrollView(visibleRect: $visibleRect) { if let image = displayImage { ImageView(image: image, currentScale: $currentScale, selectedTool: $selectedTool, selectedCategory: $selectedCategory, selectedPoints: $selectedPoints, boundingBoxes: $boundingBoxes, currentBox: $currentBox, segmentationImages: $segmentationImages, currentSegmentation: $currentSegmentation, imageSize: $imageSize, originalSize: $originalSize, sam2: sam2) } else { ContentUnavailableView("No Image Loaded", systemImage: "photo.fill.on.rectangle.fill", description: Text("Please import a photo to get started.")) } } VStack(spacing: 0) { SubToolbar(selectedPoints: $selectedPoints, boundingBoxes: $boundingBoxes, segmentationImages: $segmentationImages, currentSegmentation: $currentSegmentation)
Spacer() } } }) .inspector(isPresented: $showInspector, content: { if selectedSegmentations.isEmpty { ContentUnavailableView(label: { Label(title: { Text("No Mask Selected") .font(.subheadline) }, icon: {}) }) .inspectorColumnWidth(min: 200, ideal: 200, max: 200) } else { MaskEditor(exportMaskToPNG: $exportMaskToPNG, segmentationImages: $segmentationImages, selectedSegmentations: $selectedSegmentations, currentSegmentation: $currentSegmentation) .inspectorColumnWidth(min: 200, ideal: 200, max: 200) .toolbar { Spacer() Button { showInspector.toggle() } label: { Label("Toggle Inspector", systemImage: "sidebar.trailing") } } } }) .toolbar { // Tools ToolbarItemGroup(placement: .principal) { Picker(selection: $selectedTool, content: { ForEach(tools, id: \.self) { tool in Label(tool.name, systemImage: tool.iconName) .tag(tool) .labelStyle(.titleAndIcon) } }, label: { Label("Tools", systemImage: "pencil.and.ruler") }) .pickerStyle(.menu) Picker(selection: $selectedCategory, content: { ForEach(categories, id: \.self) { cat in Label(cat.name, systemImage: cat.iconName) .tag(cat) .labelStyle(.titleAndIcon) } }, label: { Label("Tools", systemImage: "pencil.and.ruler") }) .pickerStyle(.menu) } // Import ToolbarItemGroup { Menu { Button(action: { isImportingFromPhotos = true }, label: { Label("From Photos", systemImage: "photo.on.rectangle.angled.fill") }) Button(action: { isImportingFromFiles = true }, label: { Label("From Files", systemImage: "folder.fill") }) } label: { Label("Import", systemImage: "photo.badge.plus") } } } .onAppear { if selectedTool == nil { selectedTool = tools[0] } if selectedCategory == nil { selectedCategory = categories.first } } // MARK: - Image encoding .onChange(of: displayImage) { segmentationImages = [] self.reset() Task { if let displayImage, let pixelBuffer = displayImage.pixelBuffer(width: 1024, height: 1024) { originalSize = displayImage.size do { try await sam2.getImageEncoding(from: pixelBuffer) } catch { self.error = error } } } } // 
MARK: - Photos Importer .photosPicker(isPresented: $isImportingFromPhotos, selection: $selectedItem, matching: .any(of: [.images, .screenshots, .livePhotos])) .onChange(of: selectedItem) { Task { if let loadedData = try? await selectedItem?.loadTransferable(type: Data.self) { DispatchQueue.main.async { selectedPoints.removeAll() displayImage = NSImage(data: loadedData) } } else { logger.error("Error loading image from Photos.") } } } // MARK: - File Importer .fileImporter(isPresented: $isImportingFromFiles, allowedContentTypes: [.image]) { result in switch result { case .success(let file): self.selectedItem = nil self.selectedPoints.removeAll() self.imageURL = file loadImage(from: file) case .failure(let error): logger.error("File import error: \(error.localizedDescription)") self.error = error } } // MARK: - File exporter .fileExporter( isPresented: $exportMaskToPNG, document: DirectoryDocument(initialContentType: .folder), contentType: .folder, defaultFilename: "Segmentations" ) { result in if case .success(let url) = result { exportURL = url var selectedToExport = segmentationImages.filter { segmentation in selectedSegmentations.contains(segmentation.id) } if let currentSegmentation { selectedToExport.append(currentSegmentation) } exportSegmentations(selectedToExport, to: url) } } } // MARK: - Private Methods private func loadImage(from url: URL) { guard url.startAccessingSecurityScopedResource() else { logger.error("Failed to access the file. 
Security-scoped resource access denied.") return } defer { url.stopAccessingSecurityScopedResource() } do { let imageData = try Data(contentsOf: url) if let image = NSImage(data: imageData) { DispatchQueue.main.async { self.displayImage = image } } else { logger.error("Failed to create NSImage from file data") } } catch { logger.error("Error loading image data: \(error.localizedDescription)") self.error = error } } func exportSegmentations(_ segmentations: [SAMSegmentation], to directory: URL) { let fileManager = FileManager.default do { try fileManager.createDirectory(at: directory, withIntermediateDirectories: true, attributes: nil) for (index, segmentation) in segmentations.enumerated() { let filename = "segmentation_\(index + 1).png" let fileURL = directory.appendingPathComponent(filename) if let destination = CGImageDestinationCreateWithURL(fileURL as CFURL, UTType.png.identifier as CFString, 1, nil) { CGImageDestinationAddImage(destination, segmentation.cgImage, nil) if CGImageDestinationFinalize(destination) { print("Saved segmentation \(index + 1) to \(fileURL.path)") } else { print("Failed to save segmentation \(index + 1)") } } } } catch { print("Error creating directory: \(error.localizedDescription)") } } private func reset() { selectedPoints = [] boundingBoxes = [] currentBox = nil currentSegmentation = nil } } struct SizePreferenceKey: PreferenceKey { static var defaultValue: CGSize = .zero static func reduce(value: inout CGSize, nextValue: () -> CGSize) { value = nextValue() } } #Preview { ContentView() } ================================================ FILE: SAM2-Demo/Preview Content/Preview Assets.xcassets/Contents.json ================================================ { "info" : { "author" : "xcode", "version" : 1 } } ================================================ FILE: SAM2-Demo/Ripple/Ripple.metal ================================================ // Ripple.metal /* See the LICENSE at the end of the article for this sample’s licensing 
information. Abstract: A shader that applies a ripple effect to a view when using it as a SwiftUI layer effect. */ #include <metal_stdlib> #include <SwiftUI/SwiftUI_Metal.h> using namespace metal; [[ stitchable ]] half4 Ripple( float2 position, SwiftUI::Layer layer, float2 origin, float time, float amplitude, float frequency, float decay, float speed ) { // The distance of the current pixel position from `origin`. float distance = length(position - origin); // The amount of time it takes for the ripple to arrive at the current pixel position. float delay = distance / speed; // Adjust for delay, clamp to 0. time -= delay; time = max(0.0, time); // The ripple is a sine wave that Metal scales by an exponential decay // function. float rippleAmount = amplitude * sin(frequency * time) * exp(-decay * time); // A vector of length `amplitude` that points away from position. float2 n = normalize(position - origin); // Scale `n` by the ripple amount at the current pixel position and add it // to the current pixel position. // // This new position moves toward or away from `origin` based on the // sign and magnitude of `rippleAmount`. float2 newPosition = position + rippleAmount * n; // Sample the layer at the new position. half4 color = layer.sample(newPosition); // Lighten or darken the color based on the ripple amount and its alpha // component. color.rgb += 0.3 * (rippleAmount / amplitude) * color.a; return color; } ================================================ FILE: SAM2-Demo/Ripple/RippleModifier.swift ================================================ // // RippleModifier.swift // HuggingChat-Mac // // Created by Cyril Zakka on 8/28/24. // import SwiftUI /* See the LICENSE at the end of the article for this sample's licensing information. */ /// A modifier that applies a ripple effect to its content.
struct RippleModifier: ViewModifier { var origin: CGPoint var elapsedTime: TimeInterval var duration: TimeInterval var amplitude: Double var frequency: Double var decay: Double var speed: Double func body(content: Content) -> some View { let shader = ShaderLibrary.Ripple( .float2(origin), .float(elapsedTime), // Parameters .float(amplitude), .float(frequency), .float(decay), .float(speed) ) let maxSampleOffset = maxSampleOffset let elapsedTime = elapsedTime let duration = duration content.visualEffect { view, _ in view.layerEffect( shader, maxSampleOffset: maxSampleOffset, isEnabled: 0 < elapsedTime && elapsedTime < duration ) } } var maxSampleOffset: CGSize { CGSize(width: amplitude, height: amplitude) } } ================================================ FILE: SAM2-Demo/Ripple/RippleViewModifier.swift ================================================ // // RippleViewModifier.swift // HuggingChat-Mac // // Created by Cyril Zakka on 8/28/24. // import SwiftUI struct RippleEffect<T: Equatable>: ViewModifier { var origin: CGPoint var trigger: T var amplitude: Double var frequency: Double var decay: Double var speed: Double init(at origin: CGPoint, trigger: T, amplitude: Double = 12, frequency: Double = 15, decay: Double = 8, speed: Double = 1200) { self.origin = origin self.trigger = trigger self.amplitude = amplitude self.frequency = frequency self.decay = decay self.speed = speed } func body(content: Content) -> some View { let origin = origin let duration = duration let amplitude = amplitude let frequency = frequency let decay = decay let speed = speed content.keyframeAnimator( initialValue: 0, trigger: trigger ) { view, elapsedTime in view.modifier(RippleModifier( origin: origin, elapsedTime: elapsedTime, duration: duration, amplitude: amplitude, frequency: frequency, decay: decay, speed: speed )) } keyframes: { _ in MoveKeyframe(0) LinearKeyframe(duration, duration: duration) } } var duration: TimeInterval { 3 } } ================================================ FILE:
SAM2-Demo/SAM2_1SmallImageEncoderFLOAT16.mlpackage/Manifest.json ================================================ { "fileFormatVersion": "1.0.0", "itemInfoEntries": { "4C20C7AA-F42B-4CCD-84C3-73C031A91D48": { "author": "com.apple.CoreML", "description": "CoreML Model Weights", "name": "weights", "path": "com.apple.CoreML/weights" }, "DDCB1D63-C7BD-4A13-8EB5-D7151371105B": { "author": "com.apple.CoreML", "description": "CoreML Model Specification", "name": "model.mlmodel", "path": "com.apple.CoreML/model.mlmodel" } }, "rootModelIdentifier": "DDCB1D63-C7BD-4A13-8EB5-D7151371105B" } ================================================ FILE: SAM2-Demo/SAM2_1SmallMaskDecoderFLOAT16.mlpackage/Manifest.json ================================================ { "fileFormatVersion": "1.0.0", "itemInfoEntries": { "6FA6762D-69A1-4A0B-AB0D-512638FD7ECF": { "author": "com.apple.CoreML", "description": "CoreML Model Specification", "name": "model.mlmodel", "path": "com.apple.CoreML/model.mlmodel" }, "DB82D069-C4C9-41FB-A178-262063485D28": { "author": "com.apple.CoreML", "description": "CoreML Model Weights", "name": "weights", "path": "com.apple.CoreML/weights" } }, "rootModelIdentifier": "6FA6762D-69A1-4A0B-AB0D-512638FD7ECF" } ================================================ FILE: SAM2-Demo/SAM2_1SmallPromptEncoderFLOAT16.mlpackage/Manifest.json ================================================ { "fileFormatVersion": "1.0.0", "itemInfoEntries": { "BE0329D0-1E5D-4FF9-8ECE-350FC8DE699D": { "author": "com.apple.CoreML", "description": "CoreML Model Weights", "name": "weights", "path": "com.apple.CoreML/weights" }, "C1F60EF7-4F31-4243-8BE5-C107CB23EADF": { "author": "com.apple.CoreML", "description": "CoreML Model Specification", "name": "model.mlmodel", "path": "com.apple.CoreML/model.mlmodel" } }, "rootModelIdentifier": "C1F60EF7-4F31-4243-8BE5-C107CB23EADF" } ================================================ FILE: SAM2-Demo/SAM2_Demo.entitlements 
================================================ <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"> <plist version="1.0"> <dict> <key>com.apple.security.app-sandbox</key> <true/> <key>com.apple.security.files.user-selected.read-write</key> <true/> </dict> </plist> ================================================ FILE: SAM2-Demo/SAM2_DemoApp.swift ================================================ // // SAM2_DemoApp.swift // SAM2-Demo // // Created by Cyril Zakka on 8/19/24. // import SwiftUI @main struct SAM2_DemoApp: App { var body: some Scene { WindowGroup { ContentView() } .windowToolbarStyle(UnifiedWindowToolbarStyle(showsTitle: false)) } } ================================================ FILE: SAM2-Demo/Views/AnnotationListView.swift ================================================ // // AnnotationListView.swift // SAM2-Demo // // Created by Cyril Zakka on 9/8/24. // import SwiftUI struct AnnotationListView: View { @Binding var segmentation: SAMSegmentation @State var showHideIcon: Bool = false var body: some View { HStack { Image(nsImage: NSImage(cgImage: segmentation.cgImage, size: NSSize(width: 25, height: 25))) .background(.quinary) .mask(RoundedRectangle(cornerRadius: 5)) VStack(alignment: .leading) { Text(segmentation.title) .font(.headline) .foregroundStyle(segmentation.isHidden ? .tertiary:.primary) // Text(segmentation.firstAppearance) // .font(.subheadline) // .foregroundStyle(.secondary) } Spacer() Button("", systemImage: segmentation.isHidden ? "eye.slash.fill" :"eye.fill", action: { segmentation.isHidden.toggle() }) .opacity(segmentation.isHidden ? 1 : (showHideIcon ? 1:0)) .buttonStyle(.borderless) .foregroundStyle(.secondary) } .onHover { state in showHideIcon = state } } } ================================================ FILE: SAM2-Demo/Views/ImageView.swift ================================================ // // ImageView.swift // SAM2-Demo // // Created by Cyril Zakka on 9/8/24. // import SwiftUI struct ImageView: View { let image: NSImage @Binding var currentScale: CGFloat @Binding var selectedTool: SAMTool? @Binding var selectedCategory: SAMCategory?
@Binding var selectedPoints: [SAMPoint] @Binding var boundingBoxes: [SAMBox] @Binding var currentBox: SAMBox? @Binding var segmentationImages: [SAMSegmentation] @Binding var currentSegmentation: SAMSegmentation? @Binding var imageSize: CGSize @Binding var originalSize: NSSize? @State var animationPoint: CGPoint = .zero @ObservedObject var sam2: SAM2 @State private var error: Error? var pointSequence: [SAMPoint] { boundingBoxes.flatMap { $0.points } + selectedPoints } var body: some View { Image(nsImage: image) .resizable() .aspectRatio(contentMode: .fit) .scaleEffect(currentScale) .onTapGesture(coordinateSpace: .local) { handleTap(at: $0) } .gesture(boundingBoxGesture) .onHover { changeCursorAppearance(is: $0) } .background(GeometryReader { geometry in Color.clear.preference(key: SizePreferenceKey.self, value: geometry.size) }) .onPreferenceChange(SizePreferenceKey.self) { imageSize = $0 } .onChange(of: selectedPoints.count, { if !selectedPoints.isEmpty { performForwardPass() } }) .onChange(of: boundingBoxes.count, { if !boundingBoxes.isEmpty { performForwardPass() } }) .overlay { PointsOverlay(selectedPoints: $selectedPoints, selectedTool: $selectedTool, imageSize: imageSize) BoundingBoxesOverlay(boundingBoxes: boundingBoxes, currentBox: currentBox, imageSize: imageSize) if !segmentationImages.isEmpty { ForEach(Array(segmentationImages.enumerated()), id: \.element.id) { index, segmentation in SegmentationOverlay(segmentationImage: $segmentationImages[index], imageSize: imageSize, shouldAnimate: false) .zIndex(Double (segmentationImages.count - index)) } } if let currentSegmentation = currentSegmentation { SegmentationOverlay(segmentationImage: .constant(currentSegmentation), imageSize: imageSize, origin: animationPoint, shouldAnimate: true) .zIndex(Double(segmentationImages.count + 1)) } } } private func changeCursorAppearance(is inside: Bool) { if inside { if selectedTool == pointTool { NSCursor.pointingHand.push() } else if selectedTool == boundingBoxTool { 
                NSCursor.crosshair.push()
            }
        } else {
            NSCursor.pop()
        }
    }

    private var boundingBoxGesture: some Gesture {
        DragGesture(minimumDistance: 0)
            .onChanged { value in
                guard selectedTool == boundingBoxTool else { return }
                if currentBox == nil {
                    currentBox = SAMBox(startPoint: value.startLocation.fromSize(imageSize), endPoint: value.location.fromSize(imageSize), category: selectedCategory!)
                } else {
                    currentBox?.endPoint = value.location.fromSize(imageSize)
                }
            }
            .onEnded { value in
                guard selectedTool == boundingBoxTool else { return }
                if let box = currentBox {
                    boundingBoxes.append(box)
                    animationPoint = box.midpoint.toSize(imageSize)
                    currentBox = nil
                }
            }
    }

    private func handleTap(at location: CGPoint) {
        if selectedTool == pointTool {
            placePoint(at: location)
            animationPoint = location
        }
    }

    private func placePoint(at coordinates: CGPoint) {
        let samPoint = SAMPoint(coordinates: coordinates.fromSize(imageSize), category: selectedCategory!)
        self.selectedPoints.append(samPoint)
    }

    private func performForwardPass() {
        Task {
            do {
                try await sam2.getPromptEncoding(from: pointSequence, with: imageSize)
                if let mask = try await sam2.getMask(for: originalSize ??
                    .zero) {
                    DispatchQueue.main.async {
                        let colorSet = self.segmentationImages.map { $0.tintColor }
                        let furthestColor = furthestColor(from: colorSet, among: SAMSegmentation.candidateColors)
                        let segmentationNumber = segmentationImages.count
                        let segmentationOverlay = SAMSegmentation(image: mask, tintColor: furthestColor, title: "Untitled \(segmentationNumber + 1)")
                        self.currentSegmentation = segmentationOverlay
                    }
                }
            } catch {
                self.error = error
            }
        }
    }
}

#Preview {
    ContentView()
}

extension CGPoint {
    func fromSize(_ size: CGSize) -> CGPoint {
        CGPoint(x: x / size.width, y: y / size.height)
    }

    func toSize(_ size: CGSize) -> CGPoint {
        CGPoint(x: x * size.width, y: y * size.height)
    }
}
================================================ FILE: SAM2-Demo/Views/LayerListView.swift ================================================
//
//  LayerListView.swift
//  SAM2-Demo
//
//  Created by Cyril Zakka on 9/8/24.
//

import SwiftUI

struct LayerListView: View {
    @Binding var segmentationImages: [SAMSegmentation]
    @Binding var selectedSegmentations: Set<SAMSegmentation.ID>
    @Binding var currentSegmentation: SAMSegmentation?
    var body: some View {
        List(selection: $selectedSegmentations) {
            Section("Annotations List") {
                ForEach(Array(segmentationImages.enumerated()), id: \.element.id) { index, segmentation in
                    AnnotationListView(segmentation: $segmentationImages[index])
                        .padding(.horizontal, 5)
                        .contextMenu {
                            Button(role: .destructive) {
                                if let index = segmentationImages.firstIndex(where: { $0.id == segmentation.id }) {
                                    segmentationImages.remove(at: index)
                                }
                            } label: {
                                Label("Delete", systemImage: "trash.fill")
                            }
                        }
                }
                .onDelete(perform: delete)
                .onMove(perform: move)

                if let currentSegmentation = currentSegmentation {
                    AnnotationListView(segmentation: .constant(currentSegmentation))
                        .tag(currentSegmentation.id)
                }
            }
        }
        .listStyle(.sidebar)
    }

    func delete(at offsets: IndexSet) {
        segmentationImages.remove(atOffsets: offsets)
    }

    func move(from source: IndexSet, to destination: Int) {
        segmentationImages.move(fromOffsets: source, toOffset: destination)
    }
}

#Preview {
    ContentView()
}
================================================ FILE: SAM2-Demo/Views/MaskEditor.swift ================================================
//
//  MaskEditor.swift
//  SAM2-Demo
//
//  Created by Cyril Zakka on 9/10/24.
//

import SwiftUI

struct MaskEditor: View {
    @Binding var exportMaskToPNG: Bool
    @Binding var segmentationImages: [SAMSegmentation]
    @Binding var selectedSegmentations: Set<SAMSegmentation.ID>
    @Binding var currentSegmentation: SAMSegmentation?
    @State private var bgColor = Color(.sRGB, red: 30/255, green: 144/255, blue: 1)

    var body: some View {
        Form {
            Section {
                ColorPicker("Color", selection: $bgColor)
                    .onChange(of: bgColor) { oldColor, newColor in
                        updateSelectedSegmentationsColor(newColor)
                    }
                Button("Export Selected...", action: {
                    exportMaskToPNG = true
                })
                .frame(maxWidth: .infinity, maxHeight: .infinity)
            }
        }
        .frame(maxWidth: .infinity, maxHeight: .infinity)
        .onChange(of: selectedSegmentations) { oldValue, newValue in
            bgColor = getColorOfFirstSelectedSegmentation()
        }
        .onAppear {
            bgColor = getColorOfFirstSelectedSegmentation()
        }
    }

    private func updateSelectedSegmentationsColor(_ newColor: Color) {
        for id in selectedSegmentations {
            for index in segmentationImages.indices where segmentationImages[index].id == id {
                segmentationImages[index].tintColor = newColor
            }
            if currentSegmentation?.id == id {
                currentSegmentation?.tintColor = newColor
            }
        }
    }

    private func getColorOfFirstSelectedSegmentation() -> Color {
        if let firstSelectedId = selectedSegmentations.first {
            if let firstSelectedSegmentation = segmentationImages.first(where: { $0.id == firstSelectedId }) {
                return firstSelectedSegmentation.tintColor
            } else {
                if let currentSegmentation {
                    return currentSegmentation.tintColor
                }
            }
        }
        return bgColor // Return default color if no segmentation is selected
    }
}

#Preview {
    ContentView()
}
================================================ FILE: SAM2-Demo/Views/SubtoolbarView.swift ================================================
//
//  SubtoolbarView.swift
//  SAM2-Demo
//
//  Created by Cyril Zakka on 9/8/24.
//

import SwiftUI

struct SubToolbar: View {
    @Binding var selectedPoints: [SAMPoint]
    @Binding var boundingBoxes: [SAMBox]
    @Binding var segmentationImages: [SAMSegmentation]
    @Binding var currentSegmentation: SAMSegmentation?
    var body: some View {
        if selectedPoints.count > 0 || boundingBoxes.count > 0 {
            ZStack {
                Rectangle()
                    .fill(.regularMaterial)
                    .frame(height: 30)
                HStack {
                    Spacer()
                    Button("Undo", action: undo)
                        .padding(.trailing, 5)
                        .disabled(selectedPoints.isEmpty && boundingBoxes.isEmpty)
                    Button("Reset", action: resetAll)
                        .padding(.trailing, 5)
                        .disabled(selectedPoints.isEmpty && boundingBoxes.isEmpty)
                }
            }
            .transition(.move(edge: .top))
        }
    }

    private func newMask() { }

    private func resetAll() {
        selectedPoints.removeAll()
        boundingBoxes.removeAll()
        segmentationImages = []
        currentSegmentation = nil
    }

    private func undo() {
        if let lastPoint = selectedPoints.last, let lastBox = boundingBoxes.last {
            if lastPoint.dateAdded > lastBox.dateAdded {
                selectedPoints.removeLast()
            } else {
                boundingBoxes.removeLast()
            }
        } else if !selectedPoints.isEmpty {
            selectedPoints.removeLast()
        } else if !boundingBoxes.isEmpty {
            boundingBoxes.removeLast()
        }

        if selectedPoints.isEmpty && boundingBoxes.isEmpty {
            currentSegmentation = nil
        }
    }
}

#Preview {
    ContentView()
}
================================================ FILE: SAM2-Demo/Views/ZoomableScrollView.swift ================================================
//
//  ZoomableScrollView.swift
//  SAM2-Demo
//
//  Created by Cyril Zakka on 9/12/24.
//

import AppKit
import SwiftUI

struct ZoomableScrollView<Content: View>: NSViewRepresentable {
    @Binding var visibleRect: CGRect
    private var content: Content

    init(visibleRect: Binding<CGRect>, @ViewBuilder content: () -> Content) {
        self._visibleRect = visibleRect
        self.content = content()
    }

    func makeNSView(context: Context) -> NSScrollView {
        let scrollView = NSScrollView()
        scrollView.hasVerticalScroller = true
        scrollView.hasHorizontalScroller = true
        scrollView.autohidesScrollers = true
        scrollView.allowsMagnification = false
        scrollView.maxMagnification = 20
        scrollView.minMagnification = 1

        let hostedView = context.coordinator.hostingView
        hostedView.translatesAutoresizingMaskIntoConstraints = true
        hostedView.autoresizingMask = [.width, .height]
        hostedView.frame = scrollView.bounds
        scrollView.documentView = hostedView

        return scrollView
    }

    func makeCoordinator() -> Coordinator {
        let coordinator = Coordinator(hostingView: NSHostingView(rootView: self.content), parent: self)
        coordinator.listen()
        return coordinator
    }

    func updateNSView(_ nsView: NSScrollView, context: Context) {
        context.coordinator.hostingView.rootView = self.content
    }

    // MARK: - Coordinator
    class Coordinator: NSObject {
        var hostingView: NSHostingView<Content>
        var parent: ZoomableScrollView

        init(hostingView: NSHostingView<Content>, parent: ZoomableScrollView) {
            self.hostingView = hostingView
            self.parent = parent
        }

        func listen() {
            NotificationCenter.default.addObserver(forName: NSScrollView.didEndLiveMagnifyNotification, object: nil, queue: nil) { notification in
                let scrollView = notification.object as! NSScrollView
                print("did magnify: \(scrollView.magnification), \(scrollView.documentVisibleRect)")
                self.parent.visibleRect = scrollView.documentVisibleRect
            }
            NotificationCenter.default.addObserver(forName: NSScrollView.didEndLiveScrollNotification, object: nil, queue: nil) { notification in
                let scrollView = notification.object as!
NSScrollView print("did scroll: \(scrollView.magnification), \(scrollView.documentVisibleRect)") self.parent.visibleRect = scrollView.documentVisibleRect } } } } ================================================ FILE: SAM2-Demo.xcodeproj/project.pbxproj ================================================ // !$*UTF8*$! { archiveVersion = 1; classes = { }; objectVersion = 70; objects = { /* Begin PBXBuildFile section */ EBAB91282C88A05500F57B83 /* ArgumentParser in Frameworks */ = {isa = PBXBuildFile; productRef = EBAB91272C88A05500F57B83 /* ArgumentParser */; }; /* End PBXBuildFile section */ /* Begin PBXCopyFilesBuildPhase section */ EBAB911D2C889D5200F57B83 /* CopyFiles */ = { isa = PBXCopyFilesBuildPhase; buildActionMask = 2147483647; dstPath = /usr/share/man/man1/; dstSubfolderSpec = 0; files = ( ); runOnlyForDeploymentPostprocessing = 1; }; /* End PBXCopyFilesBuildPhase section */ /* Begin PBXFileReference section */ EBAB911F2C889D5200F57B83 /* sam2-cli */ = {isa = PBXFileReference; explicitFileType = "compiled.mach-o.executable"; includeInIndex = 0; path = "sam2-cli"; sourceTree = BUILT_PRODUCTS_DIR; }; F136320F2C73AE78009DEF15 /* SAM 2 Studio.app */ = {isa = PBXFileReference; explicitFileType = wrapper.application; includeInIndex = 0; path = "SAM 2 Studio.app"; sourceTree = BUILT_PRODUCTS_DIR; }; /* End PBXFileReference section */ /* Begin PBXFileSystemSynchronizedBuildFileExceptionSet section */ EBAB912A2C88A21E00F57B83 /* Exceptions for "SAM2-Demo" folder in "sam2-cli" target */ = { isa = PBXFileSystemSynchronizedBuildFileExceptionSet; membershipExceptions = ( "Common/CGImage+Extension.swift", "Common/CGImage+RawBytes.swift", Common/CoreImageExtensions.swift, Common/DirectoryDocument.swift, "Common/MLMultiArray+Image.swift", Common/Models.swift, Common/SAM2.swift, SAM2_1SmallImageEncoderFLOAT16.mlpackage, SAM2_1SmallMaskDecoderFLOAT16.mlpackage, SAM2_1SmallPromptEncoderFLOAT16.mlpackage, ); target = EBAB911E2C889D5200F57B83 /* sam2-cli */; }; /* End 
PBXFileSystemSynchronizedBuildFileExceptionSet section */ /* Begin PBXFileSystemSynchronizedRootGroup section */ EBAB91202C889D5200F57B83 /* sam2-cli */ = { isa = PBXFileSystemSynchronizedRootGroup; path = "sam2-cli"; sourceTree = "<group>"; }; F13632112C73AE78009DEF15 /* SAM2-Demo */ = { isa = PBXFileSystemSynchronizedRootGroup; exceptions = ( EBAB912A2C88A21E00F57B83 /* Exceptions for "SAM2-Demo" folder in "sam2-cli" target */, ); path = "SAM2-Demo"; sourceTree = "<group>"; }; /* End PBXFileSystemSynchronizedRootGroup section */ /* Begin PBXFrameworksBuildPhase section */ EBAB911C2C889D5200F57B83 /* Frameworks */ = { isa = PBXFrameworksBuildPhase; buildActionMask = 2147483647; files = ( EBAB91282C88A05500F57B83 /* ArgumentParser in Frameworks */, ); runOnlyForDeploymentPostprocessing = 0; }; F136320C2C73AE78009DEF15 /* Frameworks */ = { isa = PBXFrameworksBuildPhase; buildActionMask = 2147483647; files = ( ); runOnlyForDeploymentPostprocessing = 0; }; /* End PBXFrameworksBuildPhase section */ /* Begin PBXGroup section */ F13632062C73AE77009DEF15 = { isa = PBXGroup; children = ( F13632112C73AE78009DEF15 /* SAM2-Demo */, EBAB91202C889D5200F57B83 /* sam2-cli */, F13632102C73AE78009DEF15 /* Products */, ); sourceTree = "<group>"; }; F13632102C73AE78009DEF15 /* Products */ = { isa = PBXGroup; children = ( F136320F2C73AE78009DEF15 /* SAM 2 Studio.app */, EBAB911F2C889D5200F57B83 /* sam2-cli */, ); name = Products; sourceTree = "<group>"; }; /* End PBXGroup section */ /* Begin PBXNativeTarget section */ EBAB911E2C889D5200F57B83 /* sam2-cli */ = { isa = PBXNativeTarget; buildConfigurationList = EBAB91252C889D5200F57B83 /* Build configuration list for PBXNativeTarget "sam2-cli" */; buildPhases = ( EBAB911B2C889D5200F57B83 /* Sources */, EBAB911C2C889D5200F57B83 /* Frameworks */, EBAB911D2C889D5200F57B83 /* CopyFiles */, ); buildRules = ( ); dependencies = ( ); fileSystemSynchronizedGroups = ( EBAB91202C889D5200F57B83 /* sam2-cli */, ); name = "sam2-cli"; packageProductDependencies = (
EBAB91272C88A05500F57B83 /* ArgumentParser */, ); productName = "sam2-cli"; productReference = EBAB911F2C889D5200F57B83 /* sam2-cli */; productType = "com.apple.product-type.tool"; }; F136320E2C73AE78009DEF15 /* SAM2-Demo */ = { isa = PBXNativeTarget; buildConfigurationList = F136321E2C73AE79009DEF15 /* Build configuration list for PBXNativeTarget "SAM2-Demo" */; buildPhases = ( F136320B2C73AE78009DEF15 /* Sources */, F136320C2C73AE78009DEF15 /* Frameworks */, F136320D2C73AE78009DEF15 /* Resources */, ); buildRules = ( ); dependencies = ( ); fileSystemSynchronizedGroups = ( F13632112C73AE78009DEF15 /* SAM2-Demo */, ); name = "SAM2-Demo"; packageProductDependencies = ( ); productName = "SAM2-Demo"; productReference = F136320F2C73AE78009DEF15 /* SAM 2 Studio.app */; productType = "com.apple.product-type.application"; }; /* End PBXNativeTarget section */ /* Begin PBXProject section */ F13632072C73AE77009DEF15 /* Project object */ = { isa = PBXProject; attributes = { BuildIndependentTargetsInParallel = 1; LastSwiftUpdateCheck = 1600; LastUpgradeCheck = 1600; TargetAttributes = { EBAB911E2C889D5200F57B83 = { CreatedOnToolsVersion = 16.0; }; F136320E2C73AE78009DEF15 = { CreatedOnToolsVersion = 16.0; }; }; }; buildConfigurationList = F136320A2C73AE77009DEF15 /* Build configuration list for PBXProject "SAM2-Demo" */; developmentRegion = en; hasScannedForEncodings = 0; knownRegions = ( en, Base, ); mainGroup = F13632062C73AE77009DEF15; minimizedProjectReferenceProxies = 1; packageReferences = ( EBAB91262C88A05500F57B83 /* XCRemoteSwiftPackageReference "swift-argument-parser" */, ); preferredProjectObjectVersion = 77; productRefGroup = F13632102C73AE78009DEF15 /* Products */; projectDirPath = ""; projectRoot = ""; targets = ( F136320E2C73AE78009DEF15 /* SAM2-Demo */, EBAB911E2C889D5200F57B83 /* sam2-cli */, ); }; /* End PBXProject section */ /* Begin PBXResourcesBuildPhase section */ F136320D2C73AE78009DEF15 /* Resources */ = { isa = PBXResourcesBuildPhase; buildActionMask = 
2147483647; files = ( ); runOnlyForDeploymentPostprocessing = 0; }; /* End PBXResourcesBuildPhase section */ /* Begin PBXSourcesBuildPhase section */ EBAB911B2C889D5200F57B83 /* Sources */ = { isa = PBXSourcesBuildPhase; buildActionMask = 2147483647; files = ( ); runOnlyForDeploymentPostprocessing = 0; }; F136320B2C73AE78009DEF15 /* Sources */ = { isa = PBXSourcesBuildPhase; buildActionMask = 2147483647; files = ( ); runOnlyForDeploymentPostprocessing = 0; }; /* End PBXSourcesBuildPhase section */ /* Begin XCBuildConfiguration section */ EBAB91232C889D5200F57B83 /* Debug */ = { isa = XCBuildConfiguration; buildSettings = { CODE_SIGN_STYLE = Automatic; DEVELOPMENT_TEAM = ""; ENABLE_HARDENED_RUNTIME = YES; PRODUCT_NAME = "$(TARGET_NAME)"; SWIFT_VERSION = 5.0; }; name = Debug; }; EBAB91242C889D5200F57B83 /* Release */ = { isa = XCBuildConfiguration; buildSettings = { CODE_SIGN_STYLE = Automatic; DEVELOPMENT_TEAM = ""; ENABLE_HARDENED_RUNTIME = YES; PRODUCT_NAME = "$(TARGET_NAME)"; SWIFT_VERSION = 5.0; }; name = Release; }; F136321C2C73AE79009DEF15 /* Debug */ = { isa = XCBuildConfiguration; buildSettings = { ALWAYS_SEARCH_USER_PATHS = NO; ARCHS = arm64; ASSETCATALOG_COMPILER_GENERATE_SWIFT_ASSET_SYMBOL_EXTENSIONS = YES; CLANG_ANALYZER_NONNULL = YES; CLANG_ANALYZER_NUMBER_OBJECT_CONVERSION = YES_AGGRESSIVE; CLANG_CXX_LANGUAGE_STANDARD = "gnu++20"; CLANG_ENABLE_MODULES = YES; CLANG_ENABLE_OBJC_ARC = YES; CLANG_ENABLE_OBJC_WEAK = YES; CLANG_WARN_BLOCK_CAPTURE_AUTORELEASING = YES; CLANG_WARN_BOOL_CONVERSION = YES; CLANG_WARN_COMMA = YES; CLANG_WARN_CONSTANT_CONVERSION = YES; CLANG_WARN_DEPRECATED_OBJC_IMPLEMENTATIONS = YES; CLANG_WARN_DIRECT_OBJC_ISA_USAGE = YES_ERROR; CLANG_WARN_DOCUMENTATION_COMMENTS = YES; CLANG_WARN_EMPTY_BODY = YES; CLANG_WARN_ENUM_CONVERSION = YES; CLANG_WARN_INFINITE_RECURSION = YES; CLANG_WARN_INT_CONVERSION = YES; CLANG_WARN_NON_LITERAL_NULL_CONVERSION = YES; CLANG_WARN_OBJC_IMPLICIT_RETAIN_SELF = YES; CLANG_WARN_OBJC_LITERAL_CONVERSION = YES; 
CLANG_WARN_OBJC_ROOT_CLASS = YES_ERROR; CLANG_WARN_QUOTED_INCLUDE_IN_FRAMEWORK_HEADER = YES; CLANG_WARN_RANGE_LOOP_ANALYSIS = YES; CLANG_WARN_STRICT_PROTOTYPES = YES; CLANG_WARN_SUSPICIOUS_MOVE = YES; CLANG_WARN_UNGUARDED_AVAILABILITY = YES_AGGRESSIVE; CLANG_WARN_UNREACHABLE_CODE = YES; CLANG_WARN__DUPLICATE_METHOD_MATCH = YES; COPY_PHASE_STRIP = NO; DEBUG_INFORMATION_FORMAT = dwarf; DEVELOPMENT_TEAM = 2EADP68M95; ENABLE_STRICT_OBJC_MSGSEND = YES; ENABLE_TESTABILITY = YES; ENABLE_USER_SCRIPT_SANDBOXING = YES; GCC_C_LANGUAGE_STANDARD = gnu17; GCC_DYNAMIC_NO_PIC = NO; GCC_NO_COMMON_BLOCKS = YES; GCC_OPTIMIZATION_LEVEL = 0; GCC_PREPROCESSOR_DEFINITIONS = ( "DEBUG=1", "$(inherited)", ); GCC_WARN_64_TO_32_BIT_CONVERSION = YES; GCC_WARN_ABOUT_RETURN_TYPE = YES_ERROR; GCC_WARN_UNDECLARED_SELECTOR = YES; GCC_WARN_UNINITIALIZED_AUTOS = YES_AGGRESSIVE; GCC_WARN_UNUSED_FUNCTION = YES; GCC_WARN_UNUSED_VARIABLE = YES; LOCALIZATION_PREFERS_STRING_CATALOGS = YES; MACOSX_DEPLOYMENT_TARGET = 14.3; MTL_ENABLE_DEBUG_INFO = INCLUDE_SOURCE; MTL_FAST_MATH = YES; ONLY_ACTIVE_ARCH = YES; SDKROOT = macosx; SWIFT_ACTIVE_COMPILATION_CONDITIONS = "DEBUG $(inherited)"; SWIFT_OPTIMIZATION_LEVEL = "-Onone"; }; name = Debug; }; F136321D2C73AE79009DEF15 /* Release */ = { isa = XCBuildConfiguration; buildSettings = { ALWAYS_SEARCH_USER_PATHS = NO; ARCHS = arm64; ASSETCATALOG_COMPILER_GENERATE_SWIFT_ASSET_SYMBOL_EXTENSIONS = YES; CLANG_ANALYZER_NONNULL = YES; CLANG_ANALYZER_NUMBER_OBJECT_CONVERSION = YES_AGGRESSIVE; CLANG_CXX_LANGUAGE_STANDARD = "gnu++20"; CLANG_ENABLE_MODULES = YES; CLANG_ENABLE_OBJC_ARC = YES; CLANG_ENABLE_OBJC_WEAK = YES; CLANG_WARN_BLOCK_CAPTURE_AUTORELEASING = YES; CLANG_WARN_BOOL_CONVERSION = YES; CLANG_WARN_COMMA = YES; CLANG_WARN_CONSTANT_CONVERSION = YES; CLANG_WARN_DEPRECATED_OBJC_IMPLEMENTATIONS = YES; CLANG_WARN_DIRECT_OBJC_ISA_USAGE = YES_ERROR; CLANG_WARN_DOCUMENTATION_COMMENTS = YES; CLANG_WARN_EMPTY_BODY = YES; CLANG_WARN_ENUM_CONVERSION = YES; 
CLANG_WARN_INFINITE_RECURSION = YES; CLANG_WARN_INT_CONVERSION = YES; CLANG_WARN_NON_LITERAL_NULL_CONVERSION = YES; CLANG_WARN_OBJC_IMPLICIT_RETAIN_SELF = YES; CLANG_WARN_OBJC_LITERAL_CONVERSION = YES; CLANG_WARN_OBJC_ROOT_CLASS = YES_ERROR; CLANG_WARN_QUOTED_INCLUDE_IN_FRAMEWORK_HEADER = YES; CLANG_WARN_RANGE_LOOP_ANALYSIS = YES; CLANG_WARN_STRICT_PROTOTYPES = YES; CLANG_WARN_SUSPICIOUS_MOVE = YES; CLANG_WARN_UNGUARDED_AVAILABILITY = YES_AGGRESSIVE; CLANG_WARN_UNREACHABLE_CODE = YES; CLANG_WARN__DUPLICATE_METHOD_MATCH = YES; COPY_PHASE_STRIP = NO; DEBUG_INFORMATION_FORMAT = "dwarf-with-dsym"; DEVELOPMENT_TEAM = 2EADP68M95; ENABLE_NS_ASSERTIONS = NO; ENABLE_STRICT_OBJC_MSGSEND = YES; ENABLE_USER_SCRIPT_SANDBOXING = YES; GCC_C_LANGUAGE_STANDARD = gnu17; GCC_NO_COMMON_BLOCKS = YES; GCC_WARN_64_TO_32_BIT_CONVERSION = YES; GCC_WARN_ABOUT_RETURN_TYPE = YES_ERROR; GCC_WARN_UNDECLARED_SELECTOR = YES; GCC_WARN_UNINITIALIZED_AUTOS = YES_AGGRESSIVE; GCC_WARN_UNUSED_FUNCTION = YES; GCC_WARN_UNUSED_VARIABLE = YES; LOCALIZATION_PREFERS_STRING_CATALOGS = YES; MACOSX_DEPLOYMENT_TARGET = 14.3; MTL_ENABLE_DEBUG_INFO = NO; MTL_FAST_MATH = YES; SDKROOT = macosx; SWIFT_COMPILATION_MODE = wholemodule; }; name = Release; }; F136321F2C73AE79009DEF15 /* Debug */ = { isa = XCBuildConfiguration; buildSettings = { ASSETCATALOG_COMPILER_APPICON_NAME = AppIcon; ASSETCATALOG_COMPILER_GLOBAL_ACCENT_COLOR_NAME = AccentColor; CODE_SIGN_ENTITLEMENTS = "SAM2-Demo/SAM2_Demo.entitlements"; CODE_SIGN_STYLE = Automatic; COMBINE_HIDPI_IMAGES = YES; CURRENT_PROJECT_VERSION = 1; DEVELOPMENT_ASSET_PATHS = "\"SAM2-Demo/Preview Content\""; DEVELOPMENT_TEAM = ""; ENABLE_HARDENED_RUNTIME = YES; ENABLE_PREVIEWS = YES; GENERATE_INFOPLIST_FILE = YES; INFOPLIST_KEY_NSHumanReadableCopyright = ""; LD_RUNPATH_SEARCH_PATHS = ( "$(inherited)", "@executable_path/../Frameworks", ); MARKETING_VERSION = 1.0; PRODUCT_BUNDLE_IDENTIFIER = "co.huggingface.sam-2-studio"; PRODUCT_NAME = "SAM 2 Studio"; SWIFT_EMIT_LOC_STRINGS = 
YES; SWIFT_VERSION = 5.0; }; name = Debug; }; F13632202C73AE79009DEF15 /* Release */ = { isa = XCBuildConfiguration; buildSettings = { ASSETCATALOG_COMPILER_APPICON_NAME = AppIcon; ASSETCATALOG_COMPILER_GLOBAL_ACCENT_COLOR_NAME = AccentColor; CODE_SIGN_ENTITLEMENTS = "SAM2-Demo/SAM2_Demo.entitlements"; CODE_SIGN_STYLE = Automatic; COMBINE_HIDPI_IMAGES = YES; CURRENT_PROJECT_VERSION = 1; DEVELOPMENT_ASSET_PATHS = "\"SAM2-Demo/Preview Content\""; DEVELOPMENT_TEAM = ""; ENABLE_HARDENED_RUNTIME = YES; ENABLE_PREVIEWS = YES; GENERATE_INFOPLIST_FILE = YES; INFOPLIST_KEY_NSHumanReadableCopyright = ""; LD_RUNPATH_SEARCH_PATHS = ( "$(inherited)", "@executable_path/../Frameworks", ); MARKETING_VERSION = 1.0; PRODUCT_BUNDLE_IDENTIFIER = "co.huggingface.sam-2-studio"; PRODUCT_NAME = "SAM 2 Studio"; SWIFT_EMIT_LOC_STRINGS = YES; SWIFT_VERSION = 5.0; }; name = Release; }; /* End XCBuildConfiguration section */ /* Begin XCConfigurationList section */ EBAB91252C889D5200F57B83 /* Build configuration list for PBXNativeTarget "sam2-cli" */ = { isa = XCConfigurationList; buildConfigurations = ( EBAB91232C889D5200F57B83 /* Debug */, EBAB91242C889D5200F57B83 /* Release */, ); defaultConfigurationIsVisible = 0; defaultConfigurationName = Release; }; F136320A2C73AE77009DEF15 /* Build configuration list for PBXProject "SAM2-Demo" */ = { isa = XCConfigurationList; buildConfigurations = ( F136321C2C73AE79009DEF15 /* Debug */, F136321D2C73AE79009DEF15 /* Release */, ); defaultConfigurationIsVisible = 0; defaultConfigurationName = Release; }; F136321E2C73AE79009DEF15 /* Build configuration list for PBXNativeTarget "SAM2-Demo" */ = { isa = XCConfigurationList; buildConfigurations = ( F136321F2C73AE79009DEF15 /* Debug */, F13632202C73AE79009DEF15 /* Release */, ); defaultConfigurationIsVisible = 0; defaultConfigurationName = Release; }; /* End XCConfigurationList section */ /* Begin XCRemoteSwiftPackageReference section */ EBAB91262C88A05500F57B83 /* XCRemoteSwiftPackageReference 
"swift-argument-parser" */ = { isa = XCRemoteSwiftPackageReference; repositoryURL = "https://github.com/apple/swift-argument-parser.git"; requirement = { kind = upToNextMajorVersion; minimumVersion = 1.5.0; }; }; /* End XCRemoteSwiftPackageReference section */ /* Begin XCSwiftPackageProductDependency section */ EBAB91272C88A05500F57B83 /* ArgumentParser */ = { isa = XCSwiftPackageProductDependency; package = EBAB91262C88A05500F57B83 /* XCRemoteSwiftPackageReference "swift-argument-parser" */; productName = ArgumentParser; }; /* End XCSwiftPackageProductDependency section */ }; rootObject = F13632072C73AE77009DEF15 /* Project object */; } ================================================ FILE: SAM2-Demo.xcodeproj/project.xcworkspace/contents.xcworkspacedata ================================================ ================================================ FILE: SAM2-Demo.xcodeproj/project.xcworkspace/xcshareddata/swiftpm/Package.resolved ================================================ { "originHash" : "59ba1edda695b389d6c9ac1809891cd779e4024f505b0ce1a9d5202b6762e38a", "pins" : [ { "identity" : "swift-argument-parser", "kind" : "remoteSourceControl", "location" : "https://github.com/apple/swift-argument-parser.git", "state" : { "revision" : "41982a3656a71c768319979febd796c6fd111d5c", "version" : "1.5.0" } } ], "version" : 3 } ================================================ FILE: sam2-cli/MainCommand.swift ================================================ import ArgumentParser import CoreImage import CoreML import ImageIO import UniformTypeIdentifiers import Combine let context = CIContext(options: [.outputColorSpace: NSNull()]) enum PointType: Int, ExpressibleByArgument { case background = 0 case foreground = 1 var asCategory: SAMCategory { switch self { case .background: return SAMCategory.background case .foreground: return SAMCategory.foreground } } } @main struct MainCommand: AsyncParsableCommand { static let configuration = CommandConfiguration( commandName: 
"sam2-cli", abstract: "Perform segmentation using the SAM v2 model." ) @Option(name: .shortAndLong, help: "The input image file.") var input: String // TODO: multiple points @Option(name: .shortAndLong, parsing: .upToNextOption, help: "List of input coordinates in format 'x,y'. Coordinates are relative to the input image size. Separate multiple entries with spaces, but don't use spaces between the coordinates.") var points: [CGPoint] @Option(name: .shortAndLong, parsing: .upToNextOption, help: "Point types that correspond to the input points. Use as many as points, 0 for background and 1 for foreground.") var types: [PointType] @Option(name: .shortAndLong, help: "The output PNG image file, showing the segmentation map overlaid on top of the original image.") var output: String @Option(name: [.long, .customShort("k")], help: "The output file name for the segmentation mask.") var mask: String? = nil @MainActor mutating func run() async throws { // TODO: specify directory with loadable .mlpackages instead let sam = try await SAM2.load() print("Models loaded in: \(String(describing: sam.initializationTime))") let targetSize = sam.inputSize // Load the input image guard let inputImage = CIImage(contentsOf: URL(filePath: input), options: [.colorSpace: NSNull()]) else { print("Failed to load image.") throw ExitCode(EXIT_FAILURE) } print("Original image size \(inputImage.extent)") // Resize the image to match the model's expected input let resizedImage = inputImage.resized(to: targetSize) // Convert to a pixel buffer guard let pixelBuffer = context.render(resizedImage, pixelFormat: kCVPixelFormatType_32ARGB) else { print("Failed to create pixel buffer for input image.") throw ExitCode(EXIT_FAILURE) } // Execute the model let clock = ContinuousClock() let start = clock.now try await sam.getImageEncoding(from: pixelBuffer) let duration = clock.now - start print("Image encoding took \(duration.formatted(.units(allowed: [.seconds, .milliseconds])))") let startMask = clock.now 
let pointSequence = zip(points, types).map { point, type in SAMPoint(coordinates:point, category:type.asCategory) } try await sam.getPromptEncoding(from: pointSequence, with: inputImage.extent.size) guard let maskImage = try await sam.getMask(for: inputImage.extent.size) else { throw ExitCode(EXIT_FAILURE) } let maskDuration = clock.now - startMask print("Prompt encoding and mask generation took \(maskDuration.formatted(.units(allowed: [.seconds, .milliseconds])))") // Write masks if let mask = mask { context.writePNG(maskImage, to: URL(filePath: mask)) } // Overlay over original and save guard let outputImage = maskImage.withAlpha(0.6)?.composited(over: inputImage) else { print("Failed to blend mask.") throw ExitCode(EXIT_FAILURE) } context.writePNG(outputImage, to: URL(filePath: output)) } } extension CGPoint: ExpressibleByArgument { public init?(argument: String) { let components = argument.split(separator: ",").map(String.init) guard components.count == 2, let x = Double(components[0]), let y = Double(components[1]) else { return nil } self.init(x: x, y: y) } }
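The `CGPoint: ExpressibleByArgument` conformance above is the only custom argument parsing in the CLI: it accepts exactly two comma-separated numeric components and rejects everything else. A minimal standalone sketch of that logic, using a hypothetical `parsePoint` helper that mirrors `CGPoint.init?(argument:)`:

```swift
import Foundation
import CoreGraphics

// Mirrors CGPoint.init?(argument:) from MainCommand.swift: split on ",",
// require exactly two components, and require both to parse as Double.
func parsePoint(_ argument: String) -> CGPoint? {
    let components = argument.split(separator: ",").map(String.init)
    guard components.count == 2,
          let x = Double(components[0]),
          let y = Double(components[1]) else {
        return nil
    }
    return CGPoint(x: x, y: y)
}

// Accepted forms
assert(parsePoint("300,150") == CGPoint(x: 300, y: 150))
assert(parsePoint("0.25,0.75") == CGPoint(x: 0.25, y: 0.75))

// Rejected forms: no comma, too many components, embedded space
assert(parsePoint("300 150") == nil)
assert(parsePoint("300,150,2") == nil)
assert(parsePoint("300, 150") == nil)
```

This is why the `--points` help text warns against spaces inside a coordinate pair: `Double(" 150")` fails to parse. Given the options declared above, an invocation might look like `sam2-cli --input photo.jpg --points 300,150 450,300 --types 1 1 --output overlay.png --mask mask.png` (file names here are made up for illustration).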