Repository: alebedev/git-media Branch: master Commit: 09bde56ad0a0 Files: 26 Total size: 44.9 KB Directory structure: gitextract_rveq7zvc/ ├── .document ├── .gitignore ├── Gemfile ├── LICENSE ├── README.md ├── Rakefile ├── TODO ├── VERSION ├── bin/ │ └── git-media ├── git-media.gemspec ├── lib/ │ ├── git-media/ │ │ ├── clear.rb │ │ ├── filter-branch.rb │ │ ├── filter-clean.rb │ │ ├── filter-smudge.rb │ │ ├── status.rb │ │ ├── sync.rb │ │ ├── transport/ │ │ │ ├── atmos_client.rb │ │ │ ├── box.rb │ │ │ ├── local.rb │ │ │ ├── s3.rb │ │ │ ├── scp.rb │ │ │ └── webdav.rb │ │ └── transport.rb │ └── git-media.rb └── spec/ ├── media_spec.rb └── spec_helper.rb ================================================ FILE CONTENTS ================================================ ================================================ FILE: .document ================================================ README.rdoc lib/**/*.rb bin/* features/**/*.feature LICENSE ================================================ FILE: .gitignore ================================================ *.gem *.rbc /.config /coverage/ /InstalledFiles /pkg/ /spec/reports/ /spec/examples.txt /test/tmp/ /test/version_tmp/ /tmp/ ## Specific to RubyMotion: .dat* .repl_history build/ ## Documentation cache and generated files: /.yardoc/ /_yardoc/ /doc/ /rdoc/ ## Environment normalisation: /.bundle/ /vendor/bundle /lib/bundler/man/ # for a library or gem, you might want to ignore these files since the code is # intended to run in multiple environments; otherwise, check them in: Gemfile.lock .ruby-version .ruby-gemset # unless supporting rvm < 1.11.0 or doing something fancy, ignore this: .rvmrc ================================================ FILE: Gemfile ================================================ source 'https://rubygems.org' gem 'trollop' gem 's3' gem 'ruby-atmos-pure' gem 'right_aws' gem 'net_dav', :git => 'https://github.com/devrandom/net_dav.git', :require => 'net/dav' gem 'boxr' gem 'netrc' #gem 'curb', :require => 
false ================================================ FILE: LICENSE ================================================ Copyright (c) 2009 Scott Chacon Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ================================================ FILE: README.md ================================================ # git-media GitMedia extension allows you to use Git with large media files without storing the media in Git itself. ## Configuration Setup the attributes filter settings. (once after install) $ git config filter.media.clean "git-media filter-clean" $ git config filter.media.smudge "git-media filter-smudge" Setup the `.gitattributes` file to map extensions to the filter. (in repo - once) $ echo "*.mov filter=media -crlf" > .gitattributes Staging files with those extensions will automatically copy them to the media buffer area (.git/media) until you run 'git media sync' wherein they are uploaded. 
Checkouts that reference media you don't have yet will try to download it automatically if autodownload is enabled; otherwise the media is downloaded when you sync. Next, tell Git where you want to store the large files. There are five options: 1. Storing remotely in Amazon's S3 2. Storing locally in a filesystem path 3. Storing remotely via SCP (should work with any SSH server) 4. Storing remotely in atmos 5. Storing remotely via WebDAV Here are the relevant sections that should go either in `~/.gitconfig` (for global settings) or in `clone/.git/config` (for per-repo settings). ```ini [git-media] transport = autodownload = # settings for scp transport scpuser = scphost = scppath = # settings for local transport localpath = # settings for s3 transport s3bucket = s3key = s3secret = # settings for atmos transport endpoint = uid = secret = tag = # settings for webdav transport webdavurl = # user and password are taken from netrc if omitted webdavuser = webdavpassword = webdavverifyserver = webdavbinarytransfer = ``` ## Usage (in repo - repeatedly) $ (hack, stage, commit) $ git media sync You can also check the status of your media files via $ git media status This shows the files that are waiting to be uploaded and how much data that is. If you want to delete the local cache of media files that have already been uploaded, run: $ git media clear If you want to replace a file in git-media with a changed version (for example, a video file has been edited), you need to explicitly tell Git that some media files have changed: $ git update-index --really-refresh ## Config Settings If autodownload is set to true, required files will automatically be downloaded when checking out or pulling.
The default is false. $ git config --global git-media.autodownload true ## Installing $ git clone git@github.com:alebedev/git-media.git $ cd git-media $ sudo gem install bundler $ bundle install $ gem build git-media.gemspec $ sudo gem install git-media-*.gem ## Notes for Windows It is important to switch off Git's automatic newline conversion for media files. Use the `-crlf` attribute in `.gitattributes` (for example `*.mov filter=media -crlf`) or the config option `core.autocrlf = false`. If installing on Windows, you might run into a problem verifying SSL certificates, for example for S3. If that happens, see the [instructions in this Gist for how to update your RubyGems to the proper certificates](https://gist.github.com/luislavena/f064211759ee0f806c88). ## Copyright Copyright (c) 2009 Scott Chacon. See LICENSE for details. ================================================ FILE: Rakefile ================================================ require 'rubygems' require 'rake' begin require 'jeweler' Jeweler::Tasks.new do |gem| gem.name = "git-media" gem.summary = %Q{git-media} gem.email = "schacon@gmail.com" gem.homepage = "http://github.com/schacon/git-media" gem.authors = ["Scott Chacon"] # gem is a Gem::Specification... see http://www.rubygems.org/read/chapter/20 for additional settings end rescue LoadError puts "Jeweler (or a dependency) not available.
Install it with: sudo gem install jeweler" end require 'spec/rake/spectask' Spec::Rake::SpecTask.new(:spec) do |spec| spec.libs << 'lib' << 'spec' spec.spec_files = FileList['spec/**/*_spec.rb'] end Spec::Rake::SpecTask.new(:rcov) do |spec| spec.libs << 'lib' << 'spec' spec.pattern = 'spec/**/*_spec.rb' spec.rcov = true end task :default => :spec require 'rake/rdoctask' Rake::RDocTask.new do |rdoc| if File.exist?('VERSION.yml') config = YAML.load(File.read('VERSION.yml')) version = "#{config[:major]}.#{config[:minor]}.#{config[:patch]}" else version = "" end rdoc.rdoc_dir = 'rdoc' rdoc.title = "git-media #{version}" rdoc.rdoc_files.include('README*') rdoc.rdoc_files.include('lib/**/*.rb') end ================================================ FILE: TODO ================================================ == Tools * tool to clean large files out of existing repo (filter-branch?) - can also just re-do the last commit with a new filter * git media add (file) - adds it to the .gitattributes file == Transports * Local * Amazon S3 * SCP * SFTP * FTP ================================================ FILE: VERSION ================================================ 0.1.4 ================================================ FILE: bin/git-media ================================================ #!/usr/bin/env ruby require 'optparse' $:.unshift File.join(File.dirname(__FILE__), '..', 'lib') require 'git-media' GitMedia::Application.run! ================================================ FILE: git-media.gemspec ================================================ # Generated by jeweler # DO NOT EDIT THIS FILE DIRECTLY # Instead, edit Jeweler::Tasks in rakefile, and run 'rake gemspec' # -*- encoding: utf-8 -*- Gem::Specification.new do |s| s.name = "git-media" s.version = "0.1.5" s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? 
:required_rubygems_version= s.authors = ["Scott Chacon", "Alexander Lebedev"] s.date = "2014-10-20" s.email = "alexander.lebedev@gmail.com" s.executables = ["git-media"] s.extra_rdoc_files = [ "LICENSE", "README.md", "TODO" ] s.files = [ ".document", "Gemfile", "Gemfile.lock", "LICENSE", "README.md", "Rakefile", "TODO", "VERSION", "bin/git-media", "git-media.gemspec", "lib/git-media.rb", "lib/git-media/clear.rb", "lib/git-media/filter-clean.rb", "lib/git-media/filter-smudge.rb", "lib/git-media/filter-branch.rb", "lib/git-media/status.rb", "lib/git-media/sync.rb", "lib/git-media/transport.rb", "lib/git-media/transport/atmos_client.rb", "lib/git-media/transport/box.rb", "lib/git-media/transport/local.rb", "lib/git-media/transport/s3.rb", "lib/git-media/transport/scp.rb", "lib/git-media/transport/webdav.rb", "spec/media_spec.rb", "spec/spec_helper.rb" ] s.homepage = "http://github.com/alebedev/git-media" s.require_paths = ["lib"] s.rubygems_version = "1.8.28" s.summary = "git-media" if s.respond_to? :specification_version then s.specification_version = 3 if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then s.add_runtime_dependency(%q, [">= 0"]) else s.add_dependency(%q, [">= 0"]) end else s.add_dependency(%q, [">= 0"]) end end ================================================ FILE: lib/git-media/clear.rb ================================================ require 'git-media/status' module GitMedia module Clear def self.run! 
@push = GitMedia.get_push_transport self.clear_local_cache end def self.clear_local_cache # find files in media buffer and delete all pushed files all_cache = Dir.chdir(GitMedia.get_media_buffer) { Dir.glob('*') } unpushed_files = @push.get_unpushed(all_cache) pushed_files = all_cache - unpushed_files pushed_files.each do |sha| puts "Removing " + sha[0, 8] File.unlink(File.join(GitMedia.get_media_buffer, sha)) end end end end ================================================ FILE: lib/git-media/filter-branch.rb ================================================ require 'set' require 'git-media/filter-clean' require 'fileutils' include Process module GitMedia module FilterBranch def self.get_temp_buffer @@git_dir ||= `git rev-parse --git-dir`.chomp temp_buffer = File.join(@@git_dir, 'media/filter-branch') FileUtils.mkdir_p(temp_buffer) if !File.exist?(temp_buffer) return temp_buffer end def self.clean! tmp_buffer = get_temp_buffer FileUtils.rm_r (tmp_buffer) FileUtils.rmdir (tmp_buffer) end def self.run! # Rewriting of history # Inspired by how git-fat does it inputfiles = ARGF.read.split("\n").map { |s| s.downcase }.to_set all_files = `git ls-files -s`.split("\n") filecount = all_files.length.to_s # determine and initialize our media buffer directory media_buffer = GitMedia.get_media_buffer tmp_buffer = get_temp_buffer STDOUT.write (" ") index = 0 prevLength = 0 fileLists = [[],[],[],[]] all_files.each_with_index do |f, i| fileLists[i % fileLists.length].push (f) end update_index_reader, update_index_writer = IO.pipe update_index_pid = spawn("git update-index --index-info", :in=>update_index_reader) update_index_reader.close mutex = Mutex.new threads = [] fileLists.each_with_index do |files, thread_index| fls = files thread = Thread.new do fls.each do |line| index += 1 head, filepath = line.split("\t") filepath.strip! if not inputfiles.include? 
(filepath.downcase) next end mode, blob, stagenumber = head.split() # Skip symlinks if mode == "120000" next end # 1 Find cached git-hash of the media stub # 1.2 If not found, calculate it # 1.3 store object in media buffer # 1.4 save the hash in the cache # 2 Replace object with git-hash of the stub #1 hash_file_path = File.join(tmp_buffer, blob) hash_of_stub = nil if File.exists?(hash_file_path) File.open(hash_file_path, "rb") do |f| hash_of_stub = f.read.strip() end else # Only show progress output for thread 0 because otherwise the thread # output might get messed up by multiple threads writing at the same time if thread_index == 0 # Erase previous output text # \b is backspace prevLength.times { STDOUT.write("\b") STDOUT.write(" ") STDOUT.write("\b") } line = "Filtering " + index.to_s + " of " + filecount + " : " + filepath prevLength = line.length STDOUT.write (line) end # pipes roughly equivalent to # cat-file | clean | hash | update-index # 1.2, 1.3 gitcat_reader, gitcat_writer= IO.pipe gitcat_pid = spawn("git cat-file blob " + blob, :out=>gitcat_writer, :close_others=>true) # We are not using it, so close it gitcat_writer.close githash_reader, githash_writer= IO.pipe githash_output_reader, githash_output_writer= IO.pipe githash_pid = spawn("git hash-object -w --stdin", :in=>githash_reader, :out=>githash_output_writer) githash_output_writer.close githash_reader.close GitMedia::FilterClean.run!(gitcat_reader, githash_writer, false) gitcat_reader.close githash_writer.close hash_of_stub = githash_output_reader.read().strip() # 1.4 cache = File.new(hash_file_path, File::CREAT|File::RDWR|File::BINARY) cache.write(hash_of_stub) cache.close wait (githash_pid) wait (gitcat_pid) end # 2 update = mode + " " + hash_of_stub + " " + stagenumber + "\t" + filepath + "\n" # Synchronize with a mutex to avoid multiple # threads writing to the pipe at the same time mutex.synchronize do update_index_writer.write(update) end end end threads.push(thread) end threads.each do 
|thread| thread.join end update_index_writer.close() wait(update_index_pid) end end end ================================================ FILE: lib/git-media/filter-clean.rb ================================================ require 'digest/sha1' require 'fileutils' require 'tempfile' module GitMedia module FilterClean def self.run!(input=STDIN, output=STDOUT, info_output=true) input.binmode # Read first 42 bytes # If the file is only 41 bytes long (as in the case of a stub) # it will only return a string with a length of 41 data = input.read(42) output.binmode if data != nil && data.length == 41 && data.match(/^[0-9a-fA-F]+\n$/) # Exactly 41 bytes long and matches the hex string regex # This is most likely a stub # TODO: Maybe add some additional marker in the files like # "[hex string]:git-media" # to really be able to say that a file is a stub output.write (data) if info_output STDERR.puts("Skipping unexpanded stub : " + data[0, 8]) end else # determine and initialize our media buffer directory media_buffer = GitMedia.get_media_buffer hashfunc = Digest::SHA1.new start = Time.now # read in buffered chunks of the data # calculating the SHA and copying to a tempfile tempfile = Tempfile.new('media', :binmode => true) # Write the first 42 bytes if data != nil hashfunc.update(data) tempfile.write(data) end while data = input.read(4096) hashfunc.update(data) tempfile.write(data) end tempfile.close # calculate and print the SHA of the data output.print hx = hashfunc.hexdigest output.write("\n") # move the tempfile to our media buffer area media_file = File.join(media_buffer, hx) FileUtils.mv(tempfile.path, media_file) elapsed = Time.now - start if info_output STDERR.puts('Saving media : ' + hx + ' : ' + elapsed.to_s) end end end end end ================================================ FILE: lib/git-media/filter-smudge.rb ================================================ module GitMedia module FilterSmudge def self.print_stream(stream) # create a binary stream to write to 
stdout # this avoids messing up line endings on Windows outstream = IO.try_convert(STDOUT) outstream.binmode while data = stream.read(4096) do outstream.write(data) end end def self.run! media_buffer = GitMedia.get_media_buffer # read checksum size STDIN.binmode STDOUT.binmode orig = STDIN.readline(64) sha = orig.strip # read no more than 64 bytes if STDIN.eof? && sha.length == 40 && sha.match(/^[0-9a-fA-F]+$/) != nil # this is a media file media_file = File.join(media_buffer, sha.chomp) if File.exist?(media_file) STDERR.puts('Recovering media : ' + sha) File.open(media_file, 'rb') do |f| print_stream(f) end else # Read key from config auto_download = `git config git-media.autodownload`.chomp.downcase == "true" if auto_download pull = GitMedia.get_pull_transport cache_file = GitMedia.media_path(sha) if !File.exist?(cache_file) STDERR.puts ("Downloading : " + sha[0,8]) # Download the file from backend storage # We have no idea what the final file will be (therefore nil) pull.pull(nil, sha) end STDERR.puts ("Expanding : " + sha[0,8]) if File.exist?(cache_file) File.open(media_file, 'rb') do |f| print_stream(f) end else STDERR.puts ("Could not get media, saving placeholder : " + sha) puts orig end else STDERR.puts('Media missing, saving placeholder : ' + sha) # Print orig and not sha to preserve any trailing newlines at end of file # To avoid git thinking the file has changed puts orig end end else # if it is not a 40 character long hash, just output STDERR.puts('Unknown git-media file format') print orig print_stream(STDIN) end end end end ================================================ FILE: lib/git-media/status.rb ================================================ require 'pp' Encoding.default_external = Encoding::UTF_8 module GitMedia module Status def self.run!(opts) @push = GitMedia.get_push_transport r = self.find_references self.print_references(r, opts[:short]) r = self.local_cache_status self.print_cache_status(r, opts[:short]) end # find tree entries that are likely media
references def self.find_references references = {:to_expand => [], :expanded => [], :deleted => []} files = `git ls-tree -l -r HEAD | tr "\\000" \\\\n`.split("\n") files = files.map { |f| s = f.split("\t"); [s[0].split(' ').last, s[1]] } files = files.select { |f| f[0] == '41' } # it's the right size files.each do |tree_size, fname| if File.exists?(fname) size = File.size(fname) # Windows newlines can offset file size by 1 if size == tree_size.to_i or size == tree_size.to_i + 1 # TODO: read in the data and verify that it's a sha + newline fname = fname.tr("\\","") #remove backslash sha = File.read(fname).strip if sha.length == 40 && sha =~ /^[0-9a-f]+$/ references[:to_expand] << [fname, sha] end else references[:expanded] << fname end else # file was deleted references[:deleted] << fname end end references end def self.print_references(refs, short=false) if refs[:to_expand].size > 0 puts "== Unexpanded Media ==" if short puts "Count: " + refs[:to_expand].size.to_s else refs[:to_expand].each do |file, sha| puts " " + sha[0, 8] + " " + file end puts end end if refs[:expanded].size > 0 puts "== Expanded Media ==" if short puts "Count: " + refs[:expanded].size.to_s else refs[:expanded].each do |file| size = File.size(file) puts " " + "(#{self.to_human(size)})".ljust(8) + " #{file}" end puts end end if refs[:deleted].size > 0 puts "== Deleted Media ==" if short puts "Count: " + refs[:deleted].size.to_s else refs[:deleted].each do |file| puts " " + " #{file}" end puts end end end def self.print_cache_status(refs, short) if refs[:unpushed].size > 0 puts "== Unpushed Media ==" if short puts "Count: " + refs[:unpushed].size.to_s else refs[:unpushed].each do |sha| cache_file = GitMedia.media_path(sha) size = File.size(cache_file) puts " " + "(#{self.to_human(size)})".ljust(8) + ' ' + sha[0, 8] end puts end end if refs[:pushed].size > 0 puts "== Already Pushed Media ==" if short puts "Count: " + refs[:pushed].size.to_s else refs[:pushed].each do |sha| cache_file = 
GitMedia.media_path(sha) size = File.size(cache_file) puts " " + "(#{self.to_human(size)})".ljust(8) + ' ' + sha[0, 8] end puts end end end def self.local_cache_status # find files in media buffer and upload them references = {:unpushed => [], :pushed => []} all_cache = Dir.chdir(GitMedia.get_media_buffer) { Dir.glob('*') } unpushed_files = @push.get_unpushed(all_cache) || [] references[:unpushed] = unpushed_files references[:pushed] = all_cache - unpushed_files rescue [] references end def self.to_human(size) if size < 1024 return size.to_s + 'b' elsif size < 1048576 return (size / 1024).to_s + 'k' else return (size / 1048576).to_s + 'm' end end end end ================================================ FILE: lib/git-media/sync.rb ================================================ # find files that are placeholders (41 char) and download them # upload files in media buffer that are not in offsite bin require 'git-media/status' module GitMedia module Sync def self.run! @push = GitMedia.get_push_transport @pull = GitMedia.get_pull_transport self.expand_references self.update_index self.upload_local_cache end def self.expand_references status = GitMedia::Status.find_references status[:to_expand].each_with_index do |tuple, index| file = tuple[0] sha = tuple[1] cache_file = GitMedia.media_path(sha) if !File.exist?(cache_file) puts "Downloading " + sha[0,8] + " : " + file @pull.pull(file, sha) end puts "Expanding " + (index+1).to_s + " of " + status[:to_expand].length.to_s + " : " + sha[0,8] + " : " + file if File.exist?(cache_file) FileUtils.cp(cache_file, file) else puts 'Could not get media from storage' end end end def self.update_index refs = GitMedia::Status.find_references # Split references up into lists of at most 500 # because most OSs have limits on the size of the argument list # TODO: Could probably use the --stdin flag on git update-index to be # able to update it in a single call refLists = refs[:expanded].each_slice(500).to_a refLists.each { |refList| 
refList = refList.map { |v| "\"" + v + "\""} `git update-index --assume-unchanged -- #{refList.join(' ')}` } puts "Updated git index" end def self.upload_local_cache # find files in media buffer and upload them all_cache = Dir.chdir(GitMedia.get_media_buffer) { Dir.glob('*') } unpushed_files = @push.get_unpushed(all_cache) unpushed_files.each_with_index do |sha, index| puts "Uploading " + sha[0, 8] + " " + (index+1).to_s + " of " + unpushed_files.length.to_s @push.push(sha) end # TODO: if --clean, remove them end end end ================================================ FILE: lib/git-media/transport/atmos_client.rb ================================================ require 'git-media/transport' require 'ruby-atmos-pure' require 'atmos' # git-media.transport atmos # git-media.endpoint # git-media.uid # git-media.secret # git-media.tag (optional) module GitMedia module Transport class AtmosClient < Base def initialize(endpoint, uid, secret, tag) atmos_options = { :url => endpoint, :uid => uid, :secret => secret } @tag = tag @atmos_client = Atmos::Store.new(atmos_options) end def read? reachable? end def get_file(sha, to_file) dst_file = File.new(to_file, File::CREAT|File::RDWR|File::BINARY) @atmos_client.get(:namespace => sha).data_as_stream do |chunk| dst_file.write(chunk) end end def write? reachable? end def put_file(sha, from_file) src_file = File.open(from_file,"rb") obj_conf = {:data => src_file, :length => File.size(from_file), :namespace => sha} # an unset git-media.tag arrives as "", which is truthy in Ruby, so check for emptiness obj_conf[:listable_metadata] = {@tag => true} if @tag && !@tag.empty? @atmos_client.create(obj_conf) end def get_unpushed(files) unpushed = [] files.each do |file| begin @atmos_client.get(:namespace => file) rescue Atmos::Exceptions::AtmosException unpushed << file end end unpushed end private # dummy function to test connectivity to atmos def reachable?
@atmos_client.server_version true rescue false end end end end ================================================ FILE: lib/git-media/transport/box.rb ================================================ require 'git-media/transport' require 'boxr' require 'shellwords' # git-media.transport box # git-media.boxclientid # git-media.boxclientsecret # git-media.boxredirecturi # git-media.boxfolderid # git-media.boxaccesstoken # git-media.boxrefreshtoken module GitMedia module Transport class Box < Base def initialize(client_id, client_secret, redirect_uri, folder_id, access_token, refresh_token) if access_token == "" || refresh_token == "" uri = Boxr::oauth_url(redirect_uri, box_client_id: client_id) print "(1) Paste following URL to your browser, and get your access code:\n\n#{uri}\n\n(2) Enter your access code: " code = STDIN.gets.chomp token = Boxr::get_tokens(code, box_client_id: client_id, box_client_secret: client_secret) access_token = token.access_token refresh_token = token.refresh_token `git config git-media.boxaccesstoken #{access_token.shellescape}` `git config git-media.boxrefreshtoken #{refresh_token.shellescape}` end token_refresh_callback = lambda {|at, rt, id| `git config git-media.boxaccesstoken #{at.shellescape}` `git config git-media.boxrefreshtoken #{rt.shellescape}` } @box = Boxr::Client.new(access_token, refresh_token: refresh_token, box_client_id: client_id, box_client_secret: client_secret, &token_refresh_callback) @folder = @box.folder_from_id(folder_id) end def read? true end def get_file(sha, to_file) files = get_files(true) if files.has_key?(sha) == false files = get_files() end file_id = files[sha] if file_id == nil STDERR.puts("Storage backend (box) did not contain file : "+sha+", have you run 'git media sync' from all repos?") return false end file = @box.file_from_id(file_id) content = @box.download_file(file) File::open(to_file, "wb") do |f| f.write(content) end end def write? 
true end def put_file(sha, from_file) @box.upload_file(from_file, @folder) end def get_unpushed(files) remote_files = get_files() files.select do |f| !remote_files.has_key?(f) end end def get_files(use_cache = false) media_buffer = GitMedia.get_media_buffer cache_file = File.join(media_buffer, "cache") files = {} if use_cache File::exists?(cache_file) && File::open(cache_file) do |f| f.each do |s| r = s.strip.split(",") files[r[0]] = r[1] end end return files if files.length > 0 end offset = 0 limit = 100 while (items = @box.folder_items(@folder, fields: [:id, :name], offset: offset, limit: limit)).length > 0 items.each do |f| files[f[:name]] = f[:id] end offset = offset + limit end # cache update f = File::open(cache_file, "w") files.each do |name, id| f.puts "#{name},#{id}" end f.close return files end end end end ================================================ FILE: lib/git-media/transport/local.rb ================================================ require 'git-media/transport' # move large media to local bin # git-media.transport local # git-media.localpath /opt/media module GitMedia module Transport class Local < Base def initialize(path) @path = path end def read? File.exist?(@path) end def get_file(sha, to_file) from_file = File.join(@path, sha) if File.exists?(from_file) FileUtils.cp(from_file, to_file) return true end return false end def write? 
File.exist?(@path) end def put_file(sha, from_file) to_file = File.join(@path, sha) if File.exist?(from_file) FileUtils.cp(from_file, to_file) return true end return false end def get_unpushed(files) files.select do |f| !File.exist?(File.join(@path, f)) end end end end end ================================================ FILE: lib/git-media/transport/s3.rb ================================================ require 'git-media/transport' require 'set' require 's3' require 'right_aws' # git-media.transport s3 # git-media.s3bucket # git-media.s3key # git-media.s3secret module GitMedia module Transport class S3 < Base def initialize(bucket, access_key_id = nil, secret_access_key = nil) @s3 = RightAws::S3Interface.new(access_key_id, secret_access_key, {:multi_thread => true, :logger => Logger.new(File.expand_path('~/.git-media.s3.log'))}) @bucket = bucket begin @buckets = @s3.list_all_my_buckets.map { |a| a[:name] } rescue RightAws::AwsError # Need to use STDERR because this might be called inside a filter STDERR.puts ("Failed to connect to storage backend (S3)") raise end if !@buckets.include?(bucket) # Need to use STDERR because this might be called inside a filter STDERR.puts ("Creating New Bucket") if @s3.create_bucket(bucket) @buckets << bucket end end end def read? @buckets.size > 0 end def get_file(sha, to_file) to = File.new(to_file, File::CREAT|File::RDWR|File::BINARY) begin @s3.get(@bucket, sha) do |chunk| to.write(chunk) end to.close return true rescue RightAws::AwsError => e # Delete the file to make sure it is not expanded to.close File.delete(to_file) # Ugly, but AwsError does not seem to give me much choice if e.message.include?('NoSuchKey') STDERR.puts("Storage backend (S3) did not contain file : "+sha+", have you run 'git media sync' from all repos?") return false else # Need to use STDERR because this might be called inside a filter STDERR.puts ("Downloading file from S3 failed with error:\n" + e.message) return false end end end def write?
@buckets.size > 0 end def put_file(sha, from_file) @s3.put(@bucket, sha, File.open(from_file,"rb")) end def get_unpushed(files) # Using a set instead of a list improves performance a lot # since it reduces the complexity from O(n^2) to O(n) keys = Set.new # Apparently the list_bucket method only returns the first 1000 elements # This method however will continue to give back results until all elements # have been listed @s3.incrementally_list_bucket(@bucket) { |contents| contents[:contents].each { |element| keys.add(element[:key]) } } files.select do |f| !keys.include?(f) end end end end end ================================================ FILE: lib/git-media/transport/scp.rb ================================================ require 'git-media/transport' # move large media to remote server via SCP # git-media.transport scp # git-media.scpuser someuser # git-media.scphost remoteserver.com # git-media.scppath /opt/media # git-media.scpport (optional) module GitMedia module Transport class Scp < Base def initialize(user, host, path, port) @user = user @host = host @path = path # ssh takes the port as -p, scp as -P unless port === "" @sshport = "-p#{port}" @scpport = "-P#{port}" end end def exist?(file) if `ssh #{@user}@#{@host} #{@sshport} [ -f "#{file}" ] && echo 1 || echo 0`.chomp == "1" puts file + " exists" return true else puts file + " doesn't exist" return false end end def read? return true end def get_file(sha, to_file) from_file = @user+"@"+@host+":"+File.join(@path, sha) `scp #{@scpport} "#{from_file}" "#{to_file}"` if $?
== 0 puts sha+" uploaded" return true end puts sha+" upload fail" return false end def get_unpushed(files) files.select do |f| !self.exist?(File.join(@path, f)) end end end end end ================================================ FILE: lib/git-media/transport/webdav.rb ================================================ require 'git-media/transport' require 'uri' require 'net/dav' module GitMedia module Transport class WebDav < Base def initialize(url, user, password, verify_server=true, binary_transfer=false) @uri = URI(url) # Faster binary transport requires curb gem @dav = Net::DAV.new(url, :curl => (binary_transfer)) @dav.verify_server = verify_server @dav.credentials(user, password) print 'checking connection... ' @has_connection = @dav.exists?('.') puts (if @has_connection then 'ok' else 'failed' end) end def read? @has_connection end def write? @has_connection end def get_path(path) @uri.merge(path).path end def exists?(file) @dav.exists?(get_path(file)) end def get_file(sha, to_file) to = File.new(to_file, File::CREAT|File::RDWR|File::BINARY) begin @dav.get(get_path(sha)) do |chunk| to.write(chunk) end true ensure to.close end end def put_file(sha, from_file) @dav.put(get_path(sha), File.open(from_file, "rb"), File.size(from_file)) end def get_unpushed(files) files.select do |f| !self.exists?(f) end end end end end ================================================ FILE: lib/git-media/transport.rb ================================================ module GitMedia module Transport class Base def pull(final_file, sha) to_file = GitMedia.media_path(sha) get_file(sha, to_file) end def push(sha) from_file = GitMedia.media_path(sha) put_file(sha, from_file) end ## OVERWRITE ## def read? false end def write? 
        false
      end

      def get_file(sha, to_file)
        false
      end

      # from_file is the local media-buffer path that push passes in
      def put_file(sha, from_file)
        false
      end

      def get_unpushed(files)
        files
      end

    end
  end
end

================================================
FILE: lib/git-media.rb
================================================
require 'rubygems'
require 'bundler/setup'

require 'trollop'
require 'fileutils'
#

module GitMedia

  def self.get_media_buffer
    @@git_dir ||= `git rev-parse --git-dir`.chomp
    media_buffer = File.join(@@git_dir, 'media/objects')
    FileUtils.mkdir_p(media_buffer) if !File.exist?(media_buffer)
    return media_buffer
  end

  def self.media_path(sha)
    buf = self.get_media_buffer
    File.join(buf, sha)
  end

  # TODO: select the proper transports based on settings
  def self.get_push_transport
    self.get_transport
  end

  def self.get_credentials_from_netrc(url)
    require 'uri'
    require 'netrc'

    uri = URI(url)
    hostname = uri.host
    unless hostname
      raise "Cannot identify hostname within git-media.webdavurl value"
    end
    netrc = Netrc.read
    netrc[hostname]
  end

  def self.get_transport
    transport = `git config git-media.transport`.chomp
    case transport
    when ""
      raise "git-media.transport not set"

    when "scp"
      require 'git-media/transport/scp'

      user = `git config git-media.scpuser`.chomp
      host = `git config git-media.scphost`.chomp
      path = `git config git-media.scppath`.chomp
      port = `git config git-media.scpport`.chomp
      if user === ""
        raise "git-media.scpuser not set for scp transport"
      end
      if host === ""
        raise "git-media.scphost not set for scp transport"
      end
      if path === ""
        raise "git-media.scppath not set for scp transport"
      end
      GitMedia::Transport::Scp.new(user, host, path, port)

    when "local"
      require 'git-media/transport/local'

      path = `git config git-media.localpath`.chomp
      if path === ""
        raise "git-media.localpath not set for local transport"
      end
      GitMedia::Transport::Local.new(path)

    when "s3"
      require 'git-media/transport/s3'

      bucket = `git config git-media.s3bucket`.chomp
      key = `git config git-media.s3key`.chomp
      secret = `git config git-media.s3secret`.chomp
      if bucket === ""
        raise "git-media.s3bucket not set for s3 transport"
      end
      if key === ""
        raise "git-media.s3key not set for s3 transport"
      end
      if secret === ""
        raise "git-media.s3secret not set for s3 transport"
      end
      GitMedia::Transport::S3.new(bucket, key, secret)

    when "atmos"
      require 'git-media/transport/atmos_client'

      endpoint = `git config git-media.endpoint`.chomp
      uid = `git config git-media.uid`.chomp
      secret = `git config git-media.secret`.chomp
      tag = `git config git-media.tag`.chomp

      if endpoint == ""
        raise "git-media.endpoint not set for atmos transport"
      end
      if uid == ""
        raise "git-media.uid not set for atmos transport"
      end
      if secret == ""
        raise "git-media.secret not set for atmos transport"
      end
      GitMedia::Transport::AtmosClient.new(endpoint, uid, secret, tag)

    when "webdav"
      require 'git-media/transport/webdav'

      url = `git config git-media.webdavurl`.chomp
      user = `git config git-media.webdavuser`.chomp
      password = `git config git-media.webdavpassword`.chomp
      verify_server = `git config git-media.webdavverifyserver`.chomp == 'true'
      binary_transfer = `git config git-media.webdavbinarytransfer`.chomp == 'true'
      if url == ""
        raise "git-media.webdavurl not set for webdav transport"
      end
      if user == ""
        user, password = self.get_credentials_from_netrc(url)
      end
      if !user
        raise "git-media.webdavuser not set for webdav transport"
      end
      if !password
        raise "git-media.webdavpassword not set for webdav transport"
      end
      GitMedia::Transport::WebDav.new(url, user, password, verify_server, binary_transfer)

    when "box"
      require 'git-media/transport/box'

      client_id = `git config git-media.boxclientid`.chomp
      client_secret = `git config git-media.boxclientsecret`.chomp
      redirect_uri = `git config git-media.boxredirecturi`.chomp
      folder_id = `git config git-media.boxfolderid`.chomp
      access_token = `git config git-media.boxaccesstoken`.chomp
      refresh_token = `git config git-media.boxrefreshtoken`.chomp
      if client_id == ""
        raise "git-media.boxclientid not set for box transport"
      end
      if client_secret == ""
        raise "git-media.boxclientsecret not set for box transport"
      end
      if redirect_uri == ""
        raise "git-media.boxredirecturi not set for box transport"
      end
      if folder_id == ""
        raise "git-media.boxfolderid not set for box transport"
      end
      GitMedia::Transport::Box.new(client_id, client_secret, redirect_uri, folder_id, access_token, refresh_token)

    else
      raise "Invalid transport #{transport}"
    end
  end

  def self.get_pull_transport
    self.get_transport
  end

  module Application
    def self.run!

      if !system('git rev-parse')
        return
      end

      cmd = ARGV.shift # get the subcommand
      cmd_opts = case cmd
      when "filter-clean" # parse delete options
        require 'git-media/filter-clean'
        GitMedia::FilterClean.run!
      when "filter-smudge"
        require 'git-media/filter-smudge'
        GitMedia::FilterSmudge.run!
      when "clear" # parse delete options
        require 'git-media/clear'
        GitMedia::Clear.run!
      when "sync"
        require 'git-media/sync'
        GitMedia::Sync.run!
      when 'status'
        require 'git-media/status'
        opts = Trollop::options do
          opt :force, "Force status"
          opt :short, "Short status"
        end
        GitMedia::Status.run!(opts)
      when 'retroactively-apply'
        require 'git-media/filter-branch'
        GitMedia::FilterBranch.clean!
        arg2 = "--index-filter 'git media index-filter #{ARGV.shift}'"
        system("git filter-branch #{arg2} --tag-name-filter cat -- --all")
        GitMedia::FilterBranch.clean!
      when 'index-filter'
        require 'git-media/filter-branch'
        GitMedia::FilterBranch.run!
      else
        print < to_rewrite'
EOF
      end

    end
  end
end

================================================
FILE: spec/media_spec.rb
================================================
require File.expand_path(File.dirname(__FILE__) + '/spec_helper')

# I realize this is horrible, horrible rspec but I want to run the actual
# git commands and it takes forever to set up the test env each time, so
# I'm squeezing a bunch of tests into each 'it' - don't judge me

describe "Media" do

  it "should clean and smudge and save data in buffer area" do
    in_temp_git_w_media do
      git('add .')
      git("commit -m 'testing'")

      # check that we saved the sha and not the data
      # (41 bytes is presumably the 40-character hex sha plus a trailing newline)
      size = git("cat-file -s master:testing1.mov")
      size.should eql('41')

      # check that the data is in our buffer area
      Dir.chdir('.git/media/objects') do
        objects = Dir.glob('*')
        objects.should include('20eabe5d64b0e216796e834f52d61fd0b70332fc')
      end

      # check that removing the file and checking out returns the data
      File.unlink('testing1.mov')
      git('checkout testing1.mov')
      File.size('testing1.mov').should eql(7)

      # check that removing the file and checking out sans data returns the sha
      File.unlink('testing1.mov')
      File.unlink('.git/media/objects/20eabe5d64b0e216796e834f52d61fd0b70332fc')
      git('checkout testing1.mov')
      File.size('testing1.mov').should eql(41)
    end
  end

  it "should show me the status of my directory"
  it "should sync with a local transport"
end

================================================
FILE: spec/spec_helper.rb
================================================
require 'rubygems'
require 'spec'
require 'tempfile'
require 'pp'

$LOAD_PATH.unshift(File.dirname(__FILE__))
$LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
require 'git-media'

Spec::Runner.configure do |config|
end

def in_temp_git
  tf = Tempfile.new('gitdir')
  temppath = tf.path
  tf.unlink
  FileUtils.mkdir(temppath)
  Dir.chdir(temppath) do
    `git init`
    yield
  end
end

def in_temp_git_w_media
  bin = File.join(File.dirname(__FILE__), '..', 'bin', 'git-media')
  in_temp_git do
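    append_file('testing1.mov', '1234567')
    append_file('testing2.mov', '123456789')
    append_file('normal.txt', 'hello world')
    append_file('.gitattributes', '*.mov filter=media')
    # The subcommands dispatched by GitMedia::Application.run! are
    # "filter-clean" and "filter-smudge"; plain "clean"/"smudge" would fall
    # through to the usage message.
    `git config filter.media.clean "#{bin} filter-clean"`
    `git config filter.media.smudge "#{bin} filter-smudge"`
    yield
  end
end

# Note: despite the name, mode 'w+' truncates an existing file before writing.
def append_file(filename, content)
  File.open(filename, 'w+') do |f|
    f.print content
  end
end

# Runs a git command, discarding stderr, and returns its stripped stdout.
def git(command)
  `git #{command} 2>/dev/null`.strip
end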