Repository: kjerk/sjisunzip
Branch: master
Commit: 84a4a1e29edc
Files: 15
Total size: 28.7 KB

Directory structure:
gitextract_f_bkyp2k/
├── .gitignore
├── LICENSE
├── README.md
├── SjisUnzip/
│   ├── App.config
│   ├── AppRunner.cs
│   ├── ExtensionMethods.cs
│   ├── Properties/
│   │   └── AssemblyInfo.cs
│   ├── SjisUnzip.csproj
│   └── SjisUnzipApp.cs
├── SjisUnzip.sln
└── SjisUnzipTests/
    ├── ExtensionTests.cs
    ├── Properties/
    │   └── AssemblyInfo.cs
    ├── SjisUnzipAppTests.cs
    ├── SjisUnzipTests.csproj
    └── packages.config

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
## Ignore Visual Studio temporary files, build results, and
## files generated by popular Visual Studio add-ons.

# User-specific files
*.suo
*.user
*.userosscache
*.sln.docstates

# Build results
[Dd]ebug/
[Dd]ebugPublic/
[Rr]elease/
[Rr]eleases/
x64/
x86/
build/
bld/
[Bb]in/
[Oo]bj/

# Roslyn cache directories
*.ide/

# MSTest test Results
[Tt]est[Rr]esult*/
[Bb]uild[Ll]og.*

#NUNIT
*.VisualState.xml
TestResult.xml

# Build Results of an ATL Project
[Dd]ebugPS/
[Rr]eleasePS/
dlldata.c

*_i.c
*_p.c
*_i.h
*.ilk
*.meta
*.obj
*.pch
*.pdb
*.pgc
*.pgd
*.rsp
*.sbr
*.tlb
*.tli
*.tlh
*.tmp
*.tmp_proj
*.log
*.vspscc
*.vssscc
.builds
*.pidb
*.svclog
*.scc

# Chutzpah Test files
_Chutzpah*

# Visual C++ cache files
ipch/
*.aps
*.ncb
*.opensdf
*.sdf
*.cachefile

# Visual Studio profiler
*.psess
*.vsp
*.vspx

# TFS 2012 Local Workspace
$tf/

# Guidance Automation Toolkit
*.gpState

# ReSharper is a .NET coding add-in
_ReSharper*/
*.[Rr]e[Ss]harper
*.DotSettings.user

# JustCode is a .NET coding add-in
.JustCode

# TeamCity is a build add-in
_TeamCity*

# DotCover is a Code Coverage Tool
*.dotCover

# NCrunch
_NCrunch_*
.*crunch*.local.xml

# MightyMoose
*.mm.*
AutoTest.Net/

# Web workbench (sass)
.sass-cache/

# Installshield output folder
[Ee]xpress/

# DocProject is a documentation generator add-in
DocProject/buildhelp/
DocProject/Help/*.HxT
DocProject/Help/*.HxC
DocProject/Help/*.hhc
DocProject/Help/*.hhk
DocProject/Help/*.hhp
DocProject/Help/Html2
DocProject/Help/html

# Click-Once directory
publish/

# Publish Web Output
*.[Pp]ublish.xml
*.azurePubxml
# TODO: Comment the next line if you want to checkin your web deploy settings
# but database connection strings (with potential passwords) will be unencrypted
*.pubxml
*.publishproj

# NuGet Packages
*.nupkg
# The packages folder can be ignored because of Package Restore
**/packages/*
# except build/, which is used as an MSBuild target.
!**/packages/build/
# If using the old MSBuild-Integrated Package Restore, uncomment this:
#!**/packages/repositories.config

# Windows Azure Build Output
csx/
*.build.csdef

# Windows Store app package directory
AppPackages/

# Others
sql/
*.Cache
ClientBin/
[Ss]tyle[Cc]op.*
~$*
*~
*.dbmdl
*.dbproj.schemaview
*.pfx
*.publishsettings
node_modules/
bower_components/

# RIA/Silverlight projects
Generated_Code/

# Backup & report files from converting an old project file
# to a newer Visual Studio version. Backup files are not needed,
# because we have git ;-)
_UpgradeReport_Files/
Backup*/
UpgradeLog*.XML
UpgradeLog*.htm

# SQL Server files
*.mdf
*.ldf

# Business Intelligence projects
*.rdl.data
*.bim.layout
*.bim_*.settings

# Microsoft Fakes
FakesAssemblies/

================================================
FILE: LICENSE
================================================
The MIT License (MIT)

Copyright (c) 2014 https://github.com/kjerk

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

================================================
FILE: README.md
================================================
sjisunzip
=========

This is a pretty braindead command line utility that simply forces the encoding to the right values to extract a Shift JIS encoded zip file ('[Code page 932](http://en.wikipedia.org/wiki/Code_page_932)') on a western/ansi encoding system.
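
For the curious, "forcing the encoding" boils down to handing `ZipArchive` an explicit entry-name encoding. The snippet below is only a minimal sketch of that idea, not the program's actual code path: the class name `SjisZipSketch` and the method `ReadEntryNames` are made up for illustration, and it assumes a runtime where code page 932 is available out of the box (e.g. .NET Framework 4.5; on .NET Core or later you would likely need to register the code pages encoding provider first).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

// Hypothetical helper, for illustration only.
static class SjisZipSketch
{
    // Returns the entry names of a zip, decoding any name that lacks the
    // UTF-8 flag as Shift JIS (code page 932) instead of the system default.
    public static List<string> ReadEntryNames(Stream zipStream)
    {
        // Assumption: code page 932 is available on this runtime.
        var sjis = Encoding.GetEncoding(932);

        // leaveOpen: true, so the caller stays responsible for the stream.
        using (var zip = new ZipArchive(zipStream, ZipArchiveMode.Read, true, sjis))
        {
            return zip.Entries.Select(e => e.FullName).ToList();
        }
    }

    static void Main(string[] args)
    {
        if (args.Length != 1 || !File.Exists(args[0]))
        {
            Console.WriteLine("Pass the path to a zip file.");
            return;
        }

        using (var stream = new FileStream(args[0], FileMode.Open, FileAccess.Read))
        {
            foreach (var name in ReadEntryNames(stream))
            {
                Console.WriteLine(name);
            }
        }
    }
}
```

Entries that already carry the UTF-8 flag are still read as UTF-8, so passing the encoding should only affect the names that would otherwise come out garbled.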

[Download Here](https://github.com/kjerk/sjisunzip/releases)

```
Usage: sjisunzip someFile.zip [toFolder]
       sjisunzip [-r] someFile.zip
   -r: Recode file to {filename}_utf8.zip
Examples:
   sjisunzip aFile.zip
   sjisunzip aFile.zip MyNewFolder
```

You can also just drop a zip file onto the program, since that'll pass it as the first argument, and the contents will be extracted in the same directory.

If you've ever received a zip file from a friend, or the wrong damn GNU mirror or whatever, that passed through Japan, then you've probably seen garbled filenames.

![example_1](https://cloud.githubusercontent.com/assets/2738686/5326938/37acc0de-7ce7-11e4-8259-06ef8b1f43a8.jpg)

---

Well, this program forces the opened zip to the correct encoding, then extracts the files with a more reasonable UTF encoding.

![example_2](https://cloud.githubusercontent.com/assets/2738686/5326978/712d7e50-7ce9-11e4-8f18-c885afc51055.jpg)

---

You can even just reencode the zip file to a less busted-ass one so you don't have this creeping horror issue in the future.

![example_3](https://cloud.githubusercontent.com/assets/2738686/5326937/37ab2878-7ce7-11e4-9655-61b92a2b680d.jpg)

---

The filenames and paths should be untangled when done.

![example_4](https://cloud.githubusercontent.com/assets/2738686/5326940/37af9d72-7ce7-11e4-8ee2-3a9d11c6e669.jpg)

---

Bonus fact: When this type of transitive corruption occurs, the output characters are called [Mojibake](http://en.wikipedia.org/wiki/Mojibake). That's almost cute enough to not be awful anymore.

================================================
FILE: SjisUnzip/App.config
================================================

================================================
FILE: SjisUnzip/AppRunner.cs
================================================
namespace SjisUnzip
{
    /// <summary>
    /// Bootstrapper to get out of the static context.
    /// </summary>
    static class AppRunner
    {
        static void Main(string[] args)
        {
            var mainApp = new SjisUnzipApp();
            mainApp.Main(args);
        }
    }
}

================================================
FILE: SjisUnzip/ExtensionMethods.cs
================================================
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace SjisUnzip
{
    public static class ExtensionMethods
    {
        //
        // System.String
        //

        /// <summary>
        /// Checks whether a string contains any characters outside the typical ascii range. 127 or higher.
        /// </summary>
        /// <param name="str">A string dum dum.</param>
        /// <returns>Boolean flag</returns>
        public static bool ContainsNonAscii(this string str)
        {
            // http://www.asciitable.com/
            return str.Any(t => t > 0x7E);
        }

        /// <summary>
        /// A function to do a raw reinterpretation of character encoding without using the .Convert() function.
        /// This allows accidentally transposed strings to have their encoding fixed by reversing the process.
        /// </summary>
        /// <param name="str">The string being operated on.</param>
        /// <param name="from">The source encoding the bytes will be extracted as.</param>
        /// <param name="to">The destination encoding the bytes will be 'interpreted' as.</param>
        /// <returns>The reinterpreted string.</returns>
        public static string RawTranscode(this string str, Encoding from, Encoding to)
        {
            var rawBytes = from.GetBytes(str);
            return to.GetString(rawBytes);
        }

        /// <summary>
        /// A wrapper for the RawTranscode() function which passes the correct parameters for fixing Sjis mishaps.
        /// </summary>
        /// <param name="str">The garbled string.</param>
        /// <returns>A fixed unicode string.</returns>
        public static string DecodeMojibake(this string str)
        {
            return str.RawTranscode(Encoding.Default, Encoding.GetEncoding(932));
        }

        /// <summary>
        /// Checks the contents of a string to see if any are within CJK and a few other unicode ranges.
        /// </summary>
        /// <remarks>
        /// See Reference Document for ranges.
        /// </remarks>
        /// <param name="str">Any string.</param>
        /// <returns>Boolean flag if any characters found.</returns>
        public static bool ContainsJapanese(this string str)
        {
            // Bounds are inclusive so that the first code point of each block
            // (e.g. U+4E00, the kanji for "one") is detected as well.
            return str.Any(c =>
                // Punctuation, Hiragana, Katakana
                (c >= 0x3000 && c <= 0x30ff) ||
                // Full-width romaji and half-width kana
                (c >= 0xff00 && c <= 0xffef) ||
                // CJK Extension
                (c >= 0x3400 && c <= 0x4dbf) ||
                // CJK
                (c >= 0x4e00 && c <= 0x9fff)
            );
        }

        /// <summary>
        /// Convenience method calling Console.WriteLine. Looks cool too.
        /// </summary>
        /// <example>
        /// "Hello World".wl();
        /// </example>
        public static void wl(this string str)
        {
            Console.WriteLine(str);
        }

        /// <summary>
        /// Convenience method for calling Console.WriteLine with params.
        /// </summary>
        /// <param name="str">The input format string.</param>
        /// <param name="o">Args for populating the format string.</param>
        /// <example>
        /// "Elapsed time was {0} seconds.".wl(timer.getSeconds());
        /// </example>
        public static void wl(this string str, params object[] o)
        {
            Console.WriteLine(str, o);
        }

        //
        // System.IO.FileInfo
        //

        /// <summary>
        /// Renames a file while keeping it in the same directory. The typical MoveTo is considered absolute,
        /// whereas this is considered relative.
        /// </summary>
        /// <param name="fi">The operating fileInfo.</param>
        /// <param name="newName">A new filename.</param>
        public static void Rename(this FileInfo fi, string newName)
        {
            fi.MoveTo(Path.Combine(fi.Directory.FullName, newName));
        }

        /// <summary>
        /// Renames a file in place and keeps the original extension regardless of what the new name is.
        /// </summary>
        /// <param name="fi">The operating fileInfo.</param>
        /// <param name="newName">A new filename which will have the original extension appended onto.</param>
        public static void RenameKeepExt(this FileInfo fi, string newName)
        {
            fi.MoveTo(Path.Combine(fi.Directory.FullName, newName) + Path.GetExtension(fi.FullName));
        }

        //
        // System.IO.DirectoryInfo
        //

        /// <summary>
        /// Does an in-place rename of a directory. Path modifications will not work properly (e.g. ../myname).
        /// </summary>
        /// <param name="di">The operating DirectoryInfo</param>
        /// <param name="newName">A new name for the directory.</param>
        public static void Rename(this DirectoryInfo di, string newName)
        {
            di.MoveTo(Path.Combine(di.Parent.FullName, newName));
        }
    }
}

================================================
FILE: SjisUnzip/Properties/AssemblyInfo.cs
================================================
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// General Information about an assembly is controlled through the following
// set of attributes. Change these attribute values to modify the information
// associated with an assembly.
[assembly: AssemblyTitle("SjisUnzip")]
[assembly: AssemblyDescription("")]
[assembly: AssemblyConfiguration("")]
[assembly: AssemblyCompany("")]
[assembly: AssemblyProduct("SjisUnzip")]
[assembly: AssemblyCopyright("Copyright © 2014")]
[assembly: AssemblyTrademark("")]
[assembly: AssemblyCulture("")]

// Setting ComVisible to false makes the types in this assembly not visible
// to COM components. If you need to access a type in this assembly from
// COM, set the ComVisible attribute to true on that type.
[assembly: ComVisible(false)]

// The following GUID is for the ID of the typelib if this project is exposed to COM
[assembly: Guid("79ffa4e3-8a00-426a-8179-2b033731cb9e")]

// Version information for an assembly consists of the following four values:
//
//      Major Version
//      Minor Version
//      Build Number
//      Revision
//
// You can specify all the values or you can default the Build and Revision Numbers
// by using the '*' as shown below:
// [assembly: AssemblyVersion("1.0.*")]
[assembly: AssemblyVersion("1.0.0.0")]
[assembly: AssemblyFileVersion("1.0.0.0")]

================================================
FILE: SjisUnzip/SjisUnzip.csproj
================================================
Debug AnyCPU {1AAE9514-FA5C-44AD-86BF-B2F44466D6B8} Exe Properties SjisUnzip sjisunzip v4.5 512 true AnyCPU true full false bin\Debug\ DEBUG;TRACE prompt 4 false AnyCPU pdbonly true bin\Release\ TRACE prompt 4 false SjisUnzip.AppRunner

================================================
FILE: SjisUnzip/SjisUnzipApp.cs
================================================
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO.Compression;

namespace SjisUnzip
{
    public class SjisUnzipApp
    {
        private readonly Encoding sjisEncoding = Encoding.GetEncoding(932);

        public void Main(string[] args)
        {
            var recode = args.Any((s) => s.Equals("-r"));

            if (recode && args.Length == 2)
            {
                args = args.Where((arg) => arg != "-r").ToArray();
                if (args.Length > 0 && File.Exists(args[0]))
                {
                    recodeFile(args[0]);
                }
            }
            else if (args.Length == 1 && Directory.Exists(args[0]))
            {
                recodeCorruptFilenames(args[0], true);
            }
            else if (args.Length == 1 && File.Exists(args[0]) && args[0].EndsWith(".zip", true, CultureInfo.CurrentCulture))
            {
                extractSjisZip(args[0]);
            }
            else if (args.Length == 2 && File.Exists(args[0]) && args[0].EndsWith(".zip", true, CultureInfo.CurrentCulture))
            {
                var folderPath = Path.GetDirectoryName(args[0]);
                var newFolderPath = Path.Combine(folderPath, args[1]);
                Directory.CreateDirectory(newFolderPath);
                extractSjisZip(args[0], newFolderPath);
            }
            else
            {
                printUsage();
            }
        }

        static void printUsage()
        {
            "Usage: sjisunzip someFile.zip [toFolder]".wl();
            "Usage: sjisunzip [-r] someFile.zip".wl();
            "   -r: Recode file to {filename}_utf8.zip".wl();
            "Usage: sjisunzip ./some_folder_with_corrupt_filenames".wl();
            "Examples:".wl();
            "   sjisunzip aFile.zip".wl();
            "   sjisunzip aFile.zip MyNewFolder".wl();
        }

        private void extractSjisZip(string fileName, string toFolder = "./")
        {
            "Writing to folder {0}...".wl(toFolder);
            using (var zipFile = new ZipArchive(new FileStream(fileName, FileMode.Open, FileAccess.Read), ZipArchiveMode.Read, false, Encoding.GetEncoding(932)))
            {
                zipFile.ExtractToDirectory(toFolder);
            }
            "Done.".wl();
        }

        private void recodeFile(string srcFile)
        {
            var newFilePath = Path.Combine(Path.GetDirectoryName(srcFile), Path.GetFileNameWithoutExtension(srcFile) + "_utf8.zip");

            // Both archives are wrapped in using blocks so the source file handle is released too.
            using (var zipFile = new ZipArchive(new FileStream(srcFile, FileMode.Open), ZipArchiveMode.Read, false, sjisEncoding))
            using (var newZip = new ZipArchive(new FileStream(newFilePath, FileMode.CreateNew), ZipArchiveMode.Create, false, Encoding.UTF8))
            {
                foreach (var zipEntry in zipFile.Entries)
                {
                    var newFile = newZip.CreateEntry(zipEntry.FullName, CompressionLevel.Fastest);
                    newFile.LastWriteTime = zipEntry.LastWriteTime;
                    using (Stream newStream = newFile.Open(), oldStream = zipEntry.Open())
                    {
                        "Moved {0}".wl(newFile.FullName);
                        oldStream.CopyTo(newStream);
                    }
                }

                "Finished recoding {0} entries.".wl(zipFile.Entries.Count);
            }
        }

        readonly Func<char, bool> dirSeparatorComparator = c => c == Path.DirectorySeparatorChar;

        private void recodeCorruptFilenames(string directoryPath, bool recurse)
        {
            var rootDir = new DirectoryInfo(directoryPath);
            var dirs = rootDir.GetDirectories("*", recurse ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly).ToList();
            var files = rootDir.GetFiles("*", recurse ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly);

            files.Where(fi => fi.Name.ContainsNonAscii() && !fi.Name.ContainsJapanese())
                .ToList()
                .ForEach(
                    fi => fi.Rename(fi.Name.DecodeMojibake())
                );

            // Sort reversed based on directory depth, rename deepest first and this won't rename a root before a leaf.
            dirs.Sort((d2, d1) => d1.FullName.Count(dirSeparatorComparator).CompareTo(d2.FullName.Count(dirSeparatorComparator)));

            dirs.Where(di => di.Name.ContainsNonAscii() && !di.Name.ContainsJapanese())
                .ToList()
                .ForEach(
                    di => di.Rename(di.Name.DecodeMojibake())
                );

            if (rootDir.Name.ContainsNonAscii() && !rootDir.Name.ContainsJapanese())
            {
                rootDir.Rename(rootDir.Name.DecodeMojibake());
            }
        }
    }
}

================================================
FILE: SjisUnzip.sln
================================================
Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 2013
VisualStudioVersion = 12.0.31101.0
MinimumVisualStudioVersion = 10.0.40219.1
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "SjisUnzip", "SjisUnzip\SjisUnzip.csproj", "{1AAE9514-FA5C-44AD-86BF-B2F44466D6B8}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "SjisUnzipTests", "SjisUnzipTests\SjisUnzipTests.csproj", "{0076D69F-56C1-4F2E-9ABE-0FB187824D37}"
EndProject
Global
    GlobalSection(SolutionConfigurationPlatforms) = preSolution
        Debug|Any CPU = Debug|Any CPU
        Release|Any CPU = Release|Any CPU
    EndGlobalSection
    GlobalSection(ProjectConfigurationPlatforms) = postSolution
        {1AAE9514-FA5C-44AD-86BF-B2F44466D6B8}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
        {1AAE9514-FA5C-44AD-86BF-B2F44466D6B8}.Debug|Any CPU.Build.0 = Debug|Any CPU
        {1AAE9514-FA5C-44AD-86BF-B2F44466D6B8}.Release|Any CPU.ActiveCfg = Release|Any CPU
        {1AAE9514-FA5C-44AD-86BF-B2F44466D6B8}.Release|Any CPU.Build.0 = Release|Any CPU
        {0076D69F-56C1-4F2E-9ABE-0FB187824D37}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
        {0076D69F-56C1-4F2E-9ABE-0FB187824D37}.Debug|Any CPU.Build.0 = Debug|Any CPU
        {0076D69F-56C1-4F2E-9ABE-0FB187824D37}.Release|Any CPU.ActiveCfg = Release|Any CPU
        {0076D69F-56C1-4F2E-9ABE-0FB187824D37}.Release|Any CPU.Build.0 = Release|Any CPU
    EndGlobalSection
    GlobalSection(SolutionProperties) = preSolution
        HideSolutionNode = FALSE
    EndGlobalSection
EndGlobal

================================================
FILE: SjisUnzipTests/ExtensionTests.cs
================================================
using System;
using System.Globalization;
using System.Text;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using SjisUnzip;

namespace SjisUnzipTests
{
    [TestClass]
    public class ExtensionTests
    {
        private readonly Encoding sjisEncoding = Encoding.GetEncoding(932);

        // The Shift JIS lead byte 0x81 of the full-width parentheses decodes to the invisible
        // control character U+0081 under Windows-1252, so it is written as an escape here.
        // These tests assume Encoding.Default is Windows-1252.
        private readonly string textGarbled = "\u0081iƒeƒLƒXƒgƒtƒ@ƒCƒ‹\u0081j"; // "(テキストファイル)"
        private readonly string textCorrect = "(テキストファイル)"; // "(TextFile)"
        private readonly string textAscii = "\"One bad programmer can easily create two new jobs a year.\" - David Parnas";

        [TestMethod]
        public void TestContainsNonAscii()
        {
            var res = textGarbled.ContainsNonAscii();
            Assert.IsTrue(res);

            res = textCorrect.ContainsNonAscii();
            Assert.IsTrue(res);

            res = textAscii.ContainsNonAscii();
            Assert.IsFalse(res);
        }

        [TestMethod]
        public void TestRawTranscode()
        {
            var degarbled = textGarbled.RawTranscode(Encoding.Default, Encoding.GetEncoding(932));
            Assert.AreEqual(textCorrect, degarbled, "Degarbled text should be equivalent to the uncorrupted original.");

            var engarbled = textCorrect.RawTranscode(Encoding.GetEncoding(932), Encoding.Default);
            Assert.AreEqual(textGarbled, engarbled, "Reversing the garbling process on correct text should match the example garbled version.");
        }

        [TestMethod]
        public void TestDecodeMojibake()
        {
            var degarbled = textGarbled.DecodeMojibake();
            Assert.AreEqual(textCorrect, degarbled, "Degarbled text should be equivalent to the uncorrupted original.");
        }

        [TestMethod]
        public void TestContainsJapanese()
        {
            var res = textAscii.ContainsJapanese();
            Assert.IsFalse(res, "Plain ascii strings should obviously not trigger true on this function.");

            res = textGarbled.ContainsJapanese();
            Assert.IsFalse(res);

            res = textCorrect.ContainsJapanese();
            Assert.IsTrue(res);
        }
    }
}

================================================
FILE: SjisUnzipTests/Properties/AssemblyInfo.cs
================================================
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// General Information about an assembly is controlled through the following
// set of attributes. Change these attribute values to modify the information
// associated with an assembly.
[assembly: AssemblyTitle("SjisUnzipTests")]
[assembly: AssemblyDescription("")]
[assembly: AssemblyConfiguration("")]
[assembly: AssemblyCompany("")]
[assembly: AssemblyProduct("SjisUnzipTests")]
[assembly: AssemblyCopyright("Copyright © 2014")]
[assembly: AssemblyTrademark("")]
[assembly: AssemblyCulture("")]

// Setting ComVisible to false makes the types in this assembly not visible
// to COM components. If you need to access a type in this assembly from
// COM, set the ComVisible attribute to true on that type.
[assembly: ComVisible(false)]

// The following GUID is for the ID of the typelib if this project is exposed to COM
[assembly: Guid("bc5c23de-d2c0-4ce8-a73c-7844b3af0ff5")]

// Version information for an assembly consists of the following four values:
//
//      Major Version
//      Minor Version
//      Build Number
//      Revision
//
// You can specify all the values or you can default the Build and Revision Numbers
// by using the '*' as shown below:
// [assembly: AssemblyVersion("1.0.*")]
[assembly: AssemblyVersion("1.0.0.0")]
[assembly: AssemblyFileVersion("1.0.0.0")]

================================================
FILE: SjisUnzipTests/SjisUnzipAppTests.cs
================================================
using Microsoft.VisualStudio.TestTools.UnitTesting;
using Moq;
using SjisUnzip;

namespace SjisUnzipTests
{
    [TestClass]
    public class SjisUnzipAppTests
    {
        [Ignore]
        [TestMethod]
        public void TestRunWithFile()
        {
            // TODO: Refactor SjisUnzipApp to be more unit testable.
            // TODO: Try to not make a unit test just so you put a todo in it to shame yourself.
        }
    }
}

================================================
FILE: SjisUnzipTests/SjisUnzipTests.csproj
================================================
Debug AnyCPU {0076D69F-56C1-4F2E-9ABE-0FB187824D37} Library Properties SjisUnzipTests SjisUnzipTests v4.5 512 {3AC096D0-A1C2-E12C-1390-A8335801FDAB};{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC} 10.0 $(MSBuildExtensionsPath32)\Microsoft\VisualStudio\v$(VisualStudioVersion) $(ProgramFiles)\Common Files\microsoft shared\VSTT\$(VisualStudioVersion)\UITestExtensionPackages False UnitTest true full false bin\Debug\ DEBUG;TRACE prompt 4 pdbonly true bin\Release\ TRACE prompt 4 ..\packages\Moq.4.2.1409.1722\lib\net40\Moq.dll {1aae9514-fa5c-44ad-86bf-b2f44466d6b8} SjisUnzip False False False False

================================================
FILE: SjisUnzipTests/packages.config
================================================