Repository: kjerk/sjisunzip
Branch: master
Commit: 84a4a1e29edc
Files: 15
Total size: 28.7 KB
Directory structure:
gitextract_f_bkyp2k/
├── .gitignore
├── LICENSE
├── README.md
├── SjisUnzip/
│   ├── App.config
│   ├── AppRunner.cs
│   ├── ExtensionMethods.cs
│   ├── Properties/
│   │   └── AssemblyInfo.cs
│   ├── SjisUnzip.csproj
│   └── SjisUnzipApp.cs
├── SjisUnzip.sln
└── SjisUnzipTests/
    ├── ExtensionTests.cs
    ├── Properties/
    │   └── AssemblyInfo.cs
    ├── SjisUnzipAppTests.cs
    ├── SjisUnzipTests.csproj
    └── packages.config
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
## Ignore Visual Studio temporary files, build results, and
## files generated by popular Visual Studio add-ons.
# User-specific files
*.suo
*.user
*.userosscache
*.sln.docstates
# Build results
[Dd]ebug/
[Dd]ebugPublic/
[Rr]elease/
[Rr]eleases/
x64/
x86/
build/
bld/
[Bb]in/
[Oo]bj/
# Roslyn cache directories
*.ide/
# MSTest test Results
[Tt]est[Rr]esult*/
[Bb]uild[Ll]og.*
#NUNIT
*.VisualState.xml
TestResult.xml
# Build Results of an ATL Project
[Dd]ebugPS/
[Rr]eleasePS/
dlldata.c
*_i.c
*_p.c
*_i.h
*.ilk
*.meta
*.obj
*.pch
*.pdb
*.pgc
*.pgd
*.rsp
*.sbr
*.tlb
*.tli
*.tlh
*.tmp
*.tmp_proj
*.log
*.vspscc
*.vssscc
.builds
*.pidb
*.svclog
*.scc
# Chutzpah Test files
_Chutzpah*
# Visual C++ cache files
ipch/
*.aps
*.ncb
*.opensdf
*.sdf
*.cachefile
# Visual Studio profiler
*.psess
*.vsp
*.vspx
# TFS 2012 Local Workspace
$tf/
# Guidance Automation Toolkit
*.gpState
# ReSharper is a .NET coding add-in
_ReSharper*/
*.[Rr]e[Ss]harper
*.DotSettings.user
# JustCode is a .NET coding add-in
.JustCode
# TeamCity is a build add-in
_TeamCity*
# DotCover is a Code Coverage Tool
*.dotCover
# NCrunch
_NCrunch_*
.*crunch*.local.xml
# MightyMoose
*.mm.*
AutoTest.Net/
# Web workbench (sass)
.sass-cache/
# Installshield output folder
[Ee]xpress/
# DocProject is a documentation generator add-in
DocProject/buildhelp/
DocProject/Help/*.HxT
DocProject/Help/*.HxC
DocProject/Help/*.hhc
DocProject/Help/*.hhk
DocProject/Help/*.hhp
DocProject/Help/Html2
DocProject/Help/html
# Click-Once directory
publish/
# Publish Web Output
*.[Pp]ublish.xml
*.azurePubxml
# TODO: Comment the next line if you want to checkin your web deploy settings
# but database connection strings (with potential passwords) will be unencrypted
*.pubxml
*.publishproj
# NuGet Packages
*.nupkg
# The packages folder can be ignored because of Package Restore
**/packages/*
# except build/, which is used as an MSBuild target.
!**/packages/build/
# If using the old MSBuild-Integrated Package Restore, uncomment this:
#!**/packages/repositories.config
# Windows Azure Build Output
csx/
*.build.csdef
# Windows Store app package directory
AppPackages/
# Others
sql/
*.Cache
ClientBin/
[Ss]tyle[Cc]op.*
~$*
*~
*.dbmdl
*.dbproj.schemaview
*.pfx
*.publishsettings
node_modules/
bower_components/
# RIA/Silverlight projects
Generated_Code/
# Backup & report files from converting an old project file
# to a newer Visual Studio version. Backup files are not needed,
# because we have git ;-)
_UpgradeReport_Files/
Backup*/
UpgradeLog*.XML
UpgradeLog*.htm
# SQL Server files
*.mdf
*.ldf
# Business Intelligence projects
*.rdl.data
*.bim.layout
*.bim_*.settings
# Microsoft Fakes
FakesAssemblies/
================================================
FILE: LICENSE
================================================
The MIT License (MIT)
Copyright (c) 2014 https://github.com/kjerk
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
sjisunzip
=========
This is a pretty braindead command-line utility that simply forces the right encoding to extract a Shift JIS encoded zip file ([Code page 932](http://en.wikipedia.org/wiki/Code_page_932)) on a Western/ANSI encoding system.
[Download Here](https://github.com/kjerk/sjisunzip/releases)
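```
Usage:
  sjisunzip someFile.zip [toFolder]
  sjisunzip [-r] someFile.zip
      -r: Recode file to {filename}_utf8.zip
  sjisunzip ./some_folder_with_corrupt_filenames
      Renames garbled file and folder names in place.
Examples:
  sjisunzip aFile.zip
  sjisunzip aFile.zip MyNewFolder
  sjisunzip -r aFile.zip
```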
You can also just drop a zip file onto the program, since that passes it as the first argument; the contents are then extracted into the current working directory (`./`). A relative `toFolder` is created next to the zip file instead.
If you've ever received a zip file from a friend, or from the wrong damn GNU mirror, or whatever else that passed through Japan, then you've probably seen garbled filenames.

---
Well, this program opens the zip with the correct encoding (code page 932) and then extracts the files, so their names come out as proper Unicode.
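A minimal sketch of the core trick, assuming .NET Core 3.0 or later, where code page 932 has to be registered through `CodePagesEncodingProvider` (the project itself targets .NET Framework 4.5, where that line is unnecessary). It builds a tiny in-memory zip whose entry name is stored as Shift JIS bytes, then reads it back through the `ZipArchive(Stream, ZipArchiveMode, bool, Encoding)` constructor with code page 932 as the entry-name encoding. The helpers `BuildSjisZip` and `ReadEntryNames` are hypothetical, for illustration only.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

// Assumption: .NET Core 3.0+ / .NET 5+, where code page 932 must be registered first.
// On .NET Framework this registration is not needed.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding sjis = Encoding.GetEncoding(932);

// Hypothetical helper: builds an in-memory zip whose entry name is written as Shift JIS bytes,
// imitating an archive created on a Japanese system.
byte[] BuildSjisZip(string entryName)
{
    using var buffer = new MemoryStream();
    // Arguments: stream, mode, leaveOpen, entry-name encoding.
    using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, true, sjis))
    {
        ZipArchiveEntry entry = zip.CreateEntry(entryName);
        using var writer = new StreamWriter(entry.Open());
        writer.Write("hello");
    }
    return buffer.ToArray();
}

// Hypothetical helper: opens the archive with an explicit entry-name encoding and lists the names.
string[] ReadEntryNames(byte[] zipBytes, Encoding nameEncoding)
{
    using var zip = new ZipArchive(new MemoryStream(zipBytes), ZipArchiveMode.Read, false, nameEncoding);
    return zip.Entries.Select(e => e.FullName).ToArray();
}

byte[] archive = BuildSjisZip("テキスト.txt");
string[] names = ReadEntryNames(archive, sjis);
Console.WriteLine(names[0]);
```

Passing the encoding at open time means the names are never garbled in the first place, which is simpler than repairing them after extraction.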

---
You can even re-encode the zip file into a less busted-ass one (`-r` writes `{filename}_utf8.zip` with UTF-8 entry names) so you don't have this creeping horror issue in the future.
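Re-encoding boils down to reading every entry with Shift JIS names and copying it into a new archive whose names are written as UTF-8. The sketch below does that in memory rather than on disk, assuming .NET Core 3.0 or later (hence the `CodePagesEncodingProvider` registration). `BuildSjisZip` and `RecodeToUtf8` are hypothetical helper names, not part of the tool.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

// Assumption: .NET Core 3.0+ / .NET 5+, where code page 932 must be registered first.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding sjis = Encoding.GetEncoding(932);

// Hypothetical helper: a one-entry zip whose name is stored as Shift JIS bytes.
byte[] BuildSjisZip(string entryName)
{
    using var buffer = new MemoryStream();
    using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, true, sjis))
    {
        using var writer = new StreamWriter(zip.CreateEntry(entryName).Open());
        writer.Write("hello");
    }
    return buffer.ToArray();
}

// Hypothetical helper: reads names as Shift JIS and copies every entry into a new UTF-8 archive.
byte[] RecodeToUtf8(byte[] sjisZip)
{
    using var output = new MemoryStream();
    using (var source = new ZipArchive(new MemoryStream(sjisZip), ZipArchiveMode.Read, false, sjis))
    using (var target = new ZipArchive(output, ZipArchiveMode.Create, true, Encoding.UTF8))
    {
        foreach (ZipArchiveEntry oldEntry in source.Entries)
        {
            ZipArchiveEntry newEntry = target.CreateEntry(oldEntry.FullName, CompressionLevel.Fastest);
            newEntry.LastWriteTime = oldEntry.LastWriteTime;
            using Stream oldStream = oldEntry.Open();
            using Stream newStream = newEntry.Open();
            oldStream.CopyTo(newStream);
        }
    }
    return output.ToArray();
}

byte[] recoded = RecodeToUtf8(BuildSjisZip("テキスト.txt"));

// Read the result back without passing any entry-name encoding.
using var check = new ZipArchive(new MemoryStream(recoded), ZipArchiveMode.Read);
string[] names = check.Entries.Select(e => e.FullName).ToArray();
Console.WriteLine(names[0]);
```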

---
The filenames and paths should be untangled when done.

---
Bonus fact: When this type of transitive corruption occurs, the output characters are called [Mojibake](http://en.wikipedia.org/wiki/Mojibake). That's almost cute enough to not be awful anymore.
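To see how this corruption happens and why it is reversible, here is a small sketch that does by hand what `DecodeMojibake` does. Assumptions: Windows-1252 stands in for the "western" code page (the tool itself uses `Encoding.Default`, which on .NET Framework is the system ANSI code page), and the code runs on .NET Core 3.0 or later, where both code pages come from `CodePagesEncodingProvider`.

```csharp
using System;
using System.Text;

// Assumption: .NET Core 3.0+ / .NET 5+; code pages 932 and 1252 come from this provider.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding sjis = Encoding.GetEncoding(932);      // Shift JIS
Encoding western = Encoding.GetEncoding(1252);  // assumed "western" ANSI code page

string original = "テキスト";

// Corruption: the Shift JIS bytes are read back as Windows-1252 text.
string mojibake = western.GetString(sjis.GetBytes(original));

// Repair: recover the same bytes from the mojibake, then read them as Shift JIS.
string repaired = sjis.GetString(western.GetBytes(mojibake));

Console.WriteLine(mojibake); // ƒeƒLƒXƒg
Console.WriteLine(repaired); // テキスト
```

Katakana survives the trip cleanly here because each character's Shift JIS bytes (for example 0x83 0x65 for テ) have Windows-1252 equivalents, so no information is lost in the garbled form.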
================================================
FILE: SjisUnzip/App.config
================================================
================================================
FILE: SjisUnzip/AppRunner.cs
================================================
namespace SjisUnzip
{
    /// <summary>
    /// Bootstrapper to get out of the static context.
    /// </summary>
    static class AppRunner
    {
        static void Main(string[] args)
        {
            var mainApp = new SjisUnzipApp();
            mainApp.Main(args);
        }
    }
}
================================================
FILE: SjisUnzip/ExtensionMethods.cs
================================================
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace SjisUnzip
{
    public static class ExtensionMethods
    {
        //
        // System.String
        //
        /// <summary>
        /// Checks whether a string contains any characters outside the printable ASCII range (code 127 or higher).
        /// </summary>
        /// <param name="str">A string dum dum.</param>
        /// <returns>Boolean flag</returns>
        public static bool ContainsNonAscii(this string str)
        {
            // http://www.asciitable.com/
            return str.Any(t => t > 0x7E);
        }
        /// <summary>
        /// A function to do a raw reinterpretation of character encoding without using the Encoding.Convert() function.
        /// This allows accidentally transposed strings to have their encoding fixed by reversing the process.
        /// </summary>
        /// <param name="str">The string being operated on.</param>
        /// <param name="from">The source encoding the bytes will be extracted as.</param>
        /// <param name="to">The destination encoding the bytes will be 'interpreted' as.</param>
        /// <returns>The reinterpreted string.</returns>
        public static string RawTranscode(this string str, Encoding from, Encoding to)
        {
            var rawBytes = from.GetBytes(str);
            return to.GetString(rawBytes);
        }
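        /// <summary>
        /// A wrapper for the RawTranscode() function which passes the correct parameters for fixing Shift JIS mishaps:
        /// the garbled text is turned back into bytes with Encoding.Default and those bytes are read as code page 932.
        /// On .NET Framework, Encoding.Default is the system ANSI code page (e.g. Windows-1252 on Western systems).
        /// </summary>
        /// <param name="str">The garbled string.</param>
        /// <returns>A fixed Unicode string.</returns>
        public static string DecodeMojibake(this string str)
        {
            return str.RawTranscode(Encoding.Default, Encoding.GetEncoding(932));
        }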
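        /// <summary>
        /// Checks the contents of a string to see if any are within CJK and a few other Unicode ranges.
        /// </summary>
        /// <remarks>
        /// See Reference Document for ranges. All range bounds are inclusive.
        /// </remarks>
        /// <param name="str">Any string.</param>
        /// <returns>Boolean flag if any characters found.</returns>
        public static bool ContainsJapanese(this string str)
        {
            return str.Any(c =>
                // CJK punctuation, Hiragana, Katakana (U+3000 to U+30FF)
                (c >= 0x3000 && c <= 0x30FF) ||
                // Full-width Romaji and half-width kana (U+FF00 to U+FFEF)
                (c >= 0xFF00 && c <= 0xFFEF) ||
                // CJK Unified Ideographs Extension A (U+3400 to U+4DBF)
                (c >= 0x3400 && c <= 0x4DBF) ||
                // CJK Unified Ideographs (U+4E00 to U+9FFF)
                (c >= 0x4E00 && c <= 0x9FFF)
            );
        }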
        /// <summary>
        /// Convenience method calling Console.WriteLine. Looks cool too.
        /// </summary>
        /// <example>
        /// "Hello World".wl();
        /// </example>
        public static void wl(this string str)
        {
            Console.WriteLine(str);
        }
        /// <summary>
        /// Convenience method for calling Console.WriteLine with params.
        /// </summary>
        /// <param name="str">The input format string.</param>
        /// <param name="o">Args for populating the format string.</param>
        /// <example>
        /// "Elapsed time was {0} seconds.".wl(timer.getSeconds());
        /// </example>
        public static void wl(this string str, params object[] o)
        {
            Console.WriteLine(str, o);
        }
        //
        // System.IO.FileInfo
        //
        /// <summary>
        /// Renames a file while keeping it in the same directory. FileInfo.MoveTo resolves a relative destination
        /// against the current working directory, whereas this resolves the new name against the file's own directory.
        /// </summary>
        /// <param name="fi">The operating fileInfo.</param>
        /// <param name="newName">A new filename.</param>
        public static void Rename(this FileInfo fi, string newName)
        {
            fi.MoveTo(Path.Combine(fi.Directory.FullName, newName));
        }
        /// <summary>
        /// Renames a file in place and keeps the original extension regardless of what the new name is.
        /// </summary>
        /// <param name="fi">The operating fileInfo.</param>
        /// <param name="newName">A new filename which will have the original extension appended onto.</param>
        public static void RenameKeepExt(this FileInfo fi, string newName)
        {
            fi.MoveTo(Path.Combine(fi.Directory.FullName, newName) + Path.GetExtension(fi.FullName));
        }
        //
        // System.IO.DirectoryInfo
        //
        /// <summary>
        /// Does an in-place rename of a directory. Path modifications will not work properly (e.g. ../myname).
        /// </summary>
        /// <param name="di">The operating DirectoryInfo</param>
        /// <param name="newName">A new name for the directory.</param>
        public static void Rename(this DirectoryInfo di, string newName)
        {
            di.MoveTo(Path.Combine(di.Parent.FullName, newName));
        }
    }
}
================================================
FILE: SjisUnzip/Properties/AssemblyInfo.cs
================================================
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
// General Information about an assembly is controlled through the following
// set of attributes. Change these attribute values to modify the information
// associated with an assembly.
[assembly: AssemblyTitle("SjisUnzip")]
[assembly: AssemblyDescription("")]
[assembly: AssemblyConfiguration("")]
[assembly: AssemblyCompany("")]
[assembly: AssemblyProduct("SjisUnzip")]
[assembly: AssemblyCopyright("Copyright © 2014")]
[assembly: AssemblyTrademark("")]
[assembly: AssemblyCulture("")]
// Setting ComVisible to false makes the types in this assembly not visible
// to COM components. If you need to access a type in this assembly from
// COM, set the ComVisible attribute to true on that type.
[assembly: ComVisible(false)]
// The following GUID is for the ID of the typelib if this project is exposed to COM
[assembly: Guid("79ffa4e3-8a00-426a-8179-2b033731cb9e")]
// Version information for an assembly consists of the following four values:
//
// Major Version
// Minor Version
// Build Number
// Revision
//
// You can specify all the values or you can default the Build and Revision Numbers
// by using the '*' as shown below:
// [assembly: AssemblyVersion("1.0.*")]
[assembly: AssemblyVersion("1.0.0.0")]
[assembly: AssemblyFileVersion("1.0.0.0")]
================================================
FILE: SjisUnzip/SjisUnzip.csproj
================================================
Debug
AnyCPU
{1AAE9514-FA5C-44AD-86BF-B2F44466D6B8}
Exe
Properties
SjisUnzip
sjisunzip
v4.5
512
true
AnyCPU
true
full
false
bin\Debug\
DEBUG;TRACE
prompt
4
false
AnyCPU
pdbonly
true
bin\Release\
TRACE
prompt
4
false
SjisUnzip.AppRunner
================================================
FILE: SjisUnzip/SjisUnzipApp.cs
================================================
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO.Compression;
namespace SjisUnzip
{
    public class SjisUnzipApp
    {
        private readonly Encoding sjisEncoding = Encoding.GetEncoding(932);
        public void Main(string[] args)
        {
            var recode = args.Any((s) => s.Equals("-r"));
            if (recode && args.Length == 2)
            {
                args = args.Where((arg) => arg != "-r").ToArray();
                if (args.Length > 0 && File.Exists(args[0]))
                {
                    recodeFile(args[0]);
                }
                else
                {
                    printUsage();
                }
            }
            else if (args.Length == 1 && Directory.Exists(args[0]))
            {
                recodeCorruptFilenames(args[0], true);
            }
            // Ordinal, case-insensitive match: a culture-aware comparison can miss ".ZIP" under cultures such as tr-TR.
            else if (args.Length == 1 && File.Exists(args[0]) && args[0].EndsWith(".zip", StringComparison.OrdinalIgnoreCase))
            {
                extractSjisZip(args[0]);
            }
            else if (args.Length == 2 && File.Exists(args[0]) && args[0].EndsWith(".zip", StringComparison.OrdinalIgnoreCase))
            {
                // A relative toFolder is resolved against the zip file's directory.
                var folderPath = Path.GetDirectoryName(args[0]);
                var newFolderPath = Path.Combine(folderPath, args[1]);
                Directory.CreateDirectory(newFolderPath);
                extractSjisZip(args[0], newFolderPath);
            }
            else
            {
                printUsage();
            }
        }
        static void printUsage()
        {
            "Usage: sjisunzip someFile.zip [toFolder]".wl();
            "Usage: sjisunzip [-r] someFile.zip".wl();
            "  -r: Recode file to {filename}_utf8.zip".wl();
            "Usage: sjisunzip ./some_folder_with_corrupt_filenames".wl();
            "Examples:".wl();
            "  sjisunzip aFile.zip".wl();
            "  sjisunzip aFile.zip MyNewFolder".wl();
        }
        private void extractSjisZip(string fileName, string toFolder = "./")
        {
            "Writing to folder {0}...".wl(toFolder);
            using (var zipFile = new ZipArchive(new FileStream(fileName, FileMode.Open, FileAccess.Read),
                ZipArchiveMode.Read, false, sjisEncoding))
            {
                zipFile.ExtractToDirectory(toFolder);
            }
            "Done.".wl();
        }
        private void recodeFile(string srcFile)
        {
            var newFilePath = Path.Combine(Path.GetDirectoryName(srcFile), Path.GetFileNameWithoutExtension(srcFile) + "_utf8.zip");
            int entryCount;
            // Open the source read-only and dispose both archives (and their streams) when done.
            using (var zipFile = new ZipArchive(new FileStream(srcFile, FileMode.Open, FileAccess.Read), ZipArchiveMode.Read, false, sjisEncoding))
            using (var newZip = new ZipArchive(new FileStream(newFilePath, FileMode.CreateNew), ZipArchiveMode.Create, false, Encoding.UTF8))
            {
                foreach (var zipEntry in zipFile.Entries)
                {
                    var newFile = newZip.CreateEntry(zipEntry.FullName, CompressionLevel.Fastest);
                    newFile.LastWriteTime = zipEntry.LastWriteTime;
                    using (Stream newStream = newFile.Open(), oldStream = zipEntry.Open())
                    {
                        "Moved {0}".wl(newFile.FullName);
                        oldStream.CopyTo(newStream);
                    }
                }
                entryCount = zipFile.Entries.Count;
            }
            "Finished recoding {0} entries.".wl(entryCount);
        }
        readonly Func<char, bool> dirSeparatorComparator = c => c == Path.DirectorySeparatorChar;
        private void recodeCorruptFilenames(string directoryPath, bool recurse)
        {
            var rootDir = new DirectoryInfo(directoryPath);
            var dirs = rootDir.GetDirectories("*", recurse ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly).ToList();
            var files = rootDir.GetFiles("*", recurse ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly);
            files.Where(fi => fi.Name.ContainsNonAscii() && !fi.Name.ContainsJapanese())
                .ToList()
                .ForEach(
                    fi => fi.Rename(fi.Name.DecodeMojibake())
                );
            // Sort by directory depth, deepest first, so a parent is never renamed before its children.
            dirs.Sort((d2, d1) => d1.FullName.Count(dirSeparatorComparator).CompareTo(d2.FullName.Count(dirSeparatorComparator)));
            dirs.Where(di => di.Name.ContainsNonAscii() && !di.Name.ContainsJapanese())
                .ToList()
                .ForEach(
                    di => di.Rename(di.Name.DecodeMojibake())
                );
            if (rootDir.Name.ContainsNonAscii() && !rootDir.Name.ContainsJapanese())
            {
                rootDir.Rename(rootDir.Name.DecodeMojibake());
            }
        }
    }
}
================================================
FILE: SjisUnzip.sln
================================================
Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 2013
VisualStudioVersion = 12.0.31101.0
MinimumVisualStudioVersion = 10.0.40219.1
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "SjisUnzip", "SjisUnzip\SjisUnzip.csproj", "{1AAE9514-FA5C-44AD-86BF-B2F44466D6B8}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "SjisUnzipTests", "SjisUnzipTests\SjisUnzipTests.csproj", "{0076D69F-56C1-4F2E-9ABE-0FB187824D37}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
Release|Any CPU = Release|Any CPU
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{1AAE9514-FA5C-44AD-86BF-B2F44466D6B8}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{1AAE9514-FA5C-44AD-86BF-B2F44466D6B8}.Debug|Any CPU.Build.0 = Debug|Any CPU
{1AAE9514-FA5C-44AD-86BF-B2F44466D6B8}.Release|Any CPU.ActiveCfg = Release|Any CPU
{1AAE9514-FA5C-44AD-86BF-B2F44466D6B8}.Release|Any CPU.Build.0 = Release|Any CPU
{0076D69F-56C1-4F2E-9ABE-0FB187824D37}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{0076D69F-56C1-4F2E-9ABE-0FB187824D37}.Debug|Any CPU.Build.0 = Debug|Any CPU
{0076D69F-56C1-4F2E-9ABE-0FB187824D37}.Release|Any CPU.ActiveCfg = Release|Any CPU
{0076D69F-56C1-4F2E-9ABE-0FB187824D37}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
EndGlobal
================================================
FILE: SjisUnzipTests/ExtensionTests.cs
================================================
using System;
using System.Globalization;
using System.Text;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using SjisUnzip;
namespace SjisUnzipTests
{
    [TestClass]
    public class ExtensionTests
    {
        private readonly Encoding sjisEncoding = Encoding.GetEncoding(932);
        // Each full-width parenthesis starts with the Shift JIS lead byte 0x81, which Windows-1252 turns into the invisible U+0081.
        private readonly string textGarbled = "\u0081iƒeƒLƒXƒgƒtƒ@ƒCƒ‹\u0081j"; // "（テキストファイル）"
        private readonly string textCorrect = "（テキストファイル）"; // "(TextFile)"
        private readonly string textAscii = "\"One bad programmer can easily create two new jobs a year.\" - David Parnas";
        [TestMethod]
        public void TestContainsNonAscii()
        {
            var res = textGarbled.ContainsNonAscii();
            Assert.IsTrue(res);
            res = textCorrect.ContainsNonAscii();
            Assert.IsTrue(res);
            res = textAscii.ContainsNonAscii();
            Assert.IsFalse(res);
        }
        [TestMethod]
        public void TestRawTranscode()
        {
            var degarbled = textGarbled.RawTranscode(Encoding.Default, Encoding.GetEncoding(932));
            Assert.AreEqual(textCorrect, degarbled, "Degarbled text should be equivalent to the uncorrupted original.");
            var engarbled = textCorrect.RawTranscode(Encoding.GetEncoding(932), Encoding.Default);
            Assert.AreEqual(textGarbled, engarbled, "Reversing the garbling process on correct text should match the example garbled version.");
        }
        [TestMethod]
        public void TestDecodeMojibake()
        {
            var degarbled = textGarbled.DecodeMojibake();
            Assert.AreEqual(textCorrect, degarbled, "Degarbled text should be equivalent to the uncorrupted original.");
        }
        [TestMethod]
        public void TestContainsJapanese()
        {
            var res = textAscii.ContainsJapanese();
            Assert.IsFalse(res, "Plain ASCII strings should obviously not return true from this function.");
            res = textGarbled.ContainsJapanese();
            Assert.IsFalse(res);
            res = textCorrect.ContainsJapanese();
            Assert.IsTrue(res);
        }
    }
}
================================================
FILE: SjisUnzipTests/Properties/AssemblyInfo.cs
================================================
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
// General Information about an assembly is controlled through the following
// set of attributes. Change these attribute values to modify the information
// associated with an assembly.
[assembly: AssemblyTitle("SjisUnzipTests")]
[assembly: AssemblyDescription("")]
[assembly: AssemblyConfiguration("")]
[assembly: AssemblyCompany("")]
[assembly: AssemblyProduct("SjisUnzipTests")]
[assembly: AssemblyCopyright("Copyright © 2014")]
[assembly: AssemblyTrademark("")]
[assembly: AssemblyCulture("")]
// Setting ComVisible to false makes the types in this assembly not visible
// to COM components. If you need to access a type in this assembly from
// COM, set the ComVisible attribute to true on that type.
[assembly: ComVisible(false)]
// The following GUID is for the ID of the typelib if this project is exposed to COM
[assembly: Guid("bc5c23de-d2c0-4ce8-a73c-7844b3af0ff5")]
// Version information for an assembly consists of the following four values:
//
// Major Version
// Minor Version
// Build Number
// Revision
//
// You can specify all the values or you can default the Build and Revision Numbers
// by using the '*' as shown below:
// [assembly: AssemblyVersion("1.0.*")]
[assembly: AssemblyVersion("1.0.0.0")]
[assembly: AssemblyFileVersion("1.0.0.0")]
================================================
FILE: SjisUnzipTests/SjisUnzipAppTests.cs
================================================
using Microsoft.VisualStudio.TestTools.UnitTesting;
using Moq;
using SjisUnzip;
namespace SjisUnzipTests
{
    [TestClass]
    public class SjisUnzipAppTests
    {
        [Ignore]
        [TestMethod]
        public void TestRunWithFile()
        {
            // TODO: Refactor SjisUnzipApp to be more unit testable.
            // TODO: Try to not make a unit test just so you put a todo in it to shame yourself.
        }
    }
}
================================================
FILE: SjisUnzipTests/SjisUnzipTests.csproj
================================================
Debug
AnyCPU
{0076D69F-56C1-4F2E-9ABE-0FB187824D37}
Library
Properties
SjisUnzipTests
SjisUnzipTests
v4.5
512
{3AC096D0-A1C2-E12C-1390-A8335801FDAB};{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}
10.0
$(MSBuildExtensionsPath32)\Microsoft\VisualStudio\v$(VisualStudioVersion)
$(ProgramFiles)\Common Files\microsoft shared\VSTT\$(VisualStudioVersion)\UITestExtensionPackages
False
UnitTest
true
full
false
bin\Debug\
DEBUG;TRACE
prompt
4
pdbonly
true
bin\Release\
TRACE
prompt
4
..\packages\Moq.4.2.1409.1722\lib\net40\Moq.dll
{1aae9514-fa5c-44ad-86bf-b2f44466d6b8}
SjisUnzip
False
False
False
False
================================================
FILE: SjisUnzipTests/packages.config
================================================