Scallion lets you create vanity GPG keys and .onion addresses (for Tor’s hidden services) using OpenCL. It runs on Mono (tested in Arch Linux) and .NET 3.5+ (tested on Windows 7 and Server 2008).
It is currently in beta stage and under active development. Nevertheless, we feel that it is ready for use. Improvements are expected primarily in performance, user interface, and ease of installation, not in the overall algorithm used to generate keys.
FAQ
Here are some frequently asked questions and their answers:
- Why generate GPG keys? Scallion was used to find collisions for every 32-bit key ID in the Web of Trust’s strong set, demonstrating how insecure 32-bit key IDs are. There was a talk at DEFCON (video), and additional info can be found at https://evil32.com/.
- What are valid characters? Tor .onion addresses use Base32, consisting of all letters and the digits 2 through 7, inclusive. They are case-insensitive. GPG fingerprints use hexadecimal, consisting of the digits 0-9 and the letters A-F.
- Can you use Bitcoin ASICs (e.g. Jalapeno, KnC) to accelerate this process? Sadly, no. While the process Scallion uses is conceptually similar (increment a nonce and check the hash), the details are different (SHA-1 vs double SHA-256 for Bitcoin). Furthermore, Bitcoin ASICs are as fast as they are because they are extremely tailored to Bitcoin mining applications. For example, here’s the datasheet for the CoinCraft A-1, an ASIC that never came out, but is probably indicative of the general approach. The microcontroller sends work in the form of the final 128 bits of a Bitcoin block, the hash midstate of the previous bits, a target difficulty, and the maximum nonce to try. The ASIC chooses the location to insert the nonce, and it chooses what blocks meet the hash. Scallion has to insert the nonce in a different location, and it checks for a pattern match rather than just “lower than XXXX”.
- How can you use multiple devices? Run multiple Scallion instances. Scallion searches are probabilistic, so you won’t be repeating work with the second device. True multi-device support wouldn’t be too difficult, but it also wouldn’t add much. I’ve run several Scallion instances in tmux or screen with great success. You’ll just need to manually abort all the jobs when one finds a pattern (or write a shell script to monitor the output file and kill them all when it sees results).
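As a companion to the valid-characters answer above, here is a minimal Python sketch that checks a literal search string against the two alphabets before you spend GPU time on it. The names `ONION_ALPHABET`, `GPG_ALPHABET`, and `invalid_chars` are made up for this illustration and are not part of Scallion; the sketch only handles plain text, not the `.`, `[...]`, `$`, or `|` pattern syntax.

```python
import string

# Base32 alphabet used by .onion addresses: a-z plus 2-7 (case-insensitive).
ONION_ALPHABET = set(string.ascii_lowercase + "234567")
# Hexadecimal alphabet used by GPG fingerprints.
GPG_ALPHABET = set("0123456789abcdef")


def invalid_chars(text, gpg=False):
    """Return the sorted characters of `text` that can never appear in a result.

    Hypothetical helper for illustration; it treats `text` as literal
    characters and does not understand pattern metacharacters.
    """
    alphabet = GPG_ALPHABET if gpg else ONION_ALPHABET
    return sorted(set(text.lower()) - alphabet)


print(invalid_chars("scallion"))            # []
print(invalid_chars("b00st"))               # ['0']
print(invalid_chars("badbeef", gpg=True))   # []
print(invalid_chars("scallion", gpg=True))  # ['i', 'l', 'n', 'o', 's']
```

An empty list means every character can occur in the chosen kind of identifier.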
Dependencies
- OpenCL and relevant drivers installed and configured. Refer to your distribution’s documentation.
- OpenSSL. For Windows, the prebuilt x86 DLLs are included.
- On Windows only, VC++ Redistributable 2008.
Build Linux
Prerequisites
- Get the latest Mono for your Linux distribution: http://www.mono-project.com/download/
- Install Common dependencies:
sudo apt-get update
sudo apt-get install libssl-dev mono-devel
- AMD/OpenSource build
sudo apt-get install ocl-icd-opencl-dev
- Nvidia build
sudo apt-get install nvidia-opencl-dev nvidia-opencl-icd
- Finally
msbuild scallion.sln
Docker Linux (nvidia GPUs only)
- Have the nvidia-docker container runtime
- Build the container:
docker build -t scallion -f Dockerfile.nvidia .
- Run:
docker run --runtime=nvidia -ti --rm scallion -l
Build Windows
- Open ‘scallion.sln’ in VS Express for Desktop 2012
- Build the solution. I did everything in debug mode.
Multipattern Hashing
Scallion supports finding one or more of multiple patterns through a primitive regex syntax. Only character classes (e.g. `[abcd]`) are supported. The `.` character represents any character. Onion addresses are always 16 characters long and GPG fingerprints are always 40 characters long. You can find a suffix by putting `$` at the end of the match (e.g. `DEAD$`). Finally, the pipe syntax (e.g. `pattern1|pattern2`) can be used to find multiple patterns. Searching for multiple patterns (within reason) will NOT produce a significant decrease in speed. Many regexps will produce a single pattern on the GPU and result in no speed reduction.
Some use cases with examples:
- Generate a prefix followed by a number for better readability:
mono scallion.exe prefix[234567]
- Search for several patterns at once (n.b. -c causes scallion to continue generating even once it gets a hit)
mono scallion.exe -c prefix scallion hashes
mono scallion.exe -c "prefix|scallion|hashes"
- Search for the suffix “badbeef”
mono scallion.exe .........badbeef
mono scallion.exe --gpg badbeef$ # Generate GPG key
- Complicated, self-explanatory example:
mono scallion.exe "suffixa$|suffixb$|prefixa|prefixb|a.suffix$|a.test.$"
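To sanity-check a pattern before starting a long search, the syntax described in this section can be approximated with Python's `re` module. This is only a sketch: `to_regex` and `matches` are hypothetical helpers, not Scallion code, and it assumes (based on the examples above) that an alternative without `$` is anchored at the start of the address while one ending in `$` is anchored at the end.

```python
import re


def to_regex(pattern):
    """Translate a Scallion-style pattern into a compiled Python regex.

    Assumption for this sketch: alternatives without a trailing `$` are
    prefixes, alternatives ending in `$` are suffixes.
    """
    parts = []
    for alternative in pattern.split("|"):
        if alternative.endswith("$"):
            # Suffix search: allow any characters before the suffix.
            parts.append("(?:.*" + alternative + ")")
        else:
            # Prefix search: re.match() already anchors at the start.
            parts.append("(?:" + alternative + ")")
    return re.compile("|".join(parts), re.IGNORECASE)


def matches(pattern, address):
    """Return True if `address` would satisfy `pattern` under the assumptions above."""
    return to_regex(pattern).match(address) is not None


print(matches("prefix[234567]", "prefix2abcdefghi"))    # True
print(matches("prefix[234567]", "prefixaabcdefghi"))    # False
print(matches("badbeef$", "abcdefghibadbeef"))          # True
print(matches("suffixa$|prefixb", "prefixbzzzzzzzzz"))  # True
print(matches("DEAD$", "0123456789abcdef0123456789abcdef0123dead"))  # True
```

Character classes, `.`, and `$` already mean the same thing in Python regexes, so the sketch passes them through unchanged and only adds the anchoring.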
How Does It Work?
At a high level Scallion works as follows:
1. Generate an RSA key using OpenSSL on the CPU
2. Send the key to the GPU
3. Increase the key’s public exponent
4. Hash the key
5. If the hashed key is not a partial collision, go to step 3
6. If the key does not pass the sanity checks recommended by PKCS #1 v2.1 (checked on the CPU), go to step 3
7. Brand new key with partial collision!
The basic algorithm is described above. Speed / performance is the result of massive parallelization, both on the GPU and the CPU.
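The exponent-and-hash loop from the steps above can be sketched on the CPU in a few lines of Python. This is an illustration under stated assumptions, not Scallion's implementation: the real hashing runs in OpenCL kernels, and the sketch skips the PKCS #1 sanity checks and never derives a private key. It assumes the v2 .onion derivation (Base32 of the first 10 bytes of the SHA-1 of the DER-encoded RSA public key), steps through odd exponents only, and uses a placeholder modulus `N` that is not a real RSA modulus. All function names are hypothetical.

```python
import base64
import hashlib


def der_length(n):
    """DER length octets for a content length of n bytes."""
    if n < 0x80:
        return bytes([n])
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(raw)]) + raw


def der_integer(x):
    """DER INTEGER for a non-negative int (the extra byte keeps the sign bit clear)."""
    raw = x.to_bytes(x.bit_length() // 8 + 1, "big")
    return b"\x02" + der_length(len(raw)) + raw


def onion_address(n, e):
    """v2 .onion name: Base32 of the first 10 bytes of SHA-1 over the DER public key."""
    body = der_integer(n) + der_integer(e)
    der = b"\x30" + der_length(len(body)) + body  # SEQUENCE { n, e }
    digest = hashlib.sha1(der).digest()
    return base64.b32encode(digest[:10]).decode("ascii").lower()


def search(n, prefix, start_e=65537, max_tries=1_000_000):
    """Try successive odd exponents until the address starts with `prefix`."""
    for e in range(start_e, start_e + 2 * max_tries, 2):
        address = onion_address(n, e)
        if address.startswith(prefix):
            return e, address
    return None  # gave up; a real search would move on to a fresh key


# PLACEHOLDER modulus: an arbitrary odd 1024-bit integer, NOT a product of
# two primes, so the result is not a usable key.
N = (1 << 1023) | 0x1234567

result = search(N, "ab")
print(result)
```

A two-character prefix needs about a thousand attempts on average, so this finishes almost instantly even in pure Python; longer prefixes are where the GPU becomes necessary.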
Speed / Performance
It is important to realize that Scallion performs a probabilistic search. Actual times may vary significantly from the predicted times.
The initial RSA key generation is done on the CPU. An Ivy Bridge i7 can generate 51 keys per second using a single core. Each key can provide 1 gigahash worth of exponents to mine, and a decent CPU can keep up with several GPUs as it is currently implemented.
SHA1 hashing is done on the GPU. The hashrates for several GPUs we have tested are below (grouped by manufacturer and sorted by power):
GPU | Speed |
---|---|
Intel i7-2620M | 9.9 MH/s |
Intel i5-5200U | 118 MH/s |
NVIDIA GT 520 | 38.7 MH/s |
NVIDIA Quadro K2000M | 90 MH/s |
NVIDIA GTS 250 | 128 MH/s |
NVIDIA GTS 450 | 144 MH/s |
NVIDIA GTX 670 | 480 MH/s |
NVIDIA GTX 970 | 2350 MH/s |
NVIDIA GTX 980 | 3260 MH/s |
NVIDIA GTX 1050 (M) | 1400 MH/s |
NVIDIA GTX 1070 | 4140 MH/s |
NVIDIA GTX 1070 TI | 5100 MH/s |
NVIDIA GTX TITAN X | 4412 MH/s |
NVIDIA GTX 1080 | 5760 MH/s |
NVIDIA Tesla V100 | 11646 MH/s |
AMD A8-7600 APU | 120 MH/s |
AMD Radeon HD5770 | 520 MH/s |
AMD Radeon HD6850 | 600 MH/s |
AMD Radeon RX 460 | 840 MH/s |
AMD Radeon RX 470 | 957 MH/s |
AMD Radeon R9 380X | 2058 MH/s |
AMD FirePro W9100 | 2566 MH/s |
AMD Radeon RX 480 | 2700 MH/s |
AMD Radeon RX 580 | 3180 MH/s |
AMD Radeon R9 Nano | 3325 MH/s |
AMD Vega Frontier Edition | 7119 MH/s |
MH/s = million hashes per second
It’s worth noting that Intel has released OpenCL drivers for its processors, and short collisions can be found on the CPU.
To calculate the number of seconds required for a given partial collision (on average), use the formula:
Type | Estimated time |
---|---|
GPG Key | 2^(4*length-1) / hashspeed |
.onion Address | 2^(5*length-1) / hashspeed |
For example, on my NVIDIA Quadro K2000M, I see around 90 MH/s. At that speed I can generate an eight-character .onion prefix in about 1h 41m: 2^(5*8-1) / 90 million ≈ 6108 seconds ≈ 101 minutes.
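The two formulas in the table can be wrapped in a small Python helper to estimate a search before starting it. The function names are invented for this sketch, and `hashes_per_second` is whatever rate you measure on your own device. The result is an average for a probabilistic search, so individual runs can be much shorter or longer.

```python
def estimated_seconds(length, hashes_per_second, gpg=False):
    """Average seconds to find a partial collision of `length` characters.

    Each hex character of a GPG fingerprint carries 4 bits and each Base32
    character of a .onion address carries 5 bits, as in the table above.
    """
    bits_per_char = 4 if gpg else 5
    return 2 ** (bits_per_char * length - 1) / hashes_per_second


def format_duration(seconds):
    """Render a duration as whole hours and minutes, rounding down."""
    hours, rest = divmod(int(seconds), 3600)
    return f"{hours}h {rest // 60}m"


# Eight-character .onion prefix at 90 MH/s (the Quadro K2000M example).
print(format_duration(estimated_seconds(8, 90e6)))  # 1h 41m

# Eight-character GPG pattern at 5760 MH/s (GTX 1080 row of the table).
print(estimated_seconds(8, 5760e6, gpg=True))
```

Because each additional .onion character multiplies the time by 32 (16 for GPG), trying a few lengths here quickly shows where a pattern stops being practical.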
Workgroup Size
Scallion will use your device’s reported preferred workgroup size by default. This is a reasonable default, but experimenting with the workgroup size may increase performance.
Security
The keys generated by Scallion are quite similar to those generated by Shallot. They have unusually large public exponents, but they are put through the full set of sanity checks recommended by PKCS #1 v2.1 via OpenSSL’s RSA_check_key function. Scallion supports several RSA key sizes, with optimized kernels for 1024-bit, 2048-bit, and 4096-bit keys. Other key sizes may work, but have not been tested.