简单讨论一下MurmurHash
算法,之后会依此做一个简易的短链接项目,尽量快些做出来,分享出来
既然是初探,我只会去说怎么使用,至于原理,就留到有机会(doge
)再来探究吧
在一周前我对这个算法也是闻所未闻,至到我看了这篇高性能短链设计文章,才有所了解。
关于MurmurHash
算法,可参考Murmurhash介绍与实现
从上面文章知道,MurmurHash
是一种非常高效的加密型哈希算法,随机特征表现的非常好,应用领域也很多,具有较高的平衡性和低碰撞率。
它实现了32
位和128
位HashKey
加密,这里直接使用Google
提供的guava
。
pom依赖
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>30.1-jre</version>
</dependency>
使用如下
创建HashFunction
对象,加密即可
public static void main(String[] args) {
HashFunction hashFunction = Hashing.murmur3_32();
HashCode hashCode = hashFunction.hashString("https://wnhyang.gitee.io/", StandardCharsets.UTF_8);
System.out.println(hashCode);
}
结果如下
a453dd36
Process finished with exit code 0
这里不禁有疑问了?
a453dd36
是什么东西,第一反应便是16
进制数,正好
4
∗
8
=
32
4*8=32
4∗8=32位,很完美
找到HashCode
接口,看到下面几个方法,都能看懂吧!?
@Beta
public abstract class HashCode {
HashCode() {}
/** Returns the number of bits in this hash code; a positive multiple of 8. */
public abstract int bits();
/**
* Returns the first four bytes of {@linkplain #asBytes() this hashcode's bytes}, converted to an
* {@code int} value in little-endian order.
*
* @throws IllegalStateException if {@code bits() < 32}
*/
public abstract int asInt();
/**
* Returns the first eight bytes of {@linkplain #asBytes() this hashcode's bytes}, converted to a
* {@code long} value in little-endian order.
*
* @throws IllegalStateException if {@code bits() < 64}
*/
public abstract long asLong();
/**
* If this hashcode has enough bits, returns {@code asLong()}, otherwise returns a {@code long}
* value with {@code asBytes()} as the least-significant bytes and {@code 0x00} as the remaining
* most-significant bytes.
*
* @since 14.0 (since 11.0 as {@code Hashing.padToLong(HashCode)})
*/
public abstract long padToLong();
/**
* Returns the value of this hash code as a byte array. The caller may modify the byte array;
* changes to it will <i>not</i> be reflected in this {@code HashCode} object or any other arrays
* returned by this method.
*/
// TODO(user): consider ByteString here, when that is available
public abstract byte[] asBytes();
...
}
加上这些方法,重新来
public static void main(String[] args) {
HashFunction hashFunction = Hashing.murmur3_32();
HashCode hashCode = hashFunction.hashString("https://wnhyang.gitee.io/", StandardCharsets.UTF_8);
System.out.print(hashCode.asInt() + " ");
System.out.print(hashCode.padToLong() + " ");
System.out.print(hashCode.asBytes() + " ");
System.out.print(hashCode.bits() + " ");
System.out.println(hashCode);
}
新结果就很舒服了,
2
32
=
4
,
294
,
967
,
296
2^{32}=4,294,967,296
232=4,294,967,296,也就是说,32
位的MurmurHash
算法最多可以有近43
亿,按上面短链接的方法转为62
进制,6
位足矣,
6
2
6
=
56
,
800
,
235
,
584
62^{6}=56,800,235,584
626=56,800,235,584,有568
亿呢。
920474532 920474532 [B@76ed5528 32 a453dd36
Process finished with exit code 0
可是,之后测试了同样的代码,只是把要加密的url
变为https://blog.csdn.net/freda1997/article/details/105199265便出现问题了
。
意料之外的结果
-503738096 3791229200 [B@76ed5528 32 1091f9e1
Process finished with exit code 0
从上面HashCode
接口的方法可知asInt()
和padToLong()
区别就在这,从Java
基础或者说是计算机基础可知数有有符号数和无符号数一说。有符号数第一位表示符号范围为[
−
2
n
−
1
-2^{n-1}
−2n−1,
2
n
−
1
−
1
2^{n-1}-1
2n−1−1],无符号数则没有首位的限制。所以能容易想到是int
数值溢出了。
测试验证
public static void main(String[] args) {
HashFunction hashFunction = Hashing.murmur3_32();
HashCode hashCode = hashFunction.hashString("https://blog.csdn.net/freda1997/article/details/105199265", StandardCharsets.UTF_8);
System.out.print(hashCode.asInt() + " ");
System.out.print(hashCode.padToLong() + " ");
System.out.print(hashCode.asBytes() + " ");
System.out.print(hashCode.bits() + " ");
System.out.println(hashCode);
long d = (int) hashCode.padToLong();
System.out.println(d);
}
果然
-503738096 3791229200 [B@76ed5528 32 1091f9e1
-503738096
Process finished with exit code 0
上面就展示了怎么使用MurmurHash
算法,之后就可着手开始短链接项目了。