How can I use hashlib together with multiprocessing to compute multiple digests?

讨论未结 2 52

Licsber licsber Member Posted April 17, 2022, 09:07

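<ol> <li>I want to compute the md5, sha1, sha256, crc32, md4 and similar digests of a large file (>1 GiB). </li> <li>I want the file to be read in full only once. </li> </ol> When I compute the digests with hashlib, the code is clearly bottlenecked on a single CPU core. Profiling shows that most of the time is spent in the update() method of each hash object. The official documentation contains this note: <pre><code class="language-text">Note For better multithreading performance, the Python GIL is released for data larger than 2047 bytes at object creation or on update. </code></pre> In practice, however, I have not seen this take effect: the GIL does not appear to be released, and CPU usage stays at 100%. I have verified that I/O is not the bottleneck. Python 3.9.2.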
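For reference, the single-pass setup described in the question can be sketched roughly as follows. This is a minimal sketch, not the original poster's code: the function name `digest_file_once` and the 1 MiB `CHUNK_SIZE` are assumptions made for illustration. Note that crc32 is not part of hashlib and comes from `zlib` instead, and that `hashlib.new("md4")` only works when the underlying OpenSSL build provides md4, so it is left out of the default list here.

```python
import hashlib
import zlib

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an assumed value, tune as needed


def digest_file_once(path, algorithms=("md5", "sha1", "sha256")):
    """Read `path` a single time and feed every chunk to all hash objects.

    Hypothetical helper: everything runs in one thread, so the update()
    calls happen one after another and only one core is busy.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            for h in hashers.values():
                h.update(chunk)
            # zlib.crc32 accepts the running value as its second argument
            crc = zlib.crc32(chunk, crc)
    result = {name: h.hexdigest() for name, h in hashers.items()}
    result["crc32"] = f"{crc:08x}"
    return result
```

With this layout the file is read once, but all of the hashing work is serialized in the calling thread, which matches the single-core profile described above.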


2 replies

LeeReamond Member

April 17, 2022, 09:48

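hashlib is implemented through FFI calls, so you don't need multiple processes; plain multithreading is enough for the GIL to be released. If you're saying the GIL isn't being released, I suspect something is wrong on your end.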
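The multithreading suggestion above could look something like the sketch below. It is only one possible arrangement, under stated assumptions: the function name `digest_file_threaded` and the 4 MiB `CHUNK_SIZE` are hypothetical, and it uses `concurrent.futures.ThreadPoolExecutor` rather than raw `threading`. The idea is that releasing the GIL only helps when another thread has work to do, so each hash object's `update()` is submitted as its own task and the chunks are kept large (well above the 2047-byte threshold from the documentation note).

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4 << 20  # 4 MiB per read; an assumed value, tune as needed


def digest_file_threaded(path, algorithms=("md5", "sha1", "sha256")):
    """Read `path` once and update each hash object in its own thread.

    Hypothetical helper: one task per algorithm per chunk, so the
    update() calls for different algorithms can overlap while the GIL
    is released inside hashlib.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with ThreadPoolExecutor(max_workers=len(hashers)) as pool, \
            open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            futures = [pool.submit(h.update, chunk) for h in hashers.values()]
            # Wait for every hasher before reading the next chunk, so each
            # hash object sees the chunks in file order, one at a time.
            for fut in futures:
                fut.result()
    return {name: h.hexdigest() for name, h in hashers.items()}
```

Waiting on all futures per chunk means the slowest algorithm sets the pace for each chunk. A per-algorithm worker thread fed by a bounded `queue.Queue` would avoid that, at the cost of more code.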


Licsber

April 17, 2022, 09:48

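#1 Thanks. I thought about it for a bit and I think I see where I went wrong. I'm implementing it with the threading library now and will post the code in a while.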

