The synchronized file read that tripped me up for ages

2019-04-20

⌛️ Status: finished ✔️

¶Related jokes

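There are only two hard problems in distributed systems:
2. Exactly-once delivery
1. Guaranteed order of messages
2. Exactly-once delivery
-- from Weibo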

A programmer ran into a bug and decided to fix it with multithreading. of threads every Now has his one a b g. u
-- from Weibo

¶Background

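In a recent project I needed Python to read the newest txt file in a folder. The txt files are downloaded by a browser in real time, so the folder also has some tmp files mixed in. It is a fairly simple task, but I still ran into quite a few pitfalls.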

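At first I used os.path.getmtime(test_report + fn) as the sort key for lists. The keys were computed for the tmp files as well, but I had already removed those tmp files from lists, so the code frequently raised errors about a tmp file not being found.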

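Later I arrived at a prototype of the code below, just without the flag (set on the fourth line of the function). Around noon the next day it suddenly occurred to me that the for loop might stop covering lists properly after a single remove, because the list has changed underneath it, instead of walking the whole list as I had imagined. (Strictly speaking, removing an item shifts the following items forward, so the loop skips the item right after each removal.) I tried it that afternoon and that was indeed the cause. The code that had been throwing errors every now and then finally settled down.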
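The skipping behaviour is easy to reproduce in isolation. The following is a minimal sketch, not the project's code: the file names and the two helper functions (filter_in_place, filter_copy) are hypothetical and exist only for illustration.

```python
# Hypothetical file names, for illustration only.
names = ['a.tmp', 'b.tmp', 'c.txt']


def filter_in_place(items):
    """Remove non-.txt names while iterating over the same list (buggy)."""
    items = list(items)  # work on a copy so the caller's list is untouched
    for name in items:
        if not name.endswith('.txt'):
            # Removing shifts later items forward, so the item that
            # follows the removed one is never visited by this loop.
            items.remove(name)
    return items


def filter_copy(items):
    """Build a new list instead of mutating the one being iterated."""
    return [name for name in items if name.endswith('.txt')]


print(filter_in_place(names))  # 'b.tmp' survives the single pass
print(filter_copy(names))      # only the .txt name remains
```

With two tmp names in a row, the in-place version leaves the second one behind, which matches the intermittent failures described above: whether a tmp file slips through depends on the order of the listing.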

¶Code

The final, working code:

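import os
import time


def new_report(rootpath):
    file_dict = {}
    lists = os.listdir(rootpath)  # list everything in the folder first
    flag = True
    while flag:
        # Removing from a list while iterating over it skips the item after
        # each removal, so repeat the pass until nothing gets removed.
        flag = False
        for l in lists:
            if not l.endswith('.txt'):
                lists.remove(l)
                flag = True
    time.sleep(0.1)
    for i in lists:  # go through the remaining .txt files
        # st_ctime is the creation time on Windows, metadata-change time on Unix
        ctime = os.stat(os.path.join(rootpath, i)).st_ctime
        file_dict[ctime] = i  # map timestamp -> file name
    max_ctime = max(file_dict.keys())  # newest timestamp
    # print(file_dict[max_ctime])  # print the newest file name
    return os.path.join(rootpath, file_dict[max_ctime])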
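For comparison, the same task can probably be written without mutating the list at all. This is only a sketch under a few assumptions: the function name newest_txt is made up, it ranks files by modification time (os.path.getmtime) rather than st_ctime, and it returns None when the folder has no txt file instead of raising.

```python
import os


def newest_txt(rootpath):
    """Return the path of the most recently modified .txt file, or None."""
    # Filter while building a new list, so nothing is removed mid-iteration.
    candidates = [
        os.path.join(rootpath, name)
        for name in os.listdir(rootpath)
        if name.endswith('.txt')
    ]
    if not candidates:
        return None  # assumption: an empty folder is not an error
    # Pick the path whose modification time is the largest.
    return max(candidates, key=os.path.getmtime)
```

Because tmp files are never passed to os.path.getmtime here, a tmp file that disappears when its download completes should no longer be able to trigger a "file not found" error.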