Python Development Essentials: A Deep Dive into the tempfile Module


2026-03-02 14:41

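A temporary directory is simply a short-lived folder for data that does not need to be kept. Once you are done, it is deleted together with everything inside it, so the file system stays clean.

In day-to-day development, temporary directories are useful in several situations:

- You need scratch space for intermediate results or temporary files.
- A unit test simulates file operations and should clean up after itself automatically.
- Downloaded or extracted data does not need to be stored long term.
- User uploads need a buffer area before the final result is saved.
- An automated pipeline must not leave any traces behind.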

Basic usage of the tempfile module

    import os
    import tempfile

    # Create a temporary directory
    with tempfile.TemporaryDirectory() as temp_dir:
        print(f"Temporary directory created at: {temp_dir}")

        # Create a temporary file inside the directory
        file_path = os.path.join(temp_dir, "sample.txt")
        with open(file_path, "w") as f:
            f.write("Hello, Temporary World!")

        # Read back the file
        with open(file_path, "r") as f:
            print(f.read())

    # At this point, the directory and its contents are deleted automatically
    print("Temporary directory cleaned up automatically.")

The key point is that the directory and its files are deleted automatically when the with block ends, so there is no need to call os.remove() or shutil.rmtree() by hand.

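Controlling the lifetime of a temporary directory manually

With this approach you are responsible for cleanup yourself, so remember to delete the directory when you are done with it.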
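The original code for this section is not shown here, so the following is only a minimal sketch of manual lifetime control, assuming tempfile.mkdtemp() is the intended tool. The helper name write_scratch_file and the file name scratch.txt are hypothetical.

```python
import os
import shutil
import tempfile


def write_scratch_file(text):
    """Create a directory with mkdtemp(), use it, then remove it explicitly."""
    temp_dir = tempfile.mkdtemp()  # nothing cleans this up automatically
    try:
        file_path = os.path.join(temp_dir, "scratch.txt")  # hypothetical file name
        with open(file_path, "w", encoding="utf-8") as f:
            f.write(text)
        existed_during_use = os.path.exists(file_path)
    finally:
        # Manual cleanup: runs even if the work above raises an exception.
        shutil.rmtree(temp_dir)
    return temp_dir, existed_during_use


temp_dir, existed_during_use = write_scratch_file("intermediate result")
print(existed_during_use)        # the file was there while we worked
print(os.path.exists(temp_dir))  # the directory is gone after rmtree()
```

Wrapping the work in try/finally is a design choice that keeps the cleanup guarantee close to what the with statement provides.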

Customizing the name and location of a temporary directory
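A minimal sketch of how naming and placement can be customized, assuming the prefix, suffix, and dir parameters of tempfile.TemporaryDirectory are what this section refers to. The helper name create_app_temp_dir and the "myapp_" / "_data" values are illustrative.

```python
import os
import tempfile


def create_app_temp_dir(parent=None):
    """Return a TemporaryDirectory whose name looks like myapp_<random>_data.

    parent=None keeps the system default location; pass a path to place the
    directory somewhere else (for example on a larger disk).
    """
    return tempfile.TemporaryDirectory(prefix="myapp_", suffix="_data", dir=parent)


with create_app_temp_dir() as temp_dir:
    print(f"Created: {temp_dir}")
```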

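The output looks something like this:

Created: /tmp/myapp_abcd1234_data

This comes in handy when the system default temporary directory has insufficient permissions or not enough free space.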

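Practical example: handling ZIP files safely

Once the whole flow finishes, the extracted folder is deleted automatically and no junk files are left on disk.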
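The flow described above could be sketched as follows. This is not the article's original code: the function name summarize_zip and the path check are assumptions about what "safe" handling might look like, and Path.is_relative_to requires Python 3.9 or later.

```python
import tempfile
import zipfile
from pathlib import Path


def summarize_zip(zip_path):
    """Extract an archive into a temporary directory and return {member: size in bytes}."""
    sizes = {}
    with tempfile.TemporaryDirectory(prefix="unzip_") as temp_dir:
        root = Path(temp_dir).resolve()
        with zipfile.ZipFile(zip_path) as zf:
            # Extra guard (an assumption, not from the article): refuse entries
            # whose resolved path would land outside the temporary directory.
            for member in zf.namelist():
                if not (root / member).resolve().is_relative_to(root):
                    raise ValueError(f"Unsafe path in archive: {member}")
            zf.extractall(root)
        # Work with the extracted files while the directory still exists.
        for path in sorted(root.rglob("*")):
            if path.is_file():
                sizes[path.relative_to(root).as_posix()] = path.stat().st_size
    # The extracted tree has been removed by the time we return.
    return sizes


# Demo: build a small archive in its own temporary directory, then summarize it.
with tempfile.TemporaryDirectory() as work_dir:
    archive = Path(work_dir) / "sample.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("a.txt", "hello")
        zf.writestr("sub/b.txt", "hi")
    print(summarize_zip(archive))
```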

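Practical example: generating reports on the fly

The generated report can be uploaded, emailed, or read directly, and nothing is kept on the local disk.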
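As an illustration of this pattern, here is a small sketch that writes a CSV report into a temporary directory and hands back its content. The function name build_report, the file name report.csv, and the column names are all hypothetical; a real project would upload or email the file at the marked spot.

```python
import csv
import tempfile
from pathlib import Path


def build_report(rows):
    """Write rows to a CSV file in a temporary directory and return its text."""
    with tempfile.TemporaryDirectory(prefix="report_") as temp_dir:
        report_path = Path(temp_dir) / "report.csv"
        with report_path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["name", "score"])
            writer.writerows(rows)
        # The file exists only inside this block: upload, email, or read it here.
        content = report_path.read_text(encoding="utf-8")
    # The report file and its directory have been removed at this point.
    return content


report_text = build_report([("alice", 90), ("bob", 85)])
print(report_text)
```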

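Practical example: file operations in unit tests

Each test case runs in its own isolated temporary environment, so tests do not interfere with one another and no manual cleanup is needed.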
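One way to get this isolation with the standard unittest module is sketched below. The class and test names are hypothetical; the idea is simply that setUp creates a fresh directory per test and addCleanup removes it afterwards.

```python
import os
import tempfile
import unittest


class TempDirTests(unittest.TestCase):
    def setUp(self):
        # A fresh directory for every test method.
        self._tmp = tempfile.TemporaryDirectory()
        # Cleanup runs after each test, whether it passed or failed.
        self.addCleanup(self._tmp.cleanup)
        self.temp_dir = self._tmp.name

    def test_write_and_read(self):
        path = os.path.join(self.temp_dir, "data.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("test data")
        with open(path, encoding="utf-8") as f:
            self.assertEqual(f.read(), "test data")

    def test_directory_starts_empty(self):
        # Files written by other tests never show up here.
        self.assertEqual(os.listdir(self.temp_dir), [])


# Run the tests programmatically so the snippet works when executed directly.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TempDirTests)
result = unittest.TextTestRunner().run(suite)
```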

Nested temporary directories

In a multi-stage data processing pipeline, each stage can have its own independent sandbox.
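A rough sketch of per-stage sandboxes, assuming each stage gets its own TemporaryDirectory created inside an outer one through the dir parameter. The function name run_pipeline, the stage names, and the toy "clean then upper-case" processing are invented for illustration.

```python
import tempfile
from pathlib import Path


def run_pipeline(lines):
    """Run two toy stages, each in its own temporary directory under a shared root."""
    with tempfile.TemporaryDirectory(prefix="pipeline_") as outer:
        # Stage 1: strip whitespace and drop empty lines.
        with tempfile.TemporaryDirectory(prefix="stage1_", dir=outer) as stage1:
            cleaned_path = Path(stage1) / "cleaned.txt"
            cleaned_path.write_text(
                "\n".join(s.strip() for s in lines if s.strip()), encoding="utf-8"
            )
            cleaned = cleaned_path.read_text(encoding="utf-8").splitlines()
        # Stage 1's sandbox is already gone, while the outer directory remains.
        stage1_gone = not Path(stage1).exists()

        # Stage 2: upper-case the cleaned lines in a separate sandbox.
        with tempfile.TemporaryDirectory(prefix="stage2_", dir=outer) as stage2:
            upper_path = Path(stage2) / "upper.txt"
            upper_path.write_text("\n".join(s.upper() for s in cleaned), encoding="utf-8")
            result = upper_path.read_text(encoding="utf-8").splitlines()
    # Leaving the outer block removes whatever is left under it.
    return result, stage1_gone, Path(outer).exists()


result, stage1_gone, outer_exists = run_pipeline([" a ", "", "b"])
print(result, stage1_gone, outer_exists)
```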

A few caveats when using temporary directories

To get the path of the system temporary directory:

    import tempfile

    print(tempfile.gettempdir())

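Avoid tempfile.mktemp(). It is deprecated because it only returns a file name without creating the file, so another process can create a file at that path before you do. Use tempfile.mkstemp(), tempfile.mkdtemp(), or the context-manager classes instead.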
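tempfile.mktemp() is deprecated in the standard library because the name it returns can be claimed by another process before the file is created. The sketch below shows tempfile.mkstemp() as a safer alternative, since it creates and opens the file in one step. The helper name write_then_read and the "safe_" prefix are hypothetical.

```python
import os
import tempfile


def write_then_read(text):
    """Create a temp file atomically with mkstemp(), use it, then delete it."""
    # mkstemp() returns an open OS-level file descriptor plus the file's path.
    fd, path = tempfile.mkstemp(prefix="safe_", suffix=".txt")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, encoding="utf-8") as f:
            content = f.read()
    finally:
        # mkstemp() never deletes the file for you.
        os.remove(path)
    return content, path


content, path = write_then_read("token")
print(content)
print(os.path.exists(path))
```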

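The code below shows how temporary directories can be used in a PDF processing project. The pipeline converts each PDF to page images, converts each image to Markdown, and finally merges the pages into one complete document: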

    import io
    import os
    import tempfile
    from pathlib import Path
    from typing import Callable, Iterable, Optional, Tuple

    # Requires: pip install pymupdf pillow
    import fitz  # PyMuPDF
    from PIL import Image


    def process_pdfs_to_markdown(
        pdf_paths: Iterable[str | os.PathLike],
        output_dir: str | os.PathLike,
        *,
        page_image_dpi: int = 200,
        image_format: str = "PNG",
        llm_page_markdown_fn: Optional[Callable[[Path], str]] = None,
    ) -> Tuple[list[Path], list[Path]]:
        """
        Convert each input PDF into page images using a temporary workspace, run an LLM
        on each page image to get Markdown, save one MD per page (still in a temp
        workspace), then merge the per-PDF Markdown into a single non-temporary Markdown
        file per PDF in `output_dir`.

        Non-temp file handling is kept simple (write final merged .md into `output_dir`),
        while the heavy lifting uses temp directories that auto-clean on success or error.

        Parameters
        ----------
        pdf_paths : Iterable[str | PathLike]
            Paths to PDF files to process.
        output_dir : str | PathLike
            Directory where FINAL merged Markdown files (non-temp) will be written.
        page_image_dpi : int, optional
            Rendering resolution for converting PDF pages to images.
            Higher DPI → sharper (default 200).
        image_format : str, optional
            Image format for page renders (e.g., "PNG", "JPEG"). Default "PNG".
        llm_page_markdown_fn : Callable[[Path], str], optional
            A callable that takes a Path to a page image and returns Markdown text for
            that page. If not provided, a placeholder stub will be used.

        Returns
        -------
        Tuple[list[Path], list[Path]]
            A tuple (final_markdown_files, per_page_markdown_files_flattened)
            - final_markdown_files: list of merged Markdown file paths written in
              output_dir (non-temp)
            - per_page_markdown_files_flattened: flattened list of all per-page MD files
              (in temp, ephemeral)
              (Returned for inspection/logging; these will be deleted when temp dir
              goes away.)

        Notes
        -----
        - Uses a single top-level TemporaryDirectory for the whole batch to keep
          structure neat.
        - For each PDF, creates `/tmp/.../<pdf_stem>/images` and `/tmp/.../<pdf_stem>/md`.
        - Each page is rendered to an image file named `page-<index>.<ext>`.
        - Each page's Markdown is saved to `page-<index>.md`.
        - Finally, merges all page MDs for that PDF into `<output_dir>/<pdf_stem>.md`
          (non-temp).
        - Replace `llm_stub_markdown_from_image` with your actual LLM call
          (OpenAI, local VLM, etc.).

        Pseudocode hint for real LLM integration
        ----------------------------------------
        def llm_page_markdown_fn(img_path: Path) -> str:
            # pseudo:
            # bytes = img_path.read_bytes()
            # resp = my_llm_client.vision_to_md(image=bytes, system_prompt="Extract content as Markdown.")
            # return resp.markdown
            pass
        """
        output_dir = Path(output_dir)
        output_dir.mkdir(parents=True, exist_ok=True)

        # --- Local helper: default LLM stub (replace this with your LLM call) ---
        def llm_stub_markdown_from_image(img_path: Path) -> str:
            # This is a placeholder. Swap with a real LLM/VLM call to convert the image
            # to Markdown. You can pass the image bytes and ask the model to produce
            # clean Markdown with headings, tables, lists, etc.
            return (
                f"# Page extracted (stub)\n\n_Image: {img_path.name}_\n\n"
                "> Replace this with real LLM Markdown output."
            )

        # Choose the LLM function (user-supplied or stub)
        llm_to_md = llm_page_markdown_fn or llm_stub_markdown_from_image

        final_markdown_files: list[Path] = []
        per_page_markdown_files_flattened: list[Path] = []

        # Top-level temp root for the entire run
        with tempfile.TemporaryDirectory(prefix="pdf2img-md_") as temp_root:
            temp_root = Path(temp_root)

            for pdf_path in map(Path, pdf_paths):
                if not pdf_path.exists() or pdf_path.suffix.lower() != ".pdf":
                    # Skip invalid entries gracefully; alternatively raise ValueError
                    continue

                pdf_stem = pdf_path.stem
                pdf_temp_dir = temp_root / pdf_stem
                images_dir = pdf_temp_dir / "images"
                md_dir = pdf_temp_dir / "md"
                images_dir.mkdir(parents=True, exist_ok=True)
                md_dir.mkdir(parents=True, exist_ok=True)

                # --- 1) Render pages to images in temp ---
                # Using PyMuPDF: fast, no external poppler dependency
                pages_rendered: list[Path] = []
                with fitz.open(pdf_path) as doc:
                    # PyMuPDF uses zoom factors; base DPI is ~72, so zoom = target_dpi / 72
                    zoom = page_image_dpi / 72.0
                    mat = fitz.Matrix(zoom, zoom)
                    for page_index in range(doc.page_count):
                        page = doc.load_page(page_index)
                        # no alpha for standard formats
                        pix = page.get_pixmap(matrix=mat, alpha=False)
                        img_bytes = pix.tobytes(output=image_format.lower())
                        img_name = f"page-{page_index + 1}.{image_format.lower()}"
                        img_path = images_dir / img_name
                        # Save via PIL to ensure consistent headers/metadata if needed
                        with Image.open(io.BytesIO(img_bytes)) as im:
                            im.save(img_path, format=image_format)
                        pages_rendered.append(img_path)

                # --- 2) For each page image, call LLM to get Markdown; save per-page MD in temp ---
                page_md_files: list[Path] = []
                for img_path in pages_rendered:
                    md_text = llm_to_md(img_path)  # <-- your real LLM call here
                    md_path = md_dir / (img_path.stem + ".md")
                    md_path.write_text(md_text, encoding="utf-8")
                    page_md_files.append(md_path)
                    per_page_markdown_files_flattened.append(md_path)

                # --- 3) Merge per-page MD into a FINAL non-temp Markdown file (one per PDF) ---
                final_md_path = output_dir / f"{pdf_stem}.md"
                # If you want sophisticated merging rules, implement here (e.g., front matter, TOC).
                # Pseudocode for richer post-processing could be:
                # combined = render_front_matter(pdf_path) + "\n" + concatenate_markdown(page_md_files) + "\n" + add_toc()
                # final_md_path.write_text(combined, encoding="utf-8")
                with final_md_path.open("w", encoding="utf-8") as fout:
                    fout.write(f"<!-- Source PDF: {pdf_path.name} -->\n")
                    fout.write(f"# {pdf_stem}\n\n")
                    # page_md_files is already in page order. Sorting by file name would
                    # place page-10.md before page-2.md, so iterate the list as-is.
                    for i, md_file in enumerate(page_md_files, start=1):
                        fout.write(f"\n\n---\n\n<!-- Page {i} -->\n\n")
                        fout.write(md_file.read_text(encoding="utf-8"))

                final_markdown_files.append(final_md_path)

        # NOTE:
        # All temp content (images & per-page MDs) is automatically cleaned up on exit.
        return final_markdown_files, per_page_markdown_files_flattened

In real use, replace llm_stub_markdown_from_image with an actual LLM call (for example OpenAI's Vision API or a local vision model) to get a complete PDF document processing pipeline.

Summary

Sravanth
