How to Match Text in Python and Insert Text on the Line Above It
Author: XerCis  Published: 2022-10-19 13:16:43
Tags: Python, text matching, text insertion
Matching text and inserting text on the line above it
Problem description
In Python, find the line that matches a given string and insert new content on the line directly above it.
test.txt
a
b
c
d
e
1. Read the file into a list, then overwrite it
def match_then_insert(filename, match, content):
    """Insert content on the line above the matched line.
    :param filename: file to operate on
    :param match: line to match
    :param content: content to insert
    """
    lines = open(filename).read().splitlines()
    index = lines.index(match)             # position of the matched line
    lines.insert(index, content)           # insert just above it
    open(filename, mode='w').write('\n'.join(lines))

match_then_insert('test.txt', match='c', content='123')
Result:
a
b
123
c
d
e
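One case this version does not handle: lines.index() raises ValueError when the match is missing, and the file is then simply not rewritten. A minimal defensive sketch (the early return is an addition of mine, not part of the original):

def match_then_insert_safe(filename, match, content):
    lines = open(filename).read().splitlines()
    if match not in lines:
        return  # no match: leave the file untouched instead of raising ValueError
    lines.insert(lines.index(match), content)
    open(filename, mode='w').write('\n'.join(lines))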
2. The FileInput class
from fileinput import FileInput

def match_then_insert(filename, match, content):
    """Insert content on the line above the matched line.
    :param filename: file to operate on
    :param match: text to match
    :param content: content to insert
    """
    for line in FileInput(filename, inplace=True):  # filter the file in place
        if match in line:
            line = content + '\n' + line
        print(line, end='')  # stdout is redirected into the original file

match_then_insert('test.txt', match='c', content='123')
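With inplace=True, fileinput redirects standard output into the file being processed, which is why the print() call above rewrites test.txt. If you want a safety copy while experimenting, FileInput also accepts a backup suffix; a minimal sketch (an identity filter that only keeps a backup):

from fileinput import FileInput

# Same in-place filtering, but the original file is preserved as test.txt.bak
for line in FileInput('test.txt', inplace=True, backup='.bak'):
    print(line, end='')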
3. seek
def match_then_insert(filename, match, content):
    """Insert content on the line above the matched line.
    :param filename: file to operate on
    :param match: line to match
    :param content: content to insert
    """
    with open(filename, mode='rb+') as f:
        while True:
            try:
                line = f.readline()                       # read one line at a time
                line_str = line.decode().splitlines()[0]  # raises IndexError on the empty read at EOF
            except IndexError:                            # end of file reached, stop
                break
            if line_str == match:
                f.seek(-len(line), 1)      # move the cursor back to the start of the matched line
                rest = f.read()            # read everything from there to the end
                f.seek(-len(rest), 1)      # move the cursor back again
                f.truncate()               # cut off everything after the cursor
                content = content + '\n'
                f.write(content.encode())  # write the inserted content
                f.write(rest)              # write the original tail back
                break

match_then_insert('test.txt', match='c', content='123')
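To make the pointer arithmetic easier to follow, here is a tiny standalone sketch of the same seek/truncate pattern (demo.txt is a hypothetical throwaway file): read up to a point, cut the file there, then write the new bytes followed by the saved tail.

with open('demo.txt', mode='w') as f:   # create the throwaway sample
    f.write('a\nb\nc\n')

with open('demo.txt', mode='rb+') as f:
    head = f.readline()     # b'a\n' -- cursor now sits at the start of 'b'
    rest = f.read()         # b'b\nc\n' -- everything after the first line
    f.seek(len(head))       # jump back to the end of the first line
    f.truncate()            # discard everything after the cursor
    f.write(b'X\n')         # write the new line
    f.write(rest)           # restore the original tail

print(open('demo.txt').read())  # prints a, X, b, c on separate lines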
Comparison

| Method | Time (s) |
| --- | --- |
| Read into a list, then overwrite | 54.42 |
| FileInput class | 121.59 |
| seek | 3.53 |

The seek approach comes out far ahead, most likely because it rewrites only the bytes after the matched line within a single open file, while the other two rewrite the entire file (and FileInput additionally moves the original to a backup copy on every pass). The timings come from the script below:
from timeit import timeit
from fileinput import FileInput

def init_txt():
    # Reset the test file to its original five lines
    open('test.txt', mode='w').write('\n'.join(['a', 'b', 'c', 'd', 'e']))

def f1(filename='test.txt', match='c', content='123'):
    # Read into a list, then overwrite
    lines = open(filename).read().splitlines()
    index = lines.index(match)
    lines.insert(index, content)
    open(filename, mode='w').write('\n'.join(lines))

def f2(filename='test.txt', match='c', content='123'):
    # FileInput in-place filtering
    for line in FileInput(filename, inplace=True):
        if match in line:
            line = content + '\n' + line
        print(line, end='')

def f3(filename='test.txt', match='c', content='123'):
    # seek/truncate on the open file
    with open(filename, mode='rb+') as f:
        while True:
            try:
                line = f.readline()
                line_str = line.decode().splitlines()[0]
            except IndexError:
                break
            if line_str == match:
                f.seek(-len(line), 1)
                rest = f.read()
                f.seek(-len(rest), 1)
                f.truncate()
                content = content + '\n'
                f.write(content.encode())
                f.write(rest)
                break

init_txt()
print(timeit(f1, number=1000))
init_txt()
print(timeit(f2, number=1000))
init_txt()
print(timeit(f3, number=1000))
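One caveat about these numbers: init_txt() runs only once per method, so every one of the 1000 timed calls inserts another '123' line and the file keeps growing while it is being timed. A small sketch (the wrapper is my own addition) that resets the file before each call, at the cost of also timing init_txt():

def f1_fresh():
    init_txt()  # reset test.txt so each call inserts into the same five lines
    f1()

print(timeit(f1_fresh, number=1000))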
Pitfalls
If you run into encoding errors, try adding the following to the top of the script:
# -*- coding: utf-8 -*-
or pass encoding='utf-8' explicitly when opening the file.
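For example, a sketch of the list-based version with the encoding pinned explicitly (this assumes the file really is UTF-8):

def match_then_insert(filename, match, content):
    with open(filename, encoding='utf-8') as f:             # decode explicitly as UTF-8
        lines = f.read().splitlines()
    lines.insert(lines.index(match), content)
    with open(filename, mode='w', encoding='utf-8') as f:   # write back as UTF-8
        f.write('\n'.join(lines))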
Matching Text with Regular Expressions (a classic Python programming example)
The contents of ceshi.txt are as follows (the first line is blank):
爬虫任务报警
01:45:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-1
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: ah_sina_com_cn,job: 28395818dbcb11e998a3f632d94e247c,pid: 88971,log: data/logs/chinabond_fast_spider/ah_sina_com_cn/28395818dbcb11e998a3f632d94e247c.log,items: None
error_data:
爬虫任务报警
01:45:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: shupeidian_bjx_com_cn,job: 04738a5cdbcb11e9803172286b76aa73,pid: 34246,log: data/logs/chinabond_fast_spider/shupeidian_bjx_com_cn/04738a5cdbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:45:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: news_sdchina_com,job: 28e8db4edbcb11e9803172286b76aa73,pid: 34324,log: data/logs/chinabond_fast_spider/news_sdchina_com/28e8db4edbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:47:20
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-0
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: hq_smm_cn,job: 4bdc3af6dbcb11e9a45522b8c8b2a9e4,pid: 111593,log: data/logs/chinabond_fast_spider/hq_smm_cn/4bdc3af6dbcb11e9a45522b8c8b2a9e4.log,items: None
error_data:
爬虫任务报警
01:47:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: sichuan_scol_com_cn,job: 71321c4edbcb11e9803172286b76aa73,pid: 34461,log: data/logs/chinabond_fast_spider/sichuan_scol_com_cn/71321c4edbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:47:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-2
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: www_mof_gov_cn,job: 7418dacedbcb11e9b15e02034af50b6e,pid: 65326,log: data/logs/chinabond_fast_spider/www_mof_gov_cn/7418dacedbcb11e9b15e02034af50b6e.log,items: None
error_data:
爬虫任务报警
01:47:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-5
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: www_funxun_com,job: 4dcda7a0dbcb11e980a8862f09ca6d70,pid: 27785,log: data/logs/chinabond_fast_spider/www_funxun_com/4dcda7a0dbcb11e980a8862f09ca6d70.log,items: None
error_data:
爬虫任务报警
01:49:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-4
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: shuidian_bjx_com_cn,job: 95090682dbcb11e9a0fade28e59e3773,pid: 106424,log: data/logs/chinabond_fast_spider/shuidian_bjx_com_cn/95090682dbcb11e9a0fade28e59e3773.log,items: None
error_data:
爬虫任务报警
01:51:20
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-0
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: tech_sina_com_cn,job: de4bdf72dbcb11e9a45522b8c8b2a9e4,pid: 111685,log: data/logs/chinabond_fast_spider/tech_sina_com_cn/de4bdf72dbcb11e9a45522b8c8b2a9e4.log,items: None
error_data:
爬虫任务报警
01:51:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: ee_ofweek_com,job: ff6bd5b8dbcb11e9803172286b76aa73,pid: 34626,log: data/logs/chinabond_fast_spider/ee_ofweek_com/ff6bd5b8dbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:51:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: house_hexun_com,job: ff6dfdacdbcb11e9803172286b76aa73,pid: 34633,log: data/logs/chinabond_fast_spider/house_hexun_com/ff6dfdacdbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:51:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-2
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: www_sjfzxm_com,job: 018e7d78dbcc11e9b15e02034af50b6e,pid: 65492,log: data/logs/chinabond_fast_spider/www_sjfzxm_com/018e7d78dbcc11e9b15e02034af50b6e.log,items: None
error_data:
爬虫任务报警
01:53:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-4
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: news_xianzhaiwang_cn,job: 48d835e8dbcc11e9a0fade28e59e3773,pid: 106476,log: data/logs/chinabond_fast_spider/news_xianzhaiwang_cn/48d835e8dbcc11e9a0fade28e59e3773.log,items: None
error_data:
The code is as follows:
import os
import re
import json
from collections import namedtuple

alert = namedtuple('Spider_Alert', 'alert_time, alert_hostname, alert_project, alert_spider')

path = r'D:\data\ceshi.txt'
g_path = r'D:\data\\'
file_name = r'result.txt'
file_path = g_path + file_name

alerts_list = list()
with open(path, encoding="utf-8") as file:
    lines = file.readlines()  # read every line
    time = None
    hostname = None
    project = None
    for line in lines:
        if re.search(r'^\d{2}:\d{2}:\d{2}\s*$', line):
            time = re.search(r'^(\d{2}:\d{2}:\d{2})\s*$', line).group(1)
        if re.search(r'^hostname:\s*(.+)', line):
            hostname = re.search(r'^hostname:\s*(.+)', line).group(1)
        if re.search(r'project:\s*([^,]+),', line):
            project = re.search(r'project:\s*([^,]+),', line).group(1)
        if re.search(r'spider:\s*([^,]+),', line):
            spider = re.search(r'spider:\s*([^,]+),', line).group(1)
        if re.search(r'^error_data', line):
            spider_alert = alert(alert_time=time, alert_hostname=hostname, alert_project=project, alert_spider=spider)
            alerts_list.append(spider_alert)

for element in alerts_list:
    print(element[0], element[1], element[3])
    with open(file_path, 'a', encoding="utf-8") as file:
        file.write(element[0] + "\t" + element[1] + "\t" + element[3])
        file.write(' \n')
Running the script prints the time, hostname, and spider of each alert and appends the same three fields to result.txt.
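The script above runs each re.search twice per line (once to test, once to extract) and reopens the output file for every alert. A lighter sketch of the same parsing logic (the dictionary of compiled patterns, the single-pass write with mode 'w', and the shorter field names are choices of this sketch, not part of the original):

import re
from collections import namedtuple

Alert = namedtuple('Alert', 'time hostname project spider')

# Compile each pattern once instead of on every line
patterns = {
    'time': re.compile(r'^(\d{2}:\d{2}:\d{2})\s*$'),
    'hostname': re.compile(r'^hostname:\s*(.+)'),
    'project': re.compile(r'project:\s*([^,]+),'),
    'spider': re.compile(r'spider:\s*([^,]+),'),
}

fields = {'time': None, 'hostname': None, 'project': None, 'spider': None}
alerts = []
with open(r'D:\data\ceshi.txt', encoding='utf-8') as f:
    for line in f:
        for name, pattern in patterns.items():
            m = pattern.search(line)      # search once, reuse the match object
            if m:
                fields[name] = m.group(1)
        if line.startswith('error_data'):  # one alert block ends here
            alerts.append(Alert(**fields))

# Write all results in a single pass
with open(r'D:\data\result.txt', 'w', encoding='utf-8') as out:
    for a in alerts:
        out.write(f'{a.time}\t{a.hostname}\t{a.spider}\n')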
Source: https://xercis.blog.csdn.net/article/details/123671465