pandas read_excel()和to_excel()函数解析
作者:君子胡云 发布时间:2022-11-26 06:25:19
前言
数据分析时候,需要将数据进行加载和存储,本文主要介绍和excel的交互。
read_excel()
加载函数为read_excel(),其具体参数如下。
read_excel(io, sheetname=0, header=0, skiprows=None, skip_footer=0, index_col=None,names=None, parse_cols=None, parse_dates=False,date_parser=None,na_values=None,thousands=None, convert_float=True, has_index_names=None, converters=None,dtype=None, true_values=None, false_values=None, engine=None, squeeze=False, **kwds)
常用参数解析:
io : string, path object ; excel 路径。
sheetname : string, int, mixed list of strings/ints, or None, default 0 返回多表使用sheetname=[0,1],若sheetname=None是返回全表 注意:int/string 返回的是dataframe,而none和list返回的是dict of dataframe
header : int, list of ints, default 0 指定列名行,默认0,即取第一行,数据为列名行以下的数据 若数据不含列名,则设定 header = None
skiprows : list-like,Rows to skip at the beginning,省略指定行数的数据
skip_footer : int,default 0, 省略从尾部数的int行数据
index_col : int, list of ints, default None指定列为索引列,也可以使用u”strings”
names : array-like, default None, 指定列的名字。
数据源:
sheet1:
ID NUM-1 NUM-2 NUM-3
36901 142 168 661
36902 78 521 602
36903 144 600 521
36904 95 457 468
36905 69 596 695
sheet2:
ID NUM-1 NUM-2 NUM-3
36906 190 527 691
36907 101 403 470
(1)函数原型
basestation ="F://pythonBook_PyPDAM/data/test.xls"
data = pd.read_excel(basestation)
print data
输出:是一个dataframe
ID NUM-1 NUM-2 NUM-3
0 36901 142 168 661
1 36902 78 521 602
2 36903 144 600 521
3 36904 95 457 468
4 36905 69 596 695
(2) sheetname参数:返回多表使用sheetname=[0,1],若sheetname=None是返回全表 注意:int/string 返回的是dataframe,而none和list返回的是dict of dataframe
data_1 = pd.read_excel(basestation,sheetname=[0,1])
print data_1
print type(data_1)
输出:dict of dataframe
OrderedDict([(0, ID NUM-1 NUM-2 NUM-3
0 36901 142 168 661
1 36902 78 521 602
2 36903 144 600 521
3 36904 95 457 468
4 36905 69 596 695),
(1, ID NUM-1 NUM-2 NUM-3
0 36906 190 527 691
1 36907 101 403 470)])
(3)header参数:指定列名行,默认0,即取第一行,数据为列名行以下的数据 若数据不含列名,则设定 header = None ,注意这里还有列名的一行。
data = pd.read_excel(basestation,header=None)
print data
输出:
0 1 2 3
0 ID NUM-1 NUM-2 NUM-3
1 36901 142 168 661
2 36902 78 521 602
3 36903 144 600 521
4 36904 95 457 468
5 36905 69 596 695
data = pd.read_excel(basestation,header=[3])
print data
输出:
36903 144 600 521
0 36904 95 457 468
1 36905 69 596 695
(4) skiprows 参数:省略指定行数的数据
data = pd.read_excel(basestation,skiprows = [1])
print data
输出:
ID NUM-1 NUM-2 NUM-3
0 36902 78 521 602
1 36903 144 600 521
2 36904 95 457 468
3 36905 69 596 695
(5)skip_footer参数:省略从尾部数的int行的数据
data = pd.read_excel(basestation, skip_footer=3)
print data
输出:
ID NUM-1 NUM-2 NUM-3
0 36901 142 168 661
1 36902 78 521 602
(6)index_col参数:指定列为索引列,也可以使用u”strings”
data = pd.read_excel(basestation, index_col="NUM-3")
print data
输出:
ID NUM-1 NUM-2
NUM-3
661 36901 142 168
602 36902 78 521
521 36903 144 600
468 36904 95 457
695 36905 69 596
(7)names参数: 指定列的名字。
data = pd.read_excel(basestation,names=["a","b","c","e"])
print data
a b c e
0 36901 142 168 661
1 36902 78 521 602
2 36903 144 600 521
3 36904 95 457 468
4 36905 69 596 695
具体参数如下:
>>> print help(pandas.read_excel)
Help on function read_excel in module pandas.io.excel:
read_excel(io, sheetname=0, header=0, skiprows=None, skip_footer=0, index_col=None, names=None, parse_cols=None, parse_dates=False, date_parser=None, na_values=None, thousands=None, convert_float=True, has_index_names=None, converters=None, dtype=None, true_values=None, false_values=None, engine=None, squeeze=False, **kwds)
Read an Excel table into a pandas DataFrame
Parameters
----------
io : string, path object (pathlib.Path or py._path.local.LocalPath),
file-like object, pandas ExcelFile, or xlrd workbook.
The string could be a URL. Valid URL schemes include http, ftp, s3,
and file. For file URLs, a host is expected. For instance, a local
file could be file://localhost/path/to/workbook.xlsx
sheetname : string, int, mixed list of strings/ints, or None, default 0
Strings are used for sheet names, Integers are used in zero-indexed
sheet positions.
Lists of strings/integers are used to request multiple sheets.
Specify None to get all sheets.
str|int -> DataFrame is returned.
list|None -> Dict of DataFrames is returned, with keys representing
sheets.
Available Cases
* Defaults to 0 -> 1st sheet as a DataFrame
* 1 -> 2nd sheet as a DataFrame
* "Sheet1" -> 1st sheet as a DataFrame
* [0,1,"Sheet5"] -> 1st, 2nd & 5th sheet as a dictionary of DataFrames
* None -> All sheets as a dictionary of DataFrames
header : int, list of ints, default 0
Row (0-indexed) to use for the column labels of the parsed
DataFrame. If a list of integers is passed those row positions will
be combined into a ``MultiIndex``
skiprows : list-like
Rows to skip at the beginning (0-indexed)
skip_footer : int, default 0
Rows at the end to skip (0-indexed)
index_col : int, list of ints, default None
Column (0-indexed) to use as the row labels of the DataFrame.
Pass None if there is no such column. If a list is passed,
those columns will be combined into a ``MultiIndex``. If a
subset of data is selected with ``parse_cols``, index_col
is based on the subset.
names : array-like, default None
List of column names to use. If file contains no header row,
then you should explicitly pass header=None
converters : dict, default None
Dict of functions for converting values in certain columns. Keys can
either be integers or column labels, values are functions that take one
input argument, the Excel cell content, and return the transformed
content.
dtype : Type name or dict of column -> type, default None
Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
Use `object` to preserve data as stored in Excel and not interpret dtype.
If converters are specified, they will be applied INSTEAD
of dtype conversion.
.. versionadded:: 0.20.0
true_values : list, default None
Values to consider as True
.. versionadded:: 0.19.0
false_values : list, default None
Values to consider as False
.. versionadded:: 0.19.0
parse_cols : int or list, default None
* If None then parse all columns,
* If int then indicates last column to be parsed
* If list of ints then indicates list of column numbers to be parsed
* If string then indicates comma separated list of Excel column letters and
column ranges (e.g. "A:E" or "A,C,E:F"). Ranges are inclusive of
both sides.
squeeze : boolean, default False
If the parsed data only contains one column then return a Series
na_values : scalar, str, list-like, or dict, default None
Additional strings to recognize as NA/NaN. If dict passed, specific
per-column NA values. By default the following values are interpreted
as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan',
'1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'nan'.
thousands : str, default None
Thousands separator for parsing string columns to numeric. Note that
this parameter is only necessary for columns stored as TEXT in Excel,
any numeric columns will automatically be parsed, regardless of display
format.
keep_default_na : bool, default True
If na_values are specified and keep_default_na is False the default NaN
values are overridden, otherwise they're appended to.
verbose : boolean, default False
Indicate number of NA values placed in non-numeric columns
engine: string, default None
If io is not a buffer or path, this must be set to identify io.
Acceptable values are None or xlrd
convert_float : boolean, default True
convert integral floats to int (i.e., 1.0 --> 1). If False, all numeric
data will be read in as floats: Excel stores all numbers as floats
internally
has_index_names : boolean, default None
DEPRECATED: for version 0.17+ index names will be automatically
inferred based on index_col. To read Excel output from 0.16.2 and
prior that had saved index names, use True.
Returns
to_excel()
存储函数为pd.DataFrame.to_excel(),注意,必须是DataFrame写入excel, 即Write DataFrame to an excel sheet。其具体参数如下:
to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='', float_format=None,columns=None, header=True, index=True, index_label=None,startrow=0, startcol=0, engine=None, merge_cells=True, encoding=None,
inf_rep='inf', verbose=True, freeze_panes=None)
常用参数解析
- excel_writer : string or ExcelWriter object File path or existing ExcelWriter目标路径
- sheet_name : string, default ‘Sheet1' Name of sheet which will contain DataFrame,填充excel的第几页
- na_rep : string, default ”,Missing data representation 缺失值填充
- float_format : string, default None Format string for floating point numbers
- columns : sequence, optional,Columns to write 选择输出的的列。
- header : boolean or list of string, default True Write out column names. If a list of string is given it is assumed to be aliases for the column names
- index : boolean, default True,Write row names (index)
- index_label : string or sequence, default None, Column label for index column(s) if desired. If None is given, andheader and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
- startrow :upper left cell row to dump data frame
- startcol :upper left cell column to dump data frame
- engine : string, default None ,write engine to use - you can also set this via the options,io.excel.xlsx.writer, io.excel.xls.writer, andio.excel.xlsm.writer.
- merge_cells : boolean, default True Write MultiIndex and Hierarchical Rows as merged cells.
- encoding: string, default None encoding of the resulting excel file. Only necessary for xlwt,other writers support unicode natively.
- inf_rep : string, default ‘inf' Representation for infinity (there is no native representation for infinity in Excel)
- freeze_panes : tuple of integer (length 2), default None Specifies the one-based bottommost row and rightmost column that is to be frozen
数据源:
ID NUM-1 NUM-2 NUM-3
0 36901 142 168 661
1 36902 78 521 602
2 36903 144 600 521
3 36904 95 457 468
4 36905 69 596 695
5 36906 165 453
加载数据:
basestation ="F://python/data/test.xls"
basestation_end ="F://python/data/test_end.xls"
data = pd.read_excel(basestation)
(1)参数excel_writer,输出路径。
data.to_excel(basestation_end)
输出:
ID NUM-1 NUM-2 NUM-3
0 36901 142 168 661
1 36902 78 521 602
2 36903 144 600 521
3 36904 95 457 468
4 36905 69 596 695
5 36906 165 453
(2)sheet_name,将数据存储在excel的那个sheet页面。
data.to_excel(basestation_end,sheet_name="sheet2")
(3)na_rep,缺失值填充
data.to_excel(basestation_end,na_rep="NULL")
输出:
ID NUM-1 NUM-2 NUM-3
0 36901 142 168 661
1 36902 78 521 602
2 36903 144 600 521
3 36904 95 457 468
4 36905 69 596 695
5 36906 165 453 NULL
(4) colums参数: sequence, optional,Columns to write 选择输出的的列。
data.to_excel(basestation_end,columns=["ID"])
输出
ID
0 36901
1 36902
2 36903
3 36904
4 36905
5 36906
(5)header 参数: boolean or list of string,默认为True,可以用list命名列的名字。header = False 则不输出题头。
data.to_excel(basestation_end,header=["a","b","c","d"])
输出:
a b c d
0 36901 142 168 661
1 36902 78 521 602
2 36903 144 600 521
3 36904 95 457 468
4 36905 69 596 695
5 36906 165 453
data.to_excel(basestation_end,header=False,columns=["ID"])
header = False 则不输出题头
输出:
0 36901
1 36902
2 36903
3 36904
4 36905
5 36906
(6)index : boolean, default True Write row names (index)
默认为True,显示index,当index=False 则不显示行索引(名字)。
index_label : string or sequence, default None
设置索引列的列名。
data.to_excel(basestation_end,index=False)
输出:
ID NUM-1 NUM-2 NUM-3
36901 142 168 661
36902 78 521 602
36903 144 600 521
36904 95 457 468
36905 69 596 695
36906 165 453
data.to_excel(basestation_end,index_label=["f"])
输出:
f ID NUM-1 NUM-2 NUM-3
0 36901 142 168 661
1 36902 78 521 602
2 36903 144 600 521
3 36904 95 457 468
4 36905 69 596 695
5 36906 165 453
来源:https://blog.csdn.net/tongxinzhazha/article/details/78796952


猜你喜欢
- 下面的代码是从kesion系统扒下的,确实不错,支持utf8格式。代码如下:'===========================
- 网上存在这么一个例子 obj = pd.Series([7,-5,7,4,2,0,4])obj.rank()输出为:0 6.51
- 语法:Trigger on an INSERT, UPDATE, or DELETE statement to a table or vie
- AES加密AES对称加密简介AES是一个对称密码,旨在取代DES成为广泛使用的标准。是美国联邦政府采用的一种区块加密标准。AES对称加密过程
- 一、创建一个进程实例化 Process 类创建一个进程对象然后调用它的 start 方法即可生成一个子进程from multiprocess
- WIN10系统MYSQL的下载与安装详细教程,记录如下前两天ubuntu下安装mysql遇到了一些依赖问题,结果解决了半天,没解决好,还把我
- Python来进行查询和替换一个文本字符串?可以使用sub()方法来进行查询和替换,sub方法的格式为:sub(replacement, s
- 首先选择操作系统。由于ASP属于MS(Microsoft)的东西,所以我们要选择MS的操作系统,Windows 98以上就可以(
- 提叻一个代码段,要人帮助解释一下。代码如下:declare type t_indexby is ta
- 一、效果图二、必要工具Python3.7pycharm2019再然后配置它的文件,设置游戏屏幕的大小,图片路径。代码如下''
- 1.1.1 摘要 如果说要对数据库进行优化,我们主要可以通过以下五种方法,对数据库系统进行优化。 1. 计算机硬件调优 2. 应用程序调优
- Oracle是世界上用得最多的数据库之一,活动服务器网页(ASP)是一种被广泛用于创建 * 页的功能强大的服务器端脚本语言。许多ASP开发人
- 很早前就遇到这个空值的属性,它既出现在 html 文档中,也出现在 xml 中,一直都回避,放之任之,反正也不影响文档的正确性。隐隐约约过了
- 假如公司需要统计每个员工的个人信息,制定好模板后,由员工填写,然后发送到综合部进行汇总,在这种情况下,如果公司有上百位员工的信息需要统计,且
- 1.确保系统中有依赖的libaio 软件,如果没有: yum -y
- 本文实例讲述了django框架cookie和session用法。分享给大家供大家参考,具体如下:首先知道http协议http协议它是无状态的
- 1.进入mysql/bin目录下输入mysqladmin processlist; 2.启动mysql,输入show processlist
- 下面是一份在 HTML 4 Strict 和 XHTML 1.0 Strict 下必须遵守的标签嵌套规则,比如你不能在 <a>
- 有些 MySQL 数据表中可能存在重复的记录,有些情况我们允许重复数据的存在,但有时候我们也需要删除这些重复的数据。本章节我们将为大家介绍如
- 先给大家展示效果图,感觉不错,请参考实现代码。具体代码如下所示:#!/usr/bin/env python# -*- coding:utf-