Python Pandas: DataFrame Indexing and Data Selection

Author: xiaozheng123121  Published: 2023-01-01 02:27:10

Tags: python, Pandas, DataFrame, indexing, selection, data

1. What Is an Index

1.1 Understanding indexes

First, create a simple DataFrame.

import pandas as pd

myList = [['a', 10, 1.1],
          ['b', 20, 2.2],
          ['c', 30, 3.3],
          ['d', 40, 4.4]]
df1 = pd.DataFrame(data=myList)
print(df1)
--------------------------------
[out]:
   0   1    2
0  a  10  1.1
1  b  20  2.2
2  c  30  3.3
3  d  40  4.4

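A DataFrame has two kinds of index:

Row index (index): the vertical column of labels on the far left
Column index (columns): the horizontal row of labels at the top

By default, both are integers that start at 0 and increase by 1.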

# Print the row index
print(df1.index)
[out]:
RangeIndex(start=0, stop=4, step=1)
---------------------------------------
# Print the column index
print(df1.columns)
[out]:
RangeIndex(start=0, stop=3, step=1)
---------------------------------------
# Print all values as a NumPy array (dtype is object because the columns mix types)
print(df1.values)
[out]:
[['a' 10 1.1]
 ['b' 20 2.2]
 ['c' 30 3.3]
 ['d' 40 4.4]]
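
The defaults described above can also be checked programmatically. The following is a minimal sketch that rebuilds the same frame under the name `df_default` (a name chosen only for this example) and confirms that both axes carry a `RangeIndex` counting up from 0.

```python
import pandas as pd

rows = [['a', 10, 1.1],
        ['b', 20, 2.2],
        ['c', 30, 3.3],
        ['d', 40, 4.4]]
df_default = pd.DataFrame(data=rows)  # no index or columns given

# Both axes fall back to a RangeIndex that starts at 0.
index_is_range = isinstance(df_default.index, pd.RangeIndex)
columns_is_range = isinstance(df_default.columns, pd.RangeIndex)

row_labels = list(df_default.index)       # one label per row
column_labels = list(df_default.columns)  # one label per column
print(index_is_range, columns_is_range, row_labels, column_labels)
```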

1.2 Custom indexes

Use the index parameter to set the row index and the columns parameter to set the column index.

df2 = pd.DataFrame(myList,
                   index=['one', 'two', 'three', 'four'],
                   columns=['char', 'int', 'float'])
print(df2)
-----------------------------------------------------------
[out]:
      char  int  float
one      a   10    1.1
two      b   20    2.2
three    c   30    3.3
four     d   40    4.4

Print the row and column indexes of this DataFrame:

# Print the row index
print(df2.index)
[out]:
Index(['one', 'two', 'three', 'four'], dtype='object')
--------------------------------------------------------
# Print the column index
print(df2.columns)
[out]:
Index(['char', 'int', 'float'], dtype='object')

2. Basic Use of Indexes

2.1 Column index

Select one column:

print(df2['char'])
print(df2.char)
# Both forms print the same result
[out]:
one      a
two      b
three    c
four     d
Name: char, dtype: object

Note that only a single string, 'char', is passed inside the square brackets here. A column selected this way is returned as a Series.
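
As a rough sketch of the point above, the snippet below checks the return type of a single-column selection. The small frame and its `count` column are made up for this illustration. It also shows one caveat of the attribute form (`df.char`): it only reaches a column whose name is a valid Python identifier and does not clash with an existing DataFrame attribute or method.

```python
import pandas as pd

# Hypothetical frame for illustration; 'count' is also a DataFrame method name.
df = pd.DataFrame({'char': ['a', 'b'], 'count': [10, 20]},
                  index=['one', 'two'])

col = df['char']
print(type(col).__name__)            # Series
attr_matches = df.char.equals(col)   # attribute access returns the same column

# df.count resolves to the DataFrame.count method, not to the 'count' column,
# so bracket access is the safe way to reach that column.
count_is_method = callable(df.count)
count_col = df['count']
```

Bracket access works for any column label, so it is the safer default in reusable code.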

Select multiple columns:

print(df2[['char', 'int']])
[out]:
      char  int
one      a   10
two      b   20
three    c   30
four     d   40

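Note that a list, ['char', 'int'], is passed inside the square brackets here, and the result is a DataFrame.
What if you want to select just one column but still get a DataFrame back? Pass a list that contains only that column name: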

print(df2[['char']])
[out]:
      char
one      a
two      b
three    c
four     d
---------------------------------------
type(df2[['char']])
[out]: pandas.core.frame.DataFrame

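Note that df2[0] raises a KeyError, because [ ] looks up column labels, not positions. It works only when the column labels themselves are integers, as in df1, so df1[0] succeeds.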

print(df1[0])
[out]:
0    a
1    b
2    c
3    d
Name: 0, dtype: object
-----------------------
print(df2[0])
[out]:
KeyError: 0
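
The failure above can be reproduced safely with a try/except. The sketch below uses a small stand-in frame (not the article's df2) and also shows one way to take a column by position when the labels are strings, using `.iloc`, which section 2.2.2 covers for rows.

```python
import pandas as pd

df = pd.DataFrame([['a', 10], ['b', 20]],
                  index=['one', 'two'],
                  columns=['char', 'int'])

# [] looks up column labels, and there is no column labelled 0.
try:
    df[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Positional access: all rows, first column.
first_col = df.iloc[:, 0]
print(lookup_failed, first_col.name)
```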

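2.2 Row index

2.2.1 Using [ ]

Unlike column selection, a single string is not passed inside [ ] here. Rows are selected with a colon slice.

Select the rows whose labels run from 'two' to 'three':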

print(df2['two':'three'])
[out]:
      char  int  float
two      b   20    2.2
three    c   30    3.3

Select only the row labelled 'two':

# The result is still a DataFrame
print(df2['two':'two'])
[out]:
    char  int  float
two    b   20    2.2

Besides row labels, [ ] also accepts row positions in a slice.

Select rows 1 through 3 (numbering starts at 0):

print(df2[1:4])
[out]:
      char  int  float
two      b   20    2.2
three    c   30    3.3
four     d   40    4.4

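Note that an integer slice excludes the row at the stop position (the number to the right of the colon), whereas the label slices above include both endpoints.

Select row 1 only: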

print(df2[1:2])
[out]:
    char  int  float
two    b   20    2.2
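
The two slicing behaviours can be compared side by side. This is a minimal sketch on a one-column stand-in frame (the name `df` and its contents are chosen for illustration): a label slice keeps both endpoints, while a positional slice drops the stop position, so two differently written slices return the same rows.

```python
import pandas as pd

df = pd.DataFrame({'int': [10, 20, 30, 40]},
                  index=['one', 'two', 'three', 'four'])

by_label = df['two':'three']   # label slice: both endpoints included
by_position = df[1:3]          # positional slice: position 3 is excluded

print(list(by_label.index))
print(list(by_position.index))
```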

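2.2.2 Using .loc[] and .iloc[]

The difference is that .loc selects data by the labels of the row and column indexes, while .iloc selects by integer position, counting from 0. Both are indexers used with square brackets, not functions called with parentheses.

Select rows:

Using .loc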

print(df2.loc['one'])
[out]:
char       a
int       10
float    1.1
Name: one, dtype: object
-------------------------------------------
print(df2.loc[['one', 'three']])
[out]:
      char  int  float
one      a   10    1.1
three    c   30    3.3

Using .iloc

print(df2.iloc[0])
[out]:
char       a
int       10
float    1.1
Name: one, dtype: object
-------------------------------------------
print(df2.iloc[[0, 2]])
[out]:
      char  int  float
one      a   10    1.1
three    c   30    3.3
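
Since .loc works on both row and column labels and .iloc on both row and column positions, each also accepts a second selector for columns. The sketch below rebuilds a frame shaped like df2 under the name `df` (chosen for this example) and picks the same cell and the same sub-block both ways.

```python
import pandas as pd

df = pd.DataFrame([['a', 10, 1.1],
                   ['b', 20, 2.2],
                   ['c', 30, 3.3],
                   ['d', 40, 4.4]],
                  index=['one', 'two', 'three', 'four'],
                  columns=['char', 'int', 'float'])

# A single cell: row 'two' / column 'int' is position (1, 1).
cell_by_label = df.loc['two', 'int']
cell_by_position = df.iloc[1, 1]

# A sub-block: the label slice includes 'three'; positional stops are excluded.
block_by_label = df.loc['two':'three', ['char', 'int']]
block_by_position = df.iloc[1:3, 0:2]

print(cell_by_label, cell_by_position)
print(block_by_label.equals(block_by_position))
```

Label-based selection tends to survive reordering of rows or columns, while position-based selection is convenient when the labels are unknown or not unique.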

Source: https://blog.csdn.net/weixin_46713695/article/details/125959391
