Pandas is a Python library for data manipulation and analysis. It is widely used in the field of data analytics and data science. One of the core components of Pandas is the data series.
A Pandas data series (the Series class) is a one-dimensional labeled array that can hold any data type, such as integers, floats, strings, and even arbitrary Python objects. It is similar to a single column in a spreadsheet or a database table. Each element is associated with a label, and the collection of labels is called the index.
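As a minimal sketch of the "any data type" point, the snippet below builds small series from different kinds of values and prints the dtype pandas infers for each. The variable names are illustrative, and exact dtype names can vary with the pandas version (for example, newer releases may use a dedicated string dtype instead of object for text).

```python
import pandas as pd

# pandas infers a single dtype for each series from the values passed in
ints = pd.Series([1, 2, 3])
floats = pd.Series([1.5, 2.5, 3.5])
strings = pd.Series(["a", "b", "c"])
objects = pd.Series([{"k": 1}, {"k": 2}, {"k": 3}])  # arbitrary Python objects (dicts)

for name, s in [("ints", ints), ("floats", floats), ("strings", strings), ("objects", objects)]:
    print(name, s.dtype)
```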
Creating a Pandas Data Series
To create a Pandas data series, we first need to import the Pandas library.
import pandas as pd
We can then create a series by passing a list of values to the pd.Series() constructor. If no index is provided, pandas generates a default integer index starting from 0.
test_series = pd.Series([1234567890, 2345678901, 3456789012, 4567890123])
print(test_series)
Output:
0 1234567890
1 2345678901
2 3456789012
3 4567890123
dtype: int64
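To see the automatically generated index directly, a short sketch can inspect the series' index attribute. The printed RangeIndex repr is what recent pandas versions show; treat the exact formatting as version-dependent.

```python
import pandas as pd

test_series = pd.Series([1234567890, 2345678901, 3456789012, 4567890123])

# With no index argument, pandas builds a default integer index 0..n-1
print(test_series.index)        # RangeIndex(start=0, stop=4, step=1)
print(list(test_series.index))  # [0, 1, 2, 3]
```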
Alternatively, we can supply our own labels through the index parameter:
test_series = pd.Series([1234567890, 2345678901, 3456789012, 4567890123], index=['John', 'Lucy', 'Mike', 'Harry'])
print(test_series)
Output:
John 1234567890
Lucy 2345678901
Mike 3456789012
Harry 4567890123
dtype: int64
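When supplying labels yourself, the index needs exactly one label per value. This sketch shows a matching index alongside a deliberately short one; the exact wording of the ValueError message may differ between pandas versions, so the code only checks that the error is raised.

```python
import pandas as pd

values = [1234567890, 2345678901, 3456789012, 4567890123]

# One label per value: this works
labelled = pd.Series(values, index=['John', 'Lucy', 'Mike', 'Harry'])
print(labelled.index.tolist())  # ['John', 'Lucy', 'Mike', 'Harry']

# Three labels for four values: pandas raises ValueError instead of guessing
error_message = None
try:
    pd.Series(values, index=['John', 'Lucy', 'Mike'])
except ValueError as err:
    error_message = str(err)
print(error_message)
```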
Accessing Elements of a Pandas Data Series
We can access the elements of a Pandas data series through its index. The loc[] indexer accesses elements by their index label:
print(test_series.loc['John'])
Output: 1234567890
The iloc[] indexer accesses elements by their integer position:
print(test_series.iloc[0])
Output: 1234567890
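One practical difference between the two indexers shows up with slices: a label slice with loc[] includes the end label, while a position slice with iloc[] excludes the end position, just like slicing a Python list. A minimal sketch using the same labelled series:

```python
import pandas as pd

test_series = pd.Series(
    [1234567890, 2345678901, 3456789012, 4567890123],
    index=['John', 'Lucy', 'Mike', 'Harry'],
)

# Label slices with .loc include the end label
by_label = test_series.loc['John':'Mike']
# Position slices with .iloc exclude the end position
by_position = test_series.iloc[0:2]

print(by_label.index.tolist())     # ['John', 'Lucy', 'Mike']
print(by_position.index.tolist())  # ['John', 'Lucy']
```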
Operations on a Pandas Data Series
Pandas data series support various operations such as arithmetic and statistical operations.
Arithmetic Operations
We can perform arithmetic operations on a Pandas data series. When we perform an operation between two series, Pandas aligns the series based on their index labels and then performs the operation.
test_series_1 = pd.Series([10, 20, 30, 40, 50], index=['A', 'B', 'C', 'D', 'E'])
test_series_2 = pd.Series([20, 30, 40, 50, 60], index=['A', 'B', 'C', 'D', 'E'])
print(test_series_1 + test_series_2)
Output:
A 30
B 50
C 70
D 90
E 110
dtype: int64
Statistical Operations
Pandas data series support common statistical operations such as mean, median, mode, and standard deviation, each available as a method on the series.
print(test_series_1.mean())    # 30.0
print(test_series_1.median())  # 30.0
print(test_series_1.mode())    # returns a Series; every value occurs once, so all five are returned
print(test_series_1.std())     # 15.811388300841896 (sample standard deviation)
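For a quick overview, describe() bundles several summary statistics into a single series. The sketch also shows the ddof parameter of std(): the default (ddof=1) gives the sample standard deviation, and ddof=0 gives the population standard deviation.

```python
import pandas as pd

test_series_1 = pd.Series([10, 20, 30, 40, 50], index=['A', 'B', 'C', 'D', 'E'])

# count, mean, std, min, quartiles, and max in one call
summary = test_series_1.describe()
print(summary)

# Population standard deviation: divide by n instead of n - 1
print(test_series_1.std(ddof=0))
```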
Conditional Indexing
We can use conditional indexing to select the elements of a series that satisfy a given condition. Passing a boolean condition inside [] (or to the loc[] indexer) keeps only the elements for which the condition is True:
print(test_series_1[test_series_1 > 25])
Output:
C 30
D 40
E 50
dtype: int64
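Conditions can be combined with the element-wise operators & (and) and | (or). Each comparison needs its own parentheses, and Python's and/or keywords do not work here because pandas refuses to reduce a whole series to a single True or False. A minimal sketch:

```python
import pandas as pd

test_series_1 = pd.Series([10, 20, 30, 40, 50], index=['A', 'B', 'C', 'D', 'E'])

# Elements strictly between 15 and 45
between = test_series_1[(test_series_1 > 15) & (test_series_1 < 45)]
# Elements below 15 or above 45
outside = test_series_1[(test_series_1 < 15) | (test_series_1 > 45)]
# .loc accepts the same boolean mask as []
via_loc = test_series_1.loc[test_series_1 > 25]

print(between.tolist())  # [20, 30, 40]
print(outside.tolist())  # [10, 50]
print(via_loc.tolist())  # [30, 40, 50]
```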
Differences Between Pandas Data Series and Python Lists
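All elements of a Pandas data series share a single data type (dtype), such as int64, float64, or object. Mixed numeric values are upcast to a common type, and genuinely mixed values fall back to the generic object dtype, which forfeits much of the performance benefit. In contrast, Python lists hold heterogeneous data natively, with each element keeping its own type.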
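The sketch below contrasts how a series and a list treat mixed values: ints and floats are upcast to one float dtype, ints and strings fall back to object, and a plain list simply keeps each element's original type. The values are illustrative only.

```python
import pandas as pd

# Mixing ints and floats: pandas upcasts everything to a float dtype
mixed_numbers = pd.Series([1, 2.5, 3])
print(mixed_numbers.dtype)     # float64
print(mixed_numbers.tolist())  # [1.0, 2.5, 3.0]

# Mixing ints and strings: no common numeric dtype, so pandas uses object
mixed_types = pd.Series([1, 'two', 3.0])
print(mixed_types.dtype)       # object

# A plain list keeps each element's own Python type
plain_list = [1, 'two', 3.0]
print([type(x).__name__ for x in plain_list])  # ['int', 'str', 'float']
```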
In a Pandas data series, each element has a label in the index. These labels can be used to access specific elements, and they can be of any hashable type, such as strings, integers, or dates. Labels are usually unique, but pandas does not require it. In contrast, a Python list has no labels, so elements can only be accessed by their position in the list.
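A small sketch of label-based versus positional access, using hypothetical ages. It also shows that a repeated label is allowed and that looking it up returns every matching element as a series rather than a single value.

```python
import pandas as pd

ages_list = [34, 28, 45]                                   # hypothetical data
ages = pd.Series(ages_list, index=['John', 'Lucy', 'Mike'])

print(ages_list[1])   # 28: a list only supports positional access
print(ages['Lucy'])   # 28: label-based access
print(ages.iloc[1])   # 28: positional access is still available

# Labels need not be unique; a repeated label returns all matches
dupes = pd.Series([1, 2, 3], index=['x', 'x', 'y'])
print(dupes['x'].tolist())  # [1, 2]
```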
Pandas data series are built on top of NumPy arrays, which store numeric data compactly and support vectorized operations, making them generally faster and more memory-efficient than Python lists for numerical work. Python lists are more general-purpose and suit a wide range of tasks, but they are less efficient for large-scale data analysis.
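Vectorization also changes what operators mean. In this sketch (illustrative prices), multiplying a series applies the operation to every element, while multiplying a list repeats the list; element-wise math on a list needs an explicit comprehension. No timing claims are made here, only the difference in behavior.

```python
import pandas as pd

prices_list = [10, 20, 30]
prices = pd.Series(prices_list)

# On a series, * applies element-wise over the underlying NumPy array
print((prices * 2).tolist())             # [20, 40, 60]

# On a list, * repeats the list instead
print(prices_list * 2)                   # [10, 20, 30, 10, 20, 30]

# Element-wise math on a list needs an explicit loop or comprehension
print([p * 2 for p in prices_list])      # [20, 40, 60]

# The series exposes its NumPy-backed data directly
print(type(prices.to_numpy()).__name__)  # ndarray
```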
Pandas data series come with built-in methods for data analysis and manipulation, such as arithmetic and statistical operations, combining with other datasets, and filtering. Python lists have no built-in equivalents for most of these tasks, so achieving the same results requires more code.
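As one illustration, counting and filtering are single method calls on a series, while a plain list needs a standard-library helper such as collections.Counter or a manual comprehension. The grades below are hypothetical sample data.

```python
from collections import Counter

import pandas as pd

grades = ['A', 'B', 'A', 'C', 'A', 'B']  # hypothetical sample data

# Series: built-in counting and boolean filtering
s = pd.Series(grades)
counts = s.value_counts()
print(counts)
print(s[s != 'C'].tolist())

# List: the same results via stdlib helpers or explicit loops
list_counts = Counter(grades)
print(list_counts)
print([g for g in grades if g != 'C'])
```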