1. is_unique
As its name sugests, this method checks if all the values of a Series are unique:
import pandas as pd print(pd.Series([1, 2, 3, 4]).is_unique) print(pd.Series([1, 2, 3, 1]).is_unique) Output: True False
2 & 3. is_monotonic and is_monotonic_decreasing
With these 2 methods, we can check if the values of a Series are in ascending/descending order:
print(pd.Series([1, 2, 3, 8]).is_monotonic) print(pd.Series([1, 2, 3, 1]).is_monotonic) print(pd.Series([9, 8, 4, 0]).is_monotonic_decreasing) Output: True False True
Both methods work also for a Series with string values. In this case, Python uses a lexicographical ordering under the hood, comparing two subsequent strings character by character. It’s not the same as just an alphabetical ordering, and actually, the example with the numeric data above is a particular case of such an ordering. As the Python documentation says,
Lexicographical ordering for strings uses the Unicode code point number to order individual characters.
In practice, it mainly means that the letter case and special symbols are also taken into account:
print(pd.Series(['fox', 'koala', 'panda']).is_monotonic) print(pd.Series(['FOX', 'Fox', 'fox']).is_monotonic) print(pd.Series(['*', '&', '_']).is_monotonic) Output: True True False
A curious exception happens when all the values of a Series are the same. In this case, both methods return True:
print(pd.Series([1, 1, 1, 1, 1]).is_monotonic) print(pd.Series(['fish', 'fish']).is_monotonic_decreasing) Output: True True
4. hasnans
This method checks if a Series contains NaN values:
import numpy as np print(pd.Series([1, 2, 3, np.nan]).hasnans) print(pd.Series([1, 2, 3, 10, 20]).hasnans) Output: True False
5. empty
Sometimes, we might want to know if a Series is completely empty, not containing even NaN values:
print(pd.Series().empty) print(pd.Series(np.nan).empty) Output: True False
A Series can become empty after some manipulations with it, for example, filtering:
s = pd.Series([1, 2, 3]) s[s > 3].empty Output: True
6 & 7. first_valid_index() and last_valid_index()
These 2 methods return index for first/last non-NaN value and are particularly useful for Series objects with many NaNs:
print(pd.Series([np.nan, np.nan, 1, 2, 3, np.nan]).first_valid_index()) print(pd.Series([np.nan, np.nan, 1, 2, 3, np.nan]).last_valid_index()) Output: 2 4
If all the values of a Series are NaN, both methods return None:
print(pd.Series([np.nan, np.nan, np.nan]).first_valid_index()) print(pd.Series([np.nan, np.nan, np.nan]).last_valid_index()) Output: None None
8. truncate()
This method allows truncating a Series before and after some index value. Let’s truncate the Series from the previous section leaving only non-NaN values:
s = pd.Series([np.nan, np.nan, 1, 2, 3, np.nan]) s.truncate(before=2, after=4) Output: 2 1.0 3 2.0 4 3.0 dtype: float64
The original index of the Series was preserved. We may want to reset it and also to assign the truncated Series to a variable:
s_truncated = s.truncate(before=2, after=4).reset_index(drop=True) print(s_truncated) Output: 0 1.0 1 2.0 2 3.0 dtype: float64
9. convert_dtypes()
As the pandas documentation says, this method is used to
Convert columns to best possible dtypes using dtypes supporting pd.NA.
If to consider only Series objects and not DataFrames, the only application of this method is to convert all nullable integers (i.e. float numbers with a decimal part equal to 0, such as 1.0, 2.0, etc.) back to “normal” integers. Such float numbers appear when the original Series contains both integers and NaN values. Since NaN is a float in numpy and pandas, it leads to the whole Series with any missing values to become of float type as well.
Let’s take a look at the example from the previous section to see how it works:
print(pd.Series([np.nan, np.nan, 1, 2, 3, np.nan])) print('\n') print(pd.Series([np.nan, np.nan, 1, 2, 3, np.nan]).convert_dtypes()) Output: 0 NaN 1 NaN 2 1.0 3 2.0 4 3.0 5 NaN dtype: float64 0 <NA> 1 <NA> 2 1 3 2 4 3 5 <NA> dtype: Int64
10. clip()
We can clip all the values of a Series at input thresholds (lower and upper parameters):
s = pd.Series(range(1, 11)) print(s) s_clipped = s.clip(lower=2, upper=7) print(s_clipped) Output: 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 dtype: int64 0 2 1 2 2 3 3 4 4 5 5 6 6 7 7 7 8 7 9 7 dtype: int64
11. rename_axis()
In the case of a Series object, this method sets the name of the index:
s = pd.Series({'flour': '300 g', 'butter': '150 g', 'sugar': '100 g'}) print(s) s=s.rename_axis('ingredients') print(s) Output: flour 300 g butter 150 g sugar 100 g dtype: object ingredients flour 300 g butter 150 g sugar 100 g dtype: object
12 & 13. nsmallest() and nlargest()
These 2 methods return the smallest/largest elements of a Series. By default, they return 5 values, in ascending order for nsmallest() and in descending - for nlargest().
s = pd.Series([3, 2, 1, 100, 200, 300, 4, 5, 6]) s.nsmallest() Output: 2 1 1 2 0 3 6 4 7 5 dtype: int64
It’s possible to specify another number of the smallest/largest values to be returned. Also, we may want to reset the index and assign the result to a variable:
largest_3 = s.nlargest(3).reset_index(drop=True) print(largest_3) Output: 0 300 1 200 2 100 dtype: int64
14. pct_change()
For a Series object, we can calculate percentage change (or, more precisely, fraction change) between the current and a prior element. This approach can be helpful, for example, when working with time series, or for creating a waterfall chart in % or fractions.
s = pd.Series([20, 33, 14, 97, 19]) s.pct_change() Output: 0 NaN 1 0.650000 2 -0.575758 3 5.928571 4 -0.804124 dtype: float64
To make the resulting Series more readable, let’s round it:
s.pct_change().round(2) Output: 0 NaN 1 0.65 2 -0.58 3 5.93 4 -0.80 dtype: float64
15. explode()
This method transforms each list-like element of a Series (lists, tuples, sets, Series, ndarrays) to a row. Empty list-likes will be transformed in a row with NaN. To avoid repeated indices in the resulting Series, it’s better to reset index:
s = pd.Series([[np.nan], {1, 2}, 3, (4, 5)]) print(s) s_exploded = s.explode().reset_index(drop=True) print(s_exploded) Output: 0 [nan] 1 {1, 2} 2 3 3 (4, 5) dtype: object 0 NaN 1 1 2 2 3 3 4 4 5 5 dtype: object
16. repeat()
This method is used for consecutive repeating each element of a Series a defined number of times. Also in this case, it makes sense to reset index:
s = pd.Series([1, 2, 3]) print(s) s_repeated = s.repeat(2).reset_index(drop=True) print(s_repeated) Output: 0 1 1 2 2 3 dtype: int64 0 1 1 1 2 2 3 2 4 3 5 3 dtype: int64
If the number of repetitions is assigned to 0, an empty Series will be returned:
s.repeat(0) Output: Series([], dtype: int64)
Conclusion
To sum up, we investigated 16 rarely used pandas methods for working with Series and some of their application cases. If you know some other interesting ways to manipulate pandas Series, you’re very welcome to share them in the comments.