TypeError: string indices must be integers when updating a dataframe column using .apply

General Tech Learning Aids/Tools 2 years ago

0 2 0 0 0 tuteeHUB earn credit +10 pts

5 Star Rating 1 Rating

Posted on 16 Aug 2022, this text provides information on Learning Aids/Tools related to General Tech. Please note that while accuracy is prioritized, the data presented might not be entirely correct or up-to-date. This information is offered for general knowledge and informational purposes only, and should not be considered as a substitute for professional advice.

Take Quiz To Earn Credits!

Turn Your Knowledge into Earnings.

tuteehub_quiz

Answers (2)

Post Answer
profilepic.png
manpreet Tuteehub forum best answer Best Answer 2 years ago

 

I am updating a column based on a substring in another column. This has been done by iterating through the rows.

import pandas as pd

my_DestSystemNote1_string =  'ISIN=XS1906311763|CUSIP=         |CalTyp=1'
dfDest = [('DestSystemNote1', ['ISIN=XS1906311763|CUSIP=         |CalTyp=1', 
                               'ISIN=XS0736418962|CUSIP=         |CalTyp=1', 
                               'ISIN=XS1533910508|CUSIP=         |CalTyp=1', 
                               'ISIN=US404280AS86|CUSIP=404280AS8|CalTyp=1', 
                               'ISIN=US404280BW89|CUSIP=404280BW8|CalTyp=21',
                               'ISIN=US06738EBC84|CUSIP=06738EBC8|CalTyp=21',
                               'ISIN=XS0736418962|CUSIP=         |CalTyp=1',]),
         ]
# create pandas df
dfDest = pd.DataFrame.from_items(dfDest)

def findnth(haystack, needle, n):
    parts= haystack.split(needle, n+1)
    if len(parts)<=n+1:
        return -1
    return len(haystack)-len(parts[-1])-len(needle)

def split_between(input_string, 
                  start_str, start_occurence, 
                  end_str, end_occurence
                 ):
    start_index = findnth(input_string, start_str, start_occurence-1) + len(start_str)
    end_index = findnth(input_string, end_str, end_occurence-1) + len(end_str) -1
    return input_string[start_index:end_index]

dfDest['FOUND_ISIN'] = ""
dfDest['FOUND_CUSIP'] = ""
dfDest.info()
for index, row in dfDest.iterrows():
    try:
        print(row.DestSystemNote1)
        row.FOUND_ISIN = split_between(row.DestSystemNote1, "ISIN=", 1, "|", 1)
        row.FOUND_CUSIP = split_between(row.DestSystemNote1, "CUSIP=", 1, "|", 2)
        # print ('DestSystemNote1=' + row.DestSystemNote1 + " " + 'FOUND_ISIN= ' + row.FOUND_ISIN)   
        # print ('DestSystemNote1=' + row.DestSystemNote1 + " " + 'FOUND_CUSIP= ' + row.FOUND_CUSIP)   
    except:
        pass # doing nothing on exception

enter image description here

To aid my learning, I would like to do the same thing but use the apply method with a lambda function i.e. update a third column FOUND_ISIN2 but I'm getting TypeError: string indices must be integers

dfDest
                                                
                                                
0 views
0 shares
profilepic.png
manpreet 2 years ago

you don't need lambda or apply. stick to pandas and you're done in three steps (probably this can be done with less than that, too):

# 1 - Create DataFrame
import pandas as pd
dfDest = pd.DataFrame.from_items(dfDest)

# 2 - String parsing
cols = ['ISIN','CUSIP', 'CalTyp'] # Define Columns
dfDest[cols] = dfDest['DestSystemNote1'].str.split('|', n=-1, expand=True) # Split Strings to columns

# 3 - Replace unwanted parts of raw data
for header in cols: # look at every column and remove its header string from the data
    dfDest[header] = dfDest[header].str.replace(header + "=", '') # and add "=" to pattern you want to remove

print dfDest

Output:

                               DestSystemNote1          ISIN      CUSIP CalTyp
0   ISIN=XS1906311763|CUSIP=         |CalTyp=1  XS1906311763                 1
1   ISIN=XS0736418962|CUSIP=         |CalTyp=1  XS0736418962                 1
2   ISIN=XS1533910508|CUSIP=         |CalTyp=1  XS1533910508                 1
3   ISIN=US404280AS86|CUSIP=404280AS8|CalTyp=1  US404280AS86  404280AS8      1
4  ISIN=US404280BW89|CUSIP=404280BW8|CalTyp=21  US404280BW89  404280BW8     21
5  ISIN=US06738EBC84|CUSIP=06738EBC8|CalTyp=21  US06738EBC84  06738EBC8     21
6   ISIN=XS0736418962|CUSIP=         |CalTyp=1  XS0736418962                 1

happy coding.


0 views   0 shares

No matter what stage you're at in your education or career, TuteeHub will help you reach the next level that you're aiming for. Simply,Choose a subject/topic and get started in self-paced practice sessions to improve your knowledge and scores.