I apologize if this seems simple, but I have spent an absurd amount of time chasing Google searches in circles and I can't seem to find an answer to this problem...
I have a series of excel files (of .csv type) that I need to import. They are composed of data consisting of 9 columns separated by |. Looks something like this:
1 CustID|Animal Adopted|Description|Year|Adoption date|dob|Fee|Paid
2 123|Horse|I got a Horse|2011|2011-01-31|1991-03-01|8.00|NO
3 0101|Horse|I got a Horse|2006|2006-01-31|1987-21-01|50.00|NO
4 1929|Fish|I adopted a fish and for some reason this requires a very long description |2001|2001-12-31|1991-10-01|5.00|YES
5 199|Fish|I adopted a fish and for some reason this requires a very long description |2001|2001-12-31|1991-10-01|5.00|YES
6 0101|Dog|another really long description for adopting a dog because they are awesome
7 |2003|2003-05-11|1999-04-01|50.00|(null)
8 0102|Dog|another really long description for adopting a dog because they are awesome
9 |2003|2003-05-11|1989-04-01|51.00|(null)
10 9330|Cat|Mr.Kitty is a kitty who is so fat and cute and furry|2010|2010-10-11|1986-10-20|10000.00|YES
11 1933|Hamster|Doing a dance in tube city.
12 |2011|2011-12-05|2005-06-01|4.00|(null)
Thus when imported into R it does this
CustID Animal.Adopted Description Year Adoption.date dob Fee Paid..
1 123 Horse I got a Horse 2011 2011-01-31 1991-03-01 8 NO,,
2 101 Horse I got a Horse 2006 2006-01-31 1987-21-01 50 NO,,
3 1929 Fish I adopted a fish,and for some reason,this requires a very long description 2001 2001-12-31 1991-10-01 5 YES
4 199 Fish I adopted a fish,and for some reason,this requires a very long description 2001 2001-12-31 1991-10-01 5 YES
5 101 Dog another really long description,for adopting a dog because they are awesome, NA
6 NA 2003 2003-05-11 1999-04-01 50.00 (null),, NA
7 102 Dog another really long description,for adopting a dog because they are awesome, NA
8 NA 2003 2003-05-11 1989-04-01 51.00 (null),, NA
9 9330 Cat Mr.Kitty is a kitty who is so fat,and cute,and furry 2010 2010-10-11 1986-10-20 10000 YES
10 1933 Hamster Doing a dance,in tube city., NA
11 NA 2011 2011-12-05 2005-06-01 4.00 (null),, NA
>
This severely impacts the accuracy of my results later when I try to categorize my data. Lines 5 & 6, 7 & 8, and 10 & 11 represent the problem, as half of the column data is on the first row and the other half is in the second.
I have found this question @Layla from a couple years ago regarding reading in text files line by line and am attempting to blend the accepted answer with a personalized for loop. However, I don't understand why the while loop condition is TRUE...
I was wondering if there was a way to code it so that every time r encounters the row with only three columns, it bumps up the next row to fill in the last two columns. But I would only want that to occur with perfectly matching data because I may be missing random names or locations throughout the data as I don't want to mismatch the entire data set as previously mentioned.
I don't know how to instruct the loop to look out for the missing data. I tried something like if line(ncol = 3) {Merge()} but lets be honest, that makes no sense...
I've tried readLines() ... but that is still reading everything in the exact same way so I figure this needs to be combine with other code. And all of my Google searches just turn up the same basic "readLines reads the connection in one line at a time" and then that's it.
I feel like I have the pieces but I still can't figure out how to put them together.
As a caveat, I am quite new to this. Even a helpful article that explains how to write a routine that reads in code manually would be helpful. There is so much information on how to read in files using read.csv, read.tsv, read.xlsx, etc. but very very little on how to handle reading in data manually.