Detecting File Encodings
There have been numerous times where I’ve tried to read a CSV file into a Pandas DataFrame and it failed because of the file encoding. The best thing to do is to detect the file encoding by reading a few lines from the file and then passing that encoding to Pandas through the encoding argument of read_csv. The detection part can be done with the chardet package, and below is a convenience function for grabbing the encoding from the first n_lines:
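A minimal sketch of such a function might look like the following. The name predict_encoding and the default of 20 lines are assumptions made for illustration; chardet.detect takes raw bytes and returns a dict that includes "encoding" and "confidence" keys.

```python
import chardet


def predict_encoding(file_path, n_lines=20):
    """Guess a file's encoding from its first n_lines lines.

    The function name and the default of 20 lines are illustrative choices.
    """
    # Open in binary mode: we cannot decode the text until we know the encoding.
    with open(file_path, "rb") as f:
        # readline() returns b"" at end of file, so short files are handled too.
        raw = b"".join(f.readline() for _ in range(n_lines))
    # chardet.detect returns a dict such as {"encoding": ..., "confidence": ...};
    # "encoding" can be None when chardet cannot make a guess.
    return chardet.detect(raw)["encoding"]
```

Reading only the first few lines keeps the check fast on large files, at the cost of a less reliable guess when the unusual characters appear later in the file.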
Then you can call this function with a file name:
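The call could be wired into Pandas roughly like this. Here predict_encoding is the assumed name of the helper (a compact stand-in is included so the snippet runs on its own), and load_csv is a hypothetical wrapper added only for illustration.

```python
import chardet
import pandas as pd


def predict_encoding(file_path, n_lines=20):
    # Compact stand-in for the helper described above (assumed name).
    with open(file_path, "rb") as f:
        raw = b"".join(f.readline() for _ in range(n_lines))
    return chardet.detect(raw)["encoding"]


def load_csv(file_name):
    # Detect the encoding first, then hand the result to Pandas.
    encoding = predict_encoding(file_name, n_lines=20)
    return pd.read_csv(file_name, encoding=encoding)
```

With a real file on disk, load_csv("my_file.csv") would then return the DataFrame, where "my_file.csv" is a placeholder path.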
I hope this helps you too!