.read_table() and .read_csv() are identical except for the default separator: the first expects tabs and the second expects commas. I am unclear why it is bad practice to use .read_table() to read tab-separated values, for example:
import pandas as pd
df = pd.read_table("data.tsv")
The current implementation of the rule would instead recommend
import pandas as pd
df = pd.read_csv("data.tsv", sep="\t")
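To make the equivalence concrete, here is a minimal sketch. It assumes a small in-memory tab-separated sample in place of data.tsv; the sample contents and variable names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample standing in for "data.tsv".
TSV_TEXT = "name\tvalue\nalpha\t1\nbeta\t2\n"

# read_table uses a tab separator by default.
df_table = pd.read_table(io.StringIO(TSV_TEXT))

# read_csv needs the separator spelled out to parse the same input.
df_csv = pd.read_csv(io.StringIO(TSV_TEXT), sep="\t")

# True when both calls produced the same DataFrame.
frames_match = df_table.equals(df_csv)
```

If the two readers really differ only in their default separator, frames_match should be True, and the .read_table() call is the shorter way to express the same intent.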
I tried to find the reasoning for the rule by following the original pandas-vet plugin, which references three sources for its rules (and, by implication, for its preference for .read_csv() over .read_table()). The first is an article that suggests that .read_table() is deprecated in favour of .read_csv(), but that is no longer true: the deprecation was reversed following a discussion. The second is a website that cites the previous article in saying that the method is deprecated. The third is the Pandas documentation, but I couldn't find any recommendation against .read_table() there. After this research, I think the rule originates from an outdated understanding of Pandas and should be changed.
Instead, I think the rule could be made more useful in light of modern Pandas conventions, e.g. by recommending .read_csv() over .read_table() only if the separator is set to a comma.
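The proposed behaviour could look roughly like the sketch below. This is not the linter's actual implementation; the function name find_comma_read_table_calls is hypothetical, and the check only handles a literal "," passed as the sep or delimiter keyword argument.

```python
import ast


def find_comma_read_table_calls(source: str) -> list[int]:
    """Return line numbers of .read_table(...) calls that pass a literal comma separator."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only attribute calls such as pd.read_table(...) are considered here.
        if not (isinstance(func, ast.Attribute) and func.attr == "read_table"):
            continue
        for kw in node.keywords:
            is_separator = kw.arg in ("sep", "delimiter")
            is_comma = isinstance(kw.value, ast.Constant) and kw.value.value == ","
            if is_separator and is_comma:
                flagged.append(node.lineno)
                break
    return sorted(flagged)


# Hypothetical input: the plain tab-separated read should pass,
# while the comma-separated read should be flagged.
EXAMPLE = '''import pandas as pd
a = pd.read_table("data.tsv")
b = pd.read_table("data.csv", sep=",")
'''

flagged_lines = find_comma_read_table_calls(EXAMPLE)
```

With this logic, pd.read_table("data.tsv") passes untouched, and only the call on the third line of EXAMPLE is reported, since that is the case where .read_csv() is the more natural choice.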