pandas-use-of-dot-read-table is outdated #5628

@tjkuson

.read_table() and .read_csv() are identical except for their default separator: the first expects tab-separated input and the second expects comma-separated input. I am unclear why it is bad practice to use .read_table() to read tab-separated values, for example:

import pandas as pd

df = pd.read_table("data.tsv")

The current implementation of the rule would instead recommend:

import pandas as pd

df = pd.read_csv("data.tsv", sep="\t")
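As a quick sanity check that the two calls are interchangeable for tab-separated input, here is a minimal sketch. The sample data and variable names are made up for illustration, and an in-memory buffer stands in for "data.tsv" so the snippet is self-contained:

```python
import io

import pandas as pd

# Hypothetical tab-separated sample standing in for "data.tsv".
tsv_data = "name\tscore\nalice\t1\nbob\t2\n"

# read_table() uses a tab as its default separator.
df_table = pd.read_table(io.StringIO(tsv_data))

# read_csv() defaults to a comma, so the tab has to be passed explicitly.
df_csv = pd.read_csv(io.StringIO(tsv_data), sep="\t")

# Both calls should parse the input into the same DataFrame.
same = df_table.equals(df_csv)
print(same)
```

If the two frames compare equal, the rewrite the rule suggests changes only the spelling of the call, not the result.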

I tried to find the reasoning for the rule by following the original pandas-vet plugin, which references three sources for its rules (and, by implication, its preference for .read_csv() over .read_table()). The first is an article that suggests that .read_table() is deprecated in favour of .read_csv(), but that is no longer the case: the deprecation was reversed following a discussion. The second is a website that cites the previous article in saying that the method is deprecated. The third is the Pandas documentation, but I couldn't find any recommendation against .read_table() there. After this research, I think the rule originates from an outdated understanding of Pandas and should be changed.

I think the rule should be revised to be more useful in light of modern Pandas conventions (e.g., only recommending .read_csv() over .read_table() if the separator is set to a comma).
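To make the proposal concrete, here is a rough sketch of the narrower check using Python's standard `ast` module. This is not the linter's actual implementation. The function name `find_comma_read_table` is hypothetical, and the check assumes the separator is passed as a literal `sep=","` keyword:

```python
import ast


def find_comma_read_table(source: str) -> list[int]:
    """Return line numbers of .read_table(...) calls that pass sep=","."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only attribute calls such as pd.read_table(...) are considered.
        if not (isinstance(func, ast.Attribute) and func.attr == "read_table"):
            continue
        for kw in node.keywords:
            # Flag only when the separator is a literal comma.
            if (
                kw.arg == "sep"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value == ","
            ):
                flagged.append(node.lineno)
    return sorted(flagged)
```

Under this sketch, `pd.read_table("data.tsv")` would pass untouched, while `pd.read_table("data.csv", sep=",")` would be reported as a candidate for `.read_csv()`.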

Labels: accepted (Ready for implementation), rule (Implementing or modifying a lint rule)
