The Read_csv() function is one of the most commonly used functions in GPandas. It allows you to efficiently load data from CSV files into a DataFrame for analysis and manipulation.

 

Function Signature

func (GoPandas) Read_csv(filepath string) (*dataframe.DataFrame, error)

 

Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| filepath | string | The path to the CSV file to be read |

 

Returns

| Type | Description |
|------|-------------|
| *dataframe.DataFrame | A pointer to a DataFrame containing the CSV data |
| error | An error if the file cannot be read or parsed |

 

Basic Usage

Here’s a simple example of loading a CSV file:

package main

import (
    "fmt"
    "log"

    "github.com/apoplexi24/gpandas"
)

func main() {
    // Create a GPandas instance
    gp := gpandas.GoPandas{}

    // Read CSV file into DataFrame
    df, err := gp.Read_csv("data.csv")
    if err != nil {
        log.Fatalf("Error reading CSV: %v", err)
    }

    // Print the DataFrame
    fmt.Println(df.String())
}

 

Example with Sample Data

Suppose you have a CSV file named employees.csv with the following content:

name,department,salary,years
Alice,Engineering,85000,5
Bob,Marketing,72000,3
Charlie,Engineering,92000,7
Diana,Sales,68000,2

 

You can load and work with this data:

package main

import (
    "fmt"
    "log"

    "github.com/apoplexi24/gpandas"
)

func main() {
    gp := gpandas.GoPandas{}

    // Load the CSV file
    df, err := gp.Read_csv("employees.csv")
    if err != nil {
        log.Fatalf("Error reading CSV: %v", err)
    }

    // Display the DataFrame
    fmt.Println("Employee Data:")
    fmt.Println(df.String())

    // Access a single column
    names, err := df.SelectCol("name")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("Employee names: %v\n", names)

    // Select multiple columns
    subset, err := df.Select("name", "salary")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("Name and Salary:")
    fmt.Println(subset.String())
}

 

How It Works

Read_csv() reads the header row, then parses the remaining rows concurrently with a pool of worker goroutines and combines the results into a DataFrame:

CSV File → Open File → Read Headers → Buffered Channel → Worker Pool (Worker 1, Worker 2, …, Worker N) → Combine Results → DataFrame (Series 1, Series 2, …, Series N)

 

Processing Steps

| Step | Description |
|------|-------------|
| 1. Open File | Opens the CSV file at the specified path |
| 2. Read Headers | Extracts column names from the first row |
| 3. Parallel Processing | Uses runtime.NumCPU() workers for concurrent row parsing |
| 4. Build Columns | Creates a Series for each column with parsed data |
| 5. Return DataFrame | Constructs the DataFrame with columnar storage |

 

Performance Features

Read_csv() is optimized for performance with these key features:

| Feature | Benefit |
|---------|---------|
| Concurrent Processing | Uses goroutines and buffered channels to parse rows in parallel |
| Efficient Memory | Pre-allocates buffers to minimize memory allocations |
| Columnar Storage | Stores data in column-major format for efficient column operations |
| Worker Pool | Scales with the number of available CPU cores |

 

Error Handling

The function returns an error in the following cases:

| Error Condition | Description |
|-----------------|-------------|
| File not found | The specified file path does not exist |
| Permission denied | No read access to the file |
| Empty file | The CSV file has no header row |
| Malformed CSV | The CSV structure is invalid |

 

Example with Error Handling

package main

import (
    "errors"
    "log"
    "os"

    "github.com/apoplexi24/gpandas"
)

func main() {
    gp := gpandas.GoPandas{}

    df, err := gp.Read_csv("data.csv")
    if err != nil {
        // errors.Is also matches errors wrapped with %w,
        // which os.IsNotExist and os.IsPermission do not
        switch {
        case errors.Is(err, os.ErrNotExist):
            log.Fatal("File not found")
        case errors.Is(err, os.ErrPermission):
            log.Fatal("Permission denied")
        default:
            log.Fatalf("Error: %v", err)
        }
    }

    // Continue processing...
    _ = df
}

 

Working with the Result

Once you have loaded data into a DataFrame, you can perform various operations:

 

Access Data by Position (ILoc)

// Get a single value at row 0, column 1
value, _ := df.ILoc().At(0, 1)

// Get a row by position
row, _ := df.ILoc().Row(0)

// Get a range of rows [0, 5)
rows, _ := df.ILoc().Range(0, 5)
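Because storage is column-major, a positional lookup indexes the column first and the row second, and fetching a whole row means gathering one value from every column. The `frame` type below is a hypothetical stand-in, not the GPandas DataFrame; it mirrors the half-open `[start, end)` convention of `Range(0, 5)`. It returns an error when `end` exceeds the row count, which is an assumption: GPandas may clamp the range instead.

```go
package main

import "fmt"

// frame is a hypothetical column-major table used only to illustrate
// positional (ILoc-style) access; it is not the GPandas DataFrame type.
type frame struct {
	data [][]string // data[c][r] holds column c, row r
}

func (f *frame) nrows() int {
	if len(f.data) == 0 {
		return 0
	}
	return len(f.data[0])
}

// at returns the value at integer row and column positions.
func (f *frame) at(row, col int) (string, error) {
	if col < 0 || col >= len(f.data) || row < 0 || row >= f.nrows() {
		return "", fmt.Errorf("position (%d, %d) out of bounds", row, col)
	}
	return f.data[col][row], nil
}

// rowAt gathers one value from each column, since a row is not stored contiguously.
func (f *frame) rowAt(row int) ([]string, error) {
	if row < 0 || row >= f.nrows() {
		return nil, fmt.Errorf("row %d out of bounds", row)
	}
	out := make([]string, len(f.data))
	for c := range f.data {
		out[c] = f.data[c][row]
	}
	return out, nil
}

// rangeRows returns the rows in the half-open interval [start, end).
func (f *frame) rangeRows(start, end int) ([][]string, error) {
	if start < 0 || start > end || end > f.nrows() {
		return nil, fmt.Errorf("invalid range [%d, %d)", start, end)
	}
	out := make([][]string, 0, end-start)
	for r := start; r < end; r++ {
		row, _ := f.rowAt(r) // cannot fail: r is within bounds
		out = append(out, row)
	}
	return out, nil
}

func main() {
	f := &frame{data: [][]string{
		{"Alice", "Bob", "Charlie", "Diana"},
		{"Engineering", "Marketing", "Engineering", "Sales"},
	}}
	v, _ := f.at(0, 1)
	fmt.Println(v) // Engineering
	rows, _ := f.rangeRows(0, 2)
	fmt.Println(rows) // [[Alice Engineering] [Bob Marketing]]
}
```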

 

Access Data by Label (Loc)

// Get value by row label and column name
value, _ := df.Loc().At("0", "name")

// Get a column by name
col, _ := df.Loc().Col("salary")

// Get multiple rows by labels
rows, _ := df.Loc().Rows([]string{"0", "2", "4"})
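Label-based access adds one lookup step: each row label and column name is resolved to a position before indexing into the columns. The sketch assumes that default row labels are the strings "0", "1", and so on, as the examples above suggest; the `labeled` type, `newLabeled`, and its methods are hypothetical and not part of GPandas.

```go
package main

import (
	"fmt"
	"strconv"
)

// labeled is a hypothetical table that resolves row labels and column
// names to positions through maps, illustrating Loc-style access.
type labeled struct {
	data   [][]string     // column-major: data[c][r]
	rowPos map[string]int // row label -> row position
	colPos map[string]int // column name -> column position
}

// newLabeled assigns default row labels "0", "1", ... (an assumption).
func newLabeled(names []string, data [][]string) *labeled {
	l := &labeled{data: data, rowPos: map[string]int{}, colPos: map[string]int{}}
	for c, n := range names {
		l.colPos[n] = c
	}
	if len(data) > 0 {
		for r := range data[0] {
			l.rowPos[strconv.Itoa(r)] = r
		}
	}
	return l
}

// at resolves a row label and a column name, then indexes the column.
func (l *labeled) at(rowLabel, colName string) (string, error) {
	r, ok := l.rowPos[rowLabel]
	if !ok {
		return "", fmt.Errorf("row label %q not found", rowLabel)
	}
	c, ok := l.colPos[colName]
	if !ok {
		return "", fmt.Errorf("column %q not found", colName)
	}
	return l.data[c][r], nil
}

// col returns the stored column slice itself; modifying it modifies the table.
func (l *labeled) col(name string) ([]string, error) {
	c, ok := l.colPos[name]
	if !ok {
		return nil, fmt.Errorf("column %q not found", name)
	}
	return l.data[c], nil
}

// rows resolves each label to a position, failing on the first unknown label.
func (l *labeled) rows(labels []string) ([][]string, error) {
	out := make([][]string, 0, len(labels))
	for _, lbl := range labels {
		r, ok := l.rowPos[lbl]
		if !ok {
			return nil, fmt.Errorf("row label %q not found", lbl)
		}
		row := make([]string, len(l.data))
		for c := range l.data {
			row[c] = l.data[c][r]
		}
		out = append(out, row)
	}
	return out, nil
}

func main() {
	l := newLabeled([]string{"name", "salary"},
		[][]string{{"Alice", "Bob", "Charlie"}, {"85000", "72000", "92000"}})
	v, _ := l.at("0", "name")
	fmt.Println(v) // Alice
	_, err := l.rows([]string{"0", "2", "4"})
	fmt.Println(err) // row label "4" not found
}
```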

 

Export Back to CSV

// Export to file
_, err := df.ToCSV("output.csv", ",")

// Or get as string
csvString, _ := df.ToCSV("", ",")

 

Column Types

Read_csv() reads every column as strings and does not infer column types. The resulting DataFrame stores each column in a Series with a nil dtype, which accepts values of any type.

For type-specific operations, you may need to convert column values after loading.
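A minimal conversion sketch, assuming you have already extracted a column's values as a `[]string` (how to do that depends on the GPandas Series API, which this page does not cover). The hypothetical `toFloats` helper uses `strconv.ParseFloat` and stops at the first value that fails to parse, so bad data is reported rather than silently replaced with zero.

```go
package main

import (
	"fmt"
	"strconv"
)

// toFloats is a hypothetical helper that converts a string column into
// float64 values, reporting the first row that fails to parse.
func toFloats(col []string) ([]float64, error) {
	out := make([]float64, len(col)) // pre-allocated: the length is known
	for i, s := range col {
		v, err := strconv.ParseFloat(s, 64)
		if err != nil {
			return nil, fmt.Errorf("row %d: %w", i, err)
		}
		out[i] = v
	}
	return out, nil
}

func main() {
	// Salary values from employees.csv, still strings after loading.
	salaries := []string{"85000", "72000", "92000", "68000"}
	vals, err := toFloats(salaries)
	if err != nil {
		panic(err)
	}
	sum := 0.0
	for _, v := range vals {
		sum += v
	}
	fmt.Printf("mean salary: %.0f\n", sum/float64(len(vals))) // mean salary: 79250
}
```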

 

See Also