On this page

Loading CSV Files

The Read_csv() function is one of the most commonly used functions in GPandas. It allows you to efficiently load data from CSV files into a DataFrame for analysis and manipulation.

Function Signature

func (GoPandas) Read_csv(filepath string) (*dataframe.DataFrame, error)

Parameters

Parameter	Type	Description
`filepath`	`string`	The path to the CSV file to be read

Returns

Type	Description
`*dataframe.DataFrame`	A pointer to a DataFrame containing the CSV data
`error`	An error if the file cannot be read or parsed

Basic Usage

Here’s a simple example of loading a CSV file:

package main

import (
    "fmt"
    "log"

    "github.com/apoplexi24/gpandas"
)

func main() {
    // Create a GPandas instance
    gp := gpandas.GoPandas{}

    // Read CSV file into DataFrame
    df, err := gp.Read_csv("data.csv")
    if err != nil {
        log.Fatalf("Error reading CSV: %v", err)
    }

    // Print the DataFrame
    fmt.Println(df.String())
}

Example with Sample Data

Suppose you have a CSV file named employees.csv with the following content:

name,department,salary,years
Alice,Engineering,85000,5
Bob,Marketing,72000,3
Charlie,Engineering,92000,7
Diana,Sales,68000,2

You can load and work with this data:

package main

import (
    "fmt"
    "log"

    "github.com/apoplexi24/gpandas"
)

func main() {
    gp := gpandas.GoPandas{}

    // Load the CSV file
    df, err := gp.Read_csv("employees.csv")
    if err != nil {
        log.Fatalf("Error reading CSV: %v", err)
    }

    // Display the DataFrame
    fmt.Println("Employee Data:")
    fmt.Println(df.String())

    // Access specific columns
    names, _ := df.SelectCol("name")
    fmt.Printf("Employee names: %v\n", names)

    // Select multiple columns
    subset, err := df.Select("name", "salary")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("Name and Salary:")
    fmt.Println(subset.String())
}

How It Works

The Read_csv() function uses a concurrent architecture for optimal performance:

flowchart LR
    subgraph Input
        FILE[CSV File]
    end
    
    subgraph Processing["Parallel Processing"]
        OPEN[Open File]
        HDR[Read Headers]
        
        subgraph Workers["Worker Pool"]
            W1[Worker 1]
            W2[Worker 2]
            W3[Worker N]
        end
        
        CHAN[Buffered Channel]
        COMB[Combine Results]
    end
    
    subgraph Output
        DF[DataFrame]
        SER1[Series 1]
        SER2[Series 2]
        SERN[Series N]
    end
    
    FILE --> OPEN
    OPEN --> HDR
    HDR --> CHAN
    CHAN --> W1
    CHAN --> W2
    CHAN --> W3
    W1 --> COMB
    W2 --> COMB
    W3 --> COMB
    COMB --> DF
    DF --> SER1
    DF --> SER2
    DF --> SERN
    
    style Processing fill:#1e293b,stroke:#3b82f6,stroke-width:2px
    style Workers fill:#0f172a,stroke:#22c55e,stroke-width:2px

Processing Steps

Step	Description
1. Open File	Opens the CSV file at the specified path
2. Read Headers	Extracts column names from the first row
3. Parallel Processing	Uses `runtime.NumCPU()` workers for concurrent row parsing
4. Build Columns	Creates a Series for each column with parsed data
5. Return DataFrame	Constructs DataFrame with columnar storage

Performance Features

Read_csv() is optimized for performance with these key features:

Feature	Benefit
Concurrent Processing	Uses goroutines and buffered channels to parse rows in parallel
Efficient Memory	Pre-allocates buffers to minimize memory allocations
Columnar Storage	Stores data in column-major format for efficient column operations
Worker Pool	Scales with available CPU cores

Error Handling

The function returns an error in the following cases:

Error Condition	Description
File not found	The specified file path does not exist
Permission denied	No read access to the file
Empty file	The CSV file has no headers
Malformed CSV	The CSV structure is invalid

Example with Error Handling

package main

import (
    "log"
    "os"

    "github.com/apoplexi24/gpandas"
)

func main() {
    gp := gpandas.GoPandas{}
    
    df, err := gp.Read_csv("data.csv")
    if err != nil {
        switch {
        case os.IsNotExist(err):
            log.Fatal("File not found")
        case os.IsPermission(err):
            log.Fatal("Permission denied")
        default:
            log.Fatalf("Error: %v", err)
        }
    }
    
    // Continue processing...
    _ = df
}

Working with the Result

Once you have loaded data into a DataFrame, you can perform various operations:

Access Data by Position (iLoc)

// Get a single value at row 0, column 1
value, _ := df.ILoc().At(0, 1)

// Get a row by position
row, _ := df.ILoc().Row(0)

// Get a range of rows [0, 5)
rows, _ := df.ILoc().Range(0, 5)

Access Data by Label (Loc)

// Get value by row label and column name
value, _ := df.Loc().At("0", "name")

// Get a column by name
col, _ := df.Loc().Col("salary")

// Get multiple rows by labels
rows, _ := df.Loc().Rows([]string{"0", "2", "4"})

Export Back to CSV

// Export to file
_, err := df.ToCSV("output.csv", ",")

// Or get as string
csvString, _ := df.ToCSV("", ",")

Column Types

By default, Read_csv() reads all columns as strings. The resulting DataFrame stores data in Series with nil dtype, which accepts any type.

For type-specific operations, you may need to convert column values after loading.

Loading CSV Files

Function Signature link

Parameters link

Returns link

Basic Usage link

Example with Sample Data link

How It Works link

Processing Steps link

Performance Features link

Error Handling link

Example with Error Handling link

Working with the Result link

Access Data by Position (iLoc) link

Access Data by Label (Loc) link

Export Back to CSV link

Column Types link

See Also link