This guide will help you install GPandas and write your first data analysis program in Go.

 

Prerequisites

Before installing GPandas, ensure you have the following:

| Requirement | Minimum version | Check command |
|-------------|-----------------|---------------|
| Go          | 1.18+           | go version    |
| Git         | Any             | git --version |
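If you want the same Go check from code (for example in a setup script), here is a minimal sketch. It is not part of GPandas: meetsMinimum is a hypothetical helper that parses the string returned by the standard library's runtime.Version(). The program uses no generics itself, so it also compiles on toolchains older than 1.18.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"strings"
)

// meetsMinimum reports whether a Go release string such as "go1.21.5" is at
// least major.minor. Strings that do not start with "go" (for example
// development builds, which report "devel ...") return false.
func meetsMinimum(version string, major, minor int) bool {
	if !strings.HasPrefix(version, "go") {
		return false
	}
	parts := strings.Split(strings.TrimPrefix(version, "go"), ".")
	if len(parts) < 2 {
		return false
	}
	gotMajor, err := strconv.Atoi(parts[0])
	if err != nil {
		return false
	}
	// Keep only the leading digits so pre-releases like "22rc1" parse as 22.
	minorPart := parts[1]
	end := 0
	for end < len(minorPart) && minorPart[end] >= '0' && minorPart[end] <= '9' {
		end++
	}
	gotMinor, err := strconv.Atoi(minorPart[:end])
	if err != nil {
		return false
	}
	if gotMajor != major {
		return gotMajor > major
	}
	return gotMinor >= minor
}

func main() {
	v := runtime.Version() // version of the toolchain that built this binary
	if meetsMinimum(v, 1, 18) {
		fmt.Printf("%s: OK for GPandas (generics available)\n", v)
	} else {
		fmt.Printf("%s: upgrade to Go 1.18 or later\n", v)
	}
}
```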

 

Why Go 1.18+?

GPandas uses Go generics extensively for type-safe operations. Generics were introduced in Go 1.18, making it the minimum required version.

 

Installation

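From inside a Go module (a directory containing a go.mod file; see Initialize Your Project below if you don't have one yet), add GPandas with the go get command: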

go get github.com/apoplexi24/gpandas

 

Verify Installation

To verify the installation, save the following as main.go in your module:

package main

import (
    "fmt"
    "github.com/apoplexi24/gpandas"
)

func main() {
    gp := gpandas.GoPandas{}
    fmt.Printf("GPandas ready: %T\n", gp)
}

Run it:

go run main.go

Expected output:

GPandas ready: gpandas.GoPandas

 

Project Setup

Here’s a recommended project structure for a GPandas-based application:

my-data-project/
    go.mod
    go.sum
    main.go
    data/
        sales.csv
        output/

 

Initialize Your Project

mkdir my-data-project
cd my-data-project
go mod init my-data-project
go get github.com/apoplexi24/gpandas

 

Your First Program

Let’s create a complete example that demonstrates core GPandas functionality.

 

Step 1: Create Sample Data

Create a file data/sales.csv:

product,category,price,quantity,date
Laptop,Electronics,999.99,5,2024-01-15
Mouse,Electronics,29.99,50,2024-01-15
Desk,Furniture,299.99,10,2024-01-16
Chair,Furniture,149.99,25,2024-01-16
Monitor,Electronics,399.99,15,2024-01-17

 

Step 2: Write the Analysis Code

Create main.go:

package main

import (
    "fmt"
    "log"

    "github.com/apoplexi24/gpandas"
)

func main() {
    // Initialize GPandas
    gp := gpandas.GoPandas{}

    // Load the CSV file
    df, err := gp.Read_csv("data/sales.csv")
    if err != nil {
        log.Fatalf("Failed to load CSV: %v", err)
    }

    // Display the full DataFrame
    fmt.Println("=== Sales Data ===")
    fmt.Println(df.String())

    // Select specific columns
    subset, err := df.Select("product", "price", "quantity")
    if err != nil {
        log.Fatalf("Failed to select columns: %v", err)
    }
    
    fmt.Println("\n=== Product Details ===")
    fmt.Println(subset.String())

    // Access data by position (row 0, column 0); errors are discarded here for brevity
    firstProduct, _ := df.ILoc().At(0, 0)
    fmt.Printf("\nFirst product: %v\n", firstProduct)

    // Access data by label: here the first row's label is "0"
    price, _ := df.Loc().At("0", "price")
    fmt.Printf("First product price: %v\n", price)

    // Get the rows at positions 0, 1 and 2 (the end position is exclusive)
    topThree, _ := df.ILoc().Range(0, 3)
    fmt.Println("\n=== Top 3 Rows ===")
    fmt.Println(topThree.String())

    // Export to a new CSV file
    _, err = df.ToCSV("data/output/sales_copy.csv", ",")
    if err != nil {
        log.Printf("Warning: Could not export CSV: %v", err)
    } else {
        fmt.Println("\nExported to data/output/sales_copy.csv")
    }
}

 

Step 3: Run the Program

The export step writes into data/output/, so create that directory before running:

mkdir -p data/output
go run main.go

 

Expected Output

=== Sales Data ===
+---------+-------------+--------+----------+------------+
| product | category    | price  | quantity | date       |
+---------+-------------+--------+----------+------------+
| Laptop  | Electronics | 999.99 | 5        | 2024-01-15 |
| Mouse   | Electronics | 29.99  | 50       | 2024-01-15 |
| Desk    | Furniture   | 299.99 | 10       | 2024-01-16 |
| Chair   | Furniture   | 149.99 | 25       | 2024-01-16 |
| Monitor | Electronics | 399.99 | 15       | 2024-01-17 |
+---------+-------------+--------+----------+------------+
[5 rows x 5 columns]

=== Product Details ===
+---------+--------+----------+
| product | price  | quantity |
+---------+--------+----------+
| Laptop  | 999.99 | 5        |
| Mouse   | 29.99  | 50       |
| Desk    | 299.99 | 10       |
| Chair   | 149.99 | 25       |
| Monitor | 399.99 | 15       |
+---------+--------+----------+
[5 rows x 3 columns]

First product: Laptop
First product price: 999.99

=== Top 3 Rows ===
+---------+-------------+--------+----------+------------+
| product | category    | price  | quantity | date       |
+---------+-------------+--------+----------+------------+
| Laptop  | Electronics | 999.99 | 5        | 2024-01-15 |
| Mouse   | Electronics | 29.99  | 50       | 2024-01-15 |
| Desk    | Furniture   | 299.99 | 10       | 2024-01-16 |
+---------+-------------+--------+----------+------------+
[3 rows x 5 columns]

Exported to data/output/sales_copy.csv
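The Top 3 Rows output shows positions 0, 1 and 2, which suggests that ILoc().Range(start, end) excludes end, the same half-open convention as Go slice expressions. The sketch below assumes that convention; rowRange is a hypothetical helper over plain [][]string rows, not GPandas code. Checking bounds first turns a bad range into an error instead of a slice-bounds panic.

```go
package main

import "fmt"

// rowRange returns rows[start:end] (end excluded), checking bounds first
// so that invalid input produces an error instead of a panic.
func rowRange(rows [][]string, start, end int) ([][]string, error) {
	if start < 0 || end > len(rows) || start > end {
		return nil, fmt.Errorf("invalid range [%d, %d) for %d rows", start, end, len(rows))
	}
	return rows[start:end], nil
}

func main() {
	rows := [][]string{{"Laptop"}, {"Mouse"}, {"Desk"}, {"Chair"}, {"Monitor"}}

	top, err := rowRange(rows, 0, 3)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(len(top), top[len(top)-1][0]) // 3 Desk

	_, err = rowRange(rows, 3, 10)
	fmt.Println(err) // invalid range [3, 10) for 5 rows
}
```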

 

Common Workflow

Here’s a typical GPandas workflow, expressed as a Mermaid flowchart:

flowchart TD
    subgraph Load["1. Load Data"]
        CSV[Read CSV]
        SQL[Query SQL]
        MEM[From Memory]
    end
    
    subgraph Transform["2. Transform"]
        SEL[Select Columns]
        REN[Rename Columns]
        MRG[Merge DataFrames]
        IDX[Index Operations]
    end
    
    subgraph Analyze["3. Analyze"]
        ACC[Access Values]
        SLC[Slice Rows]
        FLT[Filter Data]
    end
    
    subgraph Export["4. Export"]
        EXP[To CSV]
        STR[To String]
    end
    
    CSV --> SEL
    SQL --> SEL
    MEM --> SEL
    SEL --> REN
    REN --> MRG
    MRG --> IDX
    IDX --> ACC
    ACC --> SLC
    SLC --> FLT
    FLT --> EXP
    FLT --> STR
    
    style Load fill:#1e293b,stroke:#22c55e,stroke-width:2px
    style Transform fill:#1e293b,stroke:#3b82f6,stroke-width:2px
    style Analyze fill:#1e293b,stroke:#f59e0b,stroke-width:2px
    style Export fill:#1e293b,stroke:#8b5cf6,stroke-width:2px

 

Quick Reference

Core Functions

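| Function    | Purpose                      | Example                         |
|-------------|------------------------------|---------------------------------|
| Read_csv()  | Load a CSV file              | gp.Read_csv("data.csv")         |
| DataFrame() | Create from in-memory data   | gp.DataFrame(cols, data, types) |
| Read_sql()  | Query a SQL database         | gp.Read_sql(query, config)      |
| From_gbq()  | Query BigQuery               | gp.From_gbq(query, projectID)   |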

 

DataFrame Methods

| Method      | Purpose            | Example                                  |
|-------------|--------------------|------------------------------------------|
| String()    | Pretty-print       | df.String()                              |
| Select()    | Select columns     | df.Select("col1", "col2")                |
| SelectCol() | Get a single column| df.SelectCol("col1")                     |
| Rename()    | Rename columns     | df.Rename(map[string]string{"old": "new"}) |
| Merge()     | Join DataFrames    | df.Merge(other, "key", InnerMerge)       |
| ToCSV()     | Export to CSV      | df.ToCSV("out.csv", ",")                 |
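The InnerMerge mode in the Merge() row keeps only rows whose key appears in both DataFrames, as in a SQL inner join. The sketch below shows those semantics with a hash join over plain maps. innerMerge, the row type and the stock data are hypothetical, and this is not GPandas's implementation, which may differ in details such as output row order or how clashing column names are handled.

```go
package main

import "fmt"

// row is a hypothetical record: column name -> value.
type row map[string]string

// innerMerge joins left and right on key, keeping only keys present in both.
// Each matching left/right pair yields one output row, in left-row order.
// If a non-key column exists on both sides, the right-hand value wins.
func innerMerge(left, right []row, key string) []row {
	// Index the right side by key so each left row finds its matches directly.
	byKey := make(map[string][]row)
	for _, r := range right {
		byKey[r[key]] = append(byKey[r[key]], r)
	}
	var out []row
	for _, l := range left {
		for _, r := range byKey[l[key]] {
			merged := row{}
			for k, v := range l {
				merged[k] = v
			}
			for k, v := range r {
				merged[k] = v
			}
			out = append(out, merged)
		}
	}
	return out
}

func main() {
	products := []row{
		{"product": "Laptop", "category": "Electronics"},
		{"product": "Desk", "category": "Furniture"},
	}
	stock := []row{
		{"product": "Laptop", "warehouse": "A"},
		{"product": "Lamp", "warehouse": "B"},
	}
	// Only "Laptop" appears on both sides, so one row is printed:
	// Laptop Electronics A
	for _, r := range innerMerge(products, stock, "product") {
		fmt.Println(r["product"], r["category"], r["warehouse"])
	}
}
```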

 

Indexing

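| Accessor | Method            | Purpose                |
|----------|-------------------|------------------------|
| Loc()    | At(label, col)    | Get a value by label   |
| Loc()    | Row(label)        | Get a row by label     |
| Loc()    | Col(name)         | Get a column by name   |
| ILoc()   | At(row, col)      | Get a value by position|
| ILoc()   | Row(pos)          | Get a row by position  |
| ILoc()   | Range(start, end) | Get a range of rows    |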
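The difference between Loc() and ILoc() is how a row is addressed: by its index label or by its integer position. In the first program the row labelled "0" is also at position 0, so both lookups hit the same row, but after reordering or relabelling they can diverge. The sketch below models that with a hypothetical frame type holding a label-to-position map; it illustrates the idea and is not GPandas's data structure.

```go
package main

import "fmt"

// frame is a hypothetical, minimal table: column names, cells, and a map
// from each row label to that row's position.
type frame struct {
	columns []string
	cells   [][]string
	pos     map[string]int
}

func newFrame(labels, columns []string, cells [][]string) *frame {
	pos := make(map[string]int, len(labels))
	for i, l := range labels {
		pos[l] = i
	}
	return &frame{columns: columns, cells: cells, pos: pos}
}

// colPos finds a column's position by name.
func (f *frame) colPos(name string) (int, error) {
	for i, c := range f.columns {
		if c == name {
			return i, nil
		}
	}
	return 0, fmt.Errorf("column %q not found", name)
}

// iat looks a value up by row and column position (ILoc-style).
func (f *frame) iat(row, col int) (string, error) {
	if row < 0 || row >= len(f.cells) || col < 0 || col >= len(f.columns) {
		return "", fmt.Errorf("position (%d, %d) out of range", row, col)
	}
	return f.cells[row][col], nil
}

// at looks a value up by row label and column name (Loc-style).
func (f *frame) at(label, col string) (string, error) {
	r, ok := f.pos[label]
	if !ok {
		return "", fmt.Errorf("row label %q not found", label)
	}
	c, err := f.colPos(col)
	if err != nil {
		return "", err
	}
	return f.cells[r][c], nil
}

func main() {
	// Labels are deliberately out of order, so label "0" is not at position 0.
	f := newFrame(
		[]string{"2", "0", "1"},
		[]string{"product", "price"},
		[][]string{{"Desk", "299.99"}, {"Laptop", "999.99"}, {"Mouse", "29.99"}},
	)
	byPos, _ := f.iat(0, 1)
	byLabel, _ := f.at("0", "price")
	fmt.Println(byPos, byLabel) // 299.99 999.99
}
```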

 


Troubleshooting

Common Issues

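| Issue                | Solution |
|----------------------|----------|
| go: module not found | Run go mod tidy to resolve dependencies |
| undefined: gpandas   | Ensure the import path is github.com/apoplexi24/gpandas |
| Go version error     | Upgrade to Go 1.18 or later |
| CSV not found        | Relative paths are resolved from the directory where you run go run, not from the directory containing main.go; run from the project root or adjust the path |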

 

Getting Help

  • Check the GitHub issues for apoplexi24/gpandas
  • Review the documentation comments in the source code
  • Run the test suite with go test ./... from a clone of the gpandas repository