SQL Integration

GPandas provides seamless integration with SQL databases and Google BigQuery, allowing you to query data directly into DataFrames.

Overview

GPandas supports two main data sources for SQL integration:

Function	Data Source	Use Case
`Read_sql()`	SQL Databases	SQL Server, PostgreSQL, and other database/sql compatible databases
`From_gbq()`	Google BigQuery	Large-scale analytics on BigQuery tables

Architecture

flowchart LR
    subgraph Databases["Data Sources"]
        MSSQL[(SQL Server)]
        PG[(PostgreSQL)]
        GBQ[(BigQuery)]
    end
    
    subgraph GPandas["GPandas"]
        CONFIG[DbConfig]
        RSQL[Read_sql]
        FGBQ[From_gbq]
    end
    
    subgraph Output["Output"]
        DF[DataFrame]
        SER[Series Columns]
    end
    
    MSSQL --> CONFIG
    PG --> CONFIG
    CONFIG --> RSQL
    GBQ --> FGBQ
    RSQL --> DF
    FGBQ --> DF
    DF --> SER
    
    style Databases fill:#1e293b,stroke:#f59e0b,stroke-width:2px
    style GPandas fill:#1e293b,stroke:#3b82f6,stroke-width:2px
    style Output fill:#1e293b,stroke:#22c55e,stroke-width:2px

Read_sql() - SQL Databases

The Read_sql() function executes a SQL query against a relational database and returns results as a DataFrame.

Function Signature

func (GoPandas) Read_sql(query string, db_config DbConfig) (*dataframe.DataFrame, error)

DbConfig Structure

Configure your database connection using the DbConfig struct:

type DbConfig struct {
    Database_server string  // Database type: "sqlserver" or "postgres"
    Server          string  // Hostname or IP address
    Port            string  // Port number
    Database        string  // Database name
    Username        string  // Database user
    Password        string  // Database password
}

Parameters

Parameter	Type	Description
`query`	`string`	SQL query to execute
`db_config`	`DbConfig`	Database connection configuration

Returns

Type	Description
`*dataframe.DataFrame`	Query results as a DataFrame
`error`	Error if connection or query fails

SQL Server Example

package main

import (
    "fmt"
    "log"
    "os"

    "github.com/apoplexi24/gpandas"
)

func main() {
    gp := gpandas.GoPandas{}

    // Configure connection (prefer environment variables)
    config := gpandas.DbConfig{
        Database_server: "sqlserver",
        Server:          os.Getenv("DB_SERVER"),
        Port:            "1433",
        Database:        os.Getenv("DB_NAME"),
        Username:        os.Getenv("DB_USER"),
        Password:        os.Getenv("DB_PASSWORD"),
    }

    // Execute query
    query := `
        SELECT 
            employee_id,
            first_name,
            last_name,
            department,
            salary
        FROM employees
        WHERE department = 'Engineering'
        ORDER BY salary DESC
    `

    df, err := gp.Read_sql(query, config)
    if err != nil {
        log.Fatalf("Query failed: %v", err)
    }

    fmt.Println("Engineering Team:")
    fmt.Println(df.String())
}

PostgreSQL Example

package main

import (
    "fmt"
    "log"
    "os"

    "github.com/apoplexi24/gpandas"
)

func main() {
    gp := gpandas.GoPandas{}

    config := gpandas.DbConfig{
        Database_server: "postgres",  // Use "postgres" for PostgreSQL
        Server:          os.Getenv("PG_HOST"),
        Port:            "5432",
        Database:        os.Getenv("PG_DATABASE"),
        Username:        os.Getenv("PG_USER"),
        Password:        os.Getenv("PG_PASSWORD"),
    }

    query := `
        SELECT 
            product_name,
            category,
            price,
            stock_quantity
        FROM products
        WHERE price > 50.00
        LIMIT 100
    `

    df, err := gp.Read_sql(query, config)
    if err != nil {
        log.Fatalf("Query failed: %v", err)
    }

    fmt.Println(df.String())
}

Connection String Formats

GPandas builds connection strings based on the Database_server value:

Database	Format
SQL Server	`server=X;user id=X;password=X;port=X;database=X`
PostgreSQL	`host=X port=X user=X password=X dbname=X sslmode=disable`

From_gbq() - Google BigQuery

The From_gbq() function queries Google BigQuery and returns results as a DataFrame.

Function Signature

func (GoPandas) From_gbq(query string, projectID string) (*dataframe.DataFrame, error)

Parameters

Parameter	Type	Description
`query`	`string`	BigQuery SQL query
`projectID`	`string`	Google Cloud Project ID

Returns

Type	Description
`*dataframe.DataFrame`	Query results as a DataFrame
`error`	Error if query execution fails

Prerequisites

Before using From_gbq(), ensure you have:

Google Cloud credentials configured
BigQuery API enabled in your project
Appropriate IAM permissions (BigQuery Data Viewer, BigQuery Job User)

Setting Up Credentials

Set the GOOGLE_APPLICATION_CREDENTIALS environment variable:

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"

Or use Application Default Credentials:

gcloud auth application-default login

BigQuery Example

package main

import (
    "fmt"
    "log"

    "github.com/apoplexi24/gpandas"
)

func main() {
    gp := gpandas.GoPandas{}

    projectID := "my-gcp-project"
    
    query := `
        SELECT 
            date,
            product_category,
            SUM(revenue) as total_revenue,
            COUNT(*) as transaction_count
        FROM ` + "`my-gcp-project.sales.transactions`" + `
        WHERE date >= '2024-01-01'
        GROUP BY date, product_category
        ORDER BY date DESC, total_revenue DESC
        LIMIT 1000
    `

    df, err := gp.From_gbq(query, projectID)
    if err != nil {
        log.Fatalf("BigQuery failed: %v", err)
    }

    fmt.Println("Sales Summary:")
    fmt.Println(df.String())

    // Export results locally
    _, err = df.ToCSV("sales_summary.csv", ",")
    if err != nil {
        log.Printf("Export warning: %v", err)
    }
}

BigQuery Public Datasets Example

Query public datasets without requiring your own data:

package main

import (
    "fmt"
    "log"

    "github.com/apoplexi24/gpandas"
)

func main() {
    gp := gpandas.GoPandas{}

    projectID := "my-gcp-project"
    
    // Query a public dataset
    query := `
        SELECT 
            name,
            number,
            year
        FROM ` + "`bigquery-public-data.usa_names.usa_1910_current`" + `
        WHERE year = 2020
        ORDER BY number DESC
        LIMIT 10
    `

    df, err := gp.From_gbq(query, projectID)
    if err != nil {
        log.Fatalf("Query failed: %v", err)
    }

    fmt.Println("Top 10 Baby Names in 2020:")
    fmt.Println(df.String())
}

Data Flow

Both functions follow a similar pattern for converting query results to DataFrames:

sequenceDiagram
    participant App as Application
    participant GP as GPandas
    participant DB as Database
    participant DF as DataFrame

    App->>GP: Call Read_sql() or From_gbq()
    GP->>DB: Execute Query
    DB-->>GP: Return Result Set
    
    loop For Each Column
        GP->>GP: Extract Column Values
        GP->>GP: Create Series
    end
    
    GP->>DF: Build DataFrame
    DF-->>App: Return DataFrame

Type Mapping

Query result types are mapped to Go types:

SQL Type	BigQuery Type	Go Type
INT, INTEGER	INT64	`int64`
FLOAT, REAL	FLOAT64	`float64`
VARCHAR, TEXT	STRING	`string`
BIT, BOOLEAN	BOOL	`bool`
DATE, DATETIME	DATE, TIMESTAMP	Varies
NULL	NULL	`nil`

Error Handling

SQL Database Errors

df, err := gp.Read_sql(query, config)
if err != nil {
    // Common error patterns
    switch {
    case strings.Contains(err.Error(), "connection"):
        log.Fatal("Database connection failed - check server/credentials")
    case strings.Contains(err.Error(), "query execution"):
        log.Fatal("Query syntax error or permission issue")
    case strings.Contains(err.Error(), "scanning"):
        log.Fatal("Data type conversion error")
    default:
        log.Fatalf("Database error: %v", err)
    }
}

BigQuery Errors

df, err := gp.From_gbq(query, projectID)
if err != nil {
    switch {
    case strings.Contains(err.Error(), "NewClient"):
        log.Fatal("BigQuery client creation failed - check credentials")
    case strings.Contains(err.Error(), "query.Read"):
        log.Fatal("Query execution failed - check syntax and permissions")
    case strings.Contains(err.Error(), "no rows"):
        log.Fatal("Query returned no results")
    default:
        log.Fatalf("BigQuery error: %v", err)
    }
}

Best Practices

Practice	Description
Use environment variables	Never hardcode credentials in source code
Limit result sets	Use `LIMIT` clause to avoid memory issues
Use parameterized queries	Prevent SQL injection (handle at application level)
Index key columns	Ensure database tables are properly indexed
Close connections	GPandas handles this automatically with `defer`

Security Considerations

Credential Management

// Good: Use environment variables
config := gpandas.DbConfig{
    Database_server: "sqlserver",
    Server:          os.Getenv("DB_SERVER"),
    Port:            os.Getenv("DB_PORT"),
    Database:        os.Getenv("DB_NAME"),
    Username:        os.Getenv("DB_USER"),
    Password:        os.Getenv("DB_PASSWORD"),
}

// Bad: Hardcoded credentials
config := gpandas.DbConfig{
    Password: "my-secret-password",  // Never do this!
}

Query Safety

// Be careful with dynamic queries
// Always validate user input before including in queries
func queryByDepartment(gp gpandas.GoPandas, config gpandas.DbConfig, dept string) (*dataframe.DataFrame, error) {
    // Validate input
    validDepts := map[string]bool{
        "Engineering": true,
        "Sales":       true,
        "Marketing":   true,
    }
    
    if !validDepts[dept] {
        return nil, fmt.Errorf("invalid department: %s", dept)
    }
    
    query := fmt.Sprintf(`
        SELECT * FROM employees 
        WHERE department = '%s'
    `, dept)
    
    return gp.Read_sql(query, config)
}

Complete Example: ETL Pipeline

package main

import (
    "fmt"
    "log"
    "os"

    "github.com/apoplexi24/gpandas"
)

func main() {
    gp := gpandas.GoPandas{}

    // Step 1: Extract from SQL Server
    sqlConfig := gpandas.DbConfig{
        Database_server: "sqlserver",
        Server:          os.Getenv("SQL_SERVER"),
        Port:            "1433",
        Database:        os.Getenv("SQL_DATABASE"),
        Username:        os.Getenv("SQL_USER"),
        Password:        os.Getenv("SQL_PASSWORD"),
    }

    ordersDF, err := gp.Read_sql(`
        SELECT order_id, customer_id, product_id, quantity, order_date
        FROM orders
        WHERE order_date >= '2024-01-01'
    `, sqlConfig)
    if err != nil {
        log.Fatalf("Failed to fetch orders: %v", err)
    }
    fmt.Println("Orders loaded:", ordersDF.String())

    // Step 2: Extract from BigQuery
    analyticsDF, err := gp.From_gbq(`
        SELECT product_id, category, avg_rating
        FROM analytics.product_metrics
        WHERE avg_rating >= 4.0
    `, "my-project")
    if err != nil {
        log.Fatalf("Failed to fetch analytics: %v", err)
    }
    fmt.Println("Analytics loaded:", analyticsDF.String())

    // Step 3: Export combined data
    _, err = ordersDF.ToCSV("exports/orders_2024.csv", ",")
    if err != nil {
        log.Printf("Export warning: %v", err)
    }

    fmt.Println("ETL pipeline complete!")
}

SQL Integration

Overview link

Architecture link

Read_sql() - SQL Databases link

Function Signature link

DbConfig Structure link

Parameters link

Returns link

SQL Server Example link

PostgreSQL Example link

Connection String Formats link

From_gbq() - Google BigQuery link

Function Signature link

Parameters link

Returns link

Prerequisites link

Setting Up Credentials link

BigQuery Example link

BigQuery Public Datasets Example link

Data Flow link

Type Mapping link

Error Handling link

SQL Database Errors link

BigQuery Errors link

Best Practices link

Security Considerations link

Credential Management link

Query Safety link

Complete Example: ETL Pipeline link

See Also link

Overview

Architecture

Read_sql() - SQL Databases

Function Signature

DbConfig Structure

Parameters

Returns

SQL Server Example

PostgreSQL Example

Connection String Formats

From_gbq() - Google BigQuery

Function Signature

Parameters

Returns

Prerequisites

Setting Up Credentials

BigQuery Example

BigQuery Public Datasets Example

Data Flow

Type Mapping

Error Handling

SQL Database Errors

BigQuery Errors

Best Practices

Security Considerations

Credential Management

Query Safety

Complete Example: ETL Pipeline

See Also