Sorting Data

Learn how to sort DataFrames by column values or index labels in GPandas, with support for multi-column sorting, null handling, and in-place modifications.

Overview

GPandas provides two sorting methods:

Operation	Method	Description
Sort by Values	`SortValues()`	Sort by one or more column values
Sort by Index	`SortIndex()`	Sort by index labels

SortValues

Sorts the DataFrame by the values in one or more columns, similar to pandas’ df.sort_values().

Function Signature

func (df *DataFrame) SortValues(opts SortOptions) (*DataFrame, error)

SortOptions

Field	Type	Description	Default
`By`	`[]string`	Column names to sort by	Required
`Ascending`	`[]bool`	Sort order for each column	All `true`
`NaPosition`	`NaPosition`	Where to place null values	`NaLast`
`Inplace`	`bool`	Modify DataFrame in place	`false`
`IgnoreIndex`	`bool`	Reset index after sorting	`false`

NaPosition Constants

Constant	Description
`NaLast`	Place null values at the end (default)
`NaFirst`	Place null values at the beginning

Supported Types

Type	Comparison
`float64`	Numeric comparison
`int64`	Numeric comparison
`string`	Lexicographic comparison
`bool`	`false < true`

Sample Data

All examples use this employee DataFrame:

Employees DataFrame

Name	Department	Age	Salary
Alice	Engineering	30	95000
Bob	Sales	25	55000
Charlie	Engineering	35	105000
Diana	Sales	28	62000
Eve	Marketing	32	72000
Frank	Engineering	27	88000

Setup Code

package main

import (
    "fmt"
    "log"

    "github.com/apoplexi24/gpandas"
    "github.com/apoplexi24/gpandas/dataframe"
)

func main() {
    gp := gpandas.GoPandas{}
    
    // Create employee DataFrame
    df, _ := gp.DataFrame(
        []string{"Name", "Department", "Age", "Salary"},
        []gpandas.Column{
            {"Alice", "Bob", "Charlie", "Diana", "Eve", "Frank"},
            {"Engineering", "Sales", "Engineering", "Sales", "Marketing", "Engineering"},
            {int64(30), int64(25), int64(35), int64(28), int64(32), int64(27)},
            {95000.0, 55000.0, 105000.0, 62000.0, 72000.0, 88000.0},
        },
        map[string]any{
            "Name":       gpandas.StringCol{},
            "Department": gpandas.StringCol{},
            "Age":        gpandas.IntCol{},
            "Salary":     gpandas.FloatCol{},
        },
    )
    
    // Examples follow...
}

Single Column Sort (Ascending)

Sort by a single column in ascending order:

flowchart LR
    subgraph Original["Original DataFrame"]
        O1[Salary: 95000]
        O2[Salary: 55000]
        O3[Salary: 105000]
        O4[Salary: 62000]
    end
    
    subgraph Sort["SortValues Operation"]
        OP[Sort by Salary<br/>Ascending]
    end
    
    subgraph Result["Sorted DataFrame"]
        R1[Salary: 55000]
        R2[Salary: 62000]
        R3[Salary: 88000]
        R4[Salary: 95000]
    end
    
    O1 --> OP
    O2 --> OP
    O3 --> OP
    O4 --> OP
    OP --> R1
    OP --> R2
    OP --> R3
    OP --> R4
    
    style Original fill:#1e293b,stroke:#3b82f6,stroke-width:2px
    style Sort fill:#1e293b,stroke:#f59e0b,stroke-width:2px
    style Result fill:#1e293b,stroke:#22c55e,stroke-width:2px

Example

sorted, err := df.SortValues(dataframe.SortOptions{
    By: []string{"Salary"},
})
if err != nil {
    log.Fatalf("Sort failed: %v", err)
}
fmt.Println(sorted.String())

Output

+------+-------------+-----+--------+
| Name | Department  | Age | Salary |
+------+-------------+-----+--------+
| Bob  | Sales       | 25  | 55000  |
| Diana| Sales       | 28  | 62000  |
| Eve  | Marketing   | 32  | 72000  |
| Frank| Engineering | 27  | 88000  |
| Alice| Engineering | 30  | 95000  |
| Charlie | Engineering | 35 | 105000 |
+------+-------------+-----+--------+
[6 rows x 4 columns]

Single Column Sort (Descending)

Sort by a single column in descending order:

sorted, err := df.SortValues(dataframe.SortOptions{
    By:        []string{"Age"},
    Ascending: []bool{false},
})
if err != nil {
    log.Fatalf("Sort failed: %v", err)
}
fmt.Println(sorted.String())

Output

+---------+-------------+-----+--------+
| Name    | Department  | Age | Salary |
+---------+-------------+-----+--------+
| Charlie | Engineering | 35  | 105000 |
| Eve     | Marketing   | 32  | 72000  |
| Alice   | Engineering | 30  | 95000  |
| Diana   | Sales       | 28  | 62000  |
| Frank   | Engineering | 27  | 88000  |
| Bob     | Sales       | 25  | 55000  |
+---------+-------------+-----+--------+
[6 rows x 4 columns]

Multi-Column Sort

Sort by multiple columns with independent sort orders:

sorted, err := df.SortValues(dataframe.SortOptions{
    By:        []string{"Department", "Salary"},
    Ascending: []bool{true, false},  // Dept ascending, Salary descending
})
if err != nil {
    log.Fatalf("Sort failed: %v", err)
}
fmt.Println(sorted.String())

Output

+---------+-------------+-----+--------+
| Name    | Department  | Age | Salary |
+---------+-------------+-----+--------+
| Charlie | Engineering | 35  | 105000 |
| Alice   | Engineering | 30  | 95000  |
| Frank   | Engineering | 27  | 88000  |
| Eve     | Marketing   | 32  | 72000  |
| Diana   | Sales       | 28  | 62000  |
| Bob     | Sales       | 25  | 55000  |
+---------+-------------+-----+--------+
[6 rows x 4 columns]

Multi-Column Sort Flow

flowchart TD
    subgraph Input["Original Data"]
        I[Mixed departments<br/>and salaries]
    end
    
    subgraph Primary["Primary Sort"]
        P[Sort by Department<br/>ascending]
    end
    
    subgraph Secondary["Secondary Sort"]
        S[Within each department<br/>sort by Salary descending]
    end
    
    subgraph Result["Sorted Result"]
        R[Engineering: 105k, 95k, 88k<br/>Marketing: 72k<br/>Sales: 62k, 55k]
    end
    
    I --> P
    P --> S
    S --> R
    
    style Input fill:#1e293b,stroke:#3b82f6,stroke-width:2px
    style Primary fill:#1e293b,stroke:#f59e0b,stroke-width:2px
    style Secondary fill:#1e293b,stroke:#8b5cf6,stroke-width:2px
    style Result fill:#1e293b,stroke:#22c55e,stroke-width:2px

Ascending Options

The Ascending field provides flexible sort order control:

Configuration	Behavior
Empty `[]bool{}`	All columns ascending
Single value `[]bool{false}`	Applies to all columns
Multiple values	Must match length of `By`

Examples

// All ascending (default)
sorted, _ := df.SortValues(dataframe.SortOptions{
    By: []string{"Age", "Salary"},
    // Ascending not specified - defaults to all true
})

// All descending
sorted, _ := df.SortValues(dataframe.SortOptions{
    By:        []string{"Age", "Salary"},
    Ascending: []bool{false},  // Single value applies to all
})

// Mixed order
sorted, _ := df.SortValues(dataframe.SortOptions{
    By:        []string{"Age", "Salary"},
    Ascending: []bool{true, false},  // Age asc, Salary desc
})

Null Handling

Control where null values appear in sorted results:

// Nulls at the end (default)
sorted, _ := df.SortValues(dataframe.SortOptions{
    By:         []string{"Score"},
    NaPosition: dataframe.NaLast,
})

// Nulls at the beginning
sorted, _ := df.SortValues(dataframe.SortOptions{
    By:         []string{"Score"},
    NaPosition: dataframe.NaFirst,
})

Null Positioning

NaPosition	Ascending Sort	Descending Sort
`NaLast`	`1, 2, 3, nil`	`3, 2, 1, nil`
`NaFirst`	`nil, 1, 2, 3`	`nil, 3, 2, 1`

In-Place Sorting

Modify the DataFrame directly instead of creating a copy:

// Sort in place (modifies original DataFrame)
_, err := df.SortValues(dataframe.SortOptions{
    By:      []string{"Salary"},
    Inplace: true,
})
if err != nil {
    log.Fatalf("Sort failed: %v", err)
}

// df is now sorted, no new DataFrame created
fmt.Println(df.String())

In-Place vs Copy

Option	Returns	Original DataFrame	Use Case
`Inplace: false`	New sorted DataFrame	Unchanged	Need both versions
`Inplace: true`	`nil`	Modified	Save memory

Index Reset

Reset the index to sequential integers after sorting:

sorted, err := df.SortValues(dataframe.SortOptions{
    By:          []string{"Salary"},
    Ascending:   []bool{false},
    IgnoreIndex: true,
})
if err != nil {
    log.Fatalf("Sort failed: %v", err)
}
fmt.Println(sorted.String())

Without IgnoreIndex

+---------+-------------+-----+--------+
| Index   | Name        | Age | Salary |
+---------+-------------+-----+--------+
| 2       | Charlie     | 35  | 105000 |
| 0       | Alice       | 30  | 95000  |
| 5       | Frank       | 27  | 88000  |
+---------+-------------+-----+--------+

With IgnoreIndex

+---------+-------------+-----+--------+
| Index   | Name        | Age | Salary |
+---------+-------------+-----+--------+
| 0       | Charlie     | 35  | 105000 |
| 1       | Alice       | 30  | 95000  |
| 2       | Frank       | 27  | 88000  |
+---------+-------------+-----+--------+

SortIndex

Sorts the DataFrame by its index labels in lexicographic order.

Function Signature

func (df *DataFrame) SortIndex(ascending bool) (*DataFrame, error)

Parameters

Parameter	Type	Description
`ascending`	`bool`	`true` for ascending, `false` for descending

Example

// Set custom index labels
err := df.SetIndex([]string{"f", "b", "e", "a", "d", "c"})
if err != nil {
    log.Fatalf("SetIndex failed: %v", err)
}

fmt.Println("Before sorting by index:")
fmt.Println(df.String())

// Sort by index (ascending)
sorted, err := df.SortIndex(true)
if err != nil {
    log.Fatalf("SortIndex failed: %v", err)
}

fmt.Println("\nAfter sorting by index:")
fmt.Println(sorted.String())

Output

Before sorting by index:
+-------+---------+-------------+-----+--------+
| Index | Name    | Department  | Age | Salary |
+-------+---------+-------------+-----+--------+
| f     | Alice   | Engineering | 30  | 95000  |
| b     | Bob     | Sales       | 25  | 55000  |
| e     | Charlie | Engineering | 35  | 105000 |
+-------+---------+-------------+-----+--------+

After sorting by index:
+-------+---------+-------------+-----+--------+
| Index | Name    | Department  | Age | Salary |
+-------+---------+-------------+-----+--------+
| a     | Diana   | Sales       | 28  | 62000  |
| b     | Bob     | Sales       | 25  | 55000  |
| c     | Frank   | Engineering | 27  | 88000  |
| e     | Charlie | Engineering | 35  | 105000 |
| f     | Alice   | Engineering | 30  | 95000  |
+-------+---------+-------------+-----+--------+

Sort Stability

GPandas uses stable sorting, preserving the relative order of equal elements:

// DataFrame with duplicate ages
// Name: Alice, Bob, Charlie, Diana
// Age:  30,   30,  25,      25

sorted, _ := df.SortValues(dataframe.SortOptions{
    By: []string{"Age"},
})

// Result preserves original order for equal ages:
// Name: Charlie, Diana, Alice, Bob
// Age:  25,      25,    30,    30

Alphabetical Sort

Sort string columns alphabetically:

sorted, err := df.SortValues(dataframe.SortOptions{
    By: []string{"Name"},
})
if err != nil {
    log.Fatalf("Sort failed: %v", err)
}
fmt.Println(sorted.String())

Output

+---------+-------------+-----+--------+
| Name    | Department  | Age | Salary |
+---------+-------------+-----+--------+
| Alice   | Engineering | 30  | 95000  |
| Bob     | Sales       | 25  | 55000  |
| Charlie | Engineering | 35  | 105000 |
| Diana   | Sales       | 28  | 62000  |
| Eve     | Marketing   | 32  | 72000  |
| Frank   | Engineering | 27  | 88000  |
+---------+-------------+-----+--------+
[6 rows x 4 columns]

Sorting Workflow

flowchart TD
    subgraph Input["Original DataFrame"]
        I[Unsorted data]
    end
    
    subgraph Validate["Validation"]
        V1[Check columns exist]
        V2[Validate options]
    end
    
    subgraph Process["Sort Process"]
        P1[Extract column values]
        P2[Build sort indices]
        P3[Apply stable sort]
    end
    
    subgraph Output["Result"]
        O1{Inplace?}
        O2[Modify original]
        O3[Return new DataFrame]
    end
    
    I --> V1
    V1 --> V2
    V2 --> P1
    P1 --> P2
    P2 --> P3
    P3 --> O1
    O1 -->|Yes| O2
    O1 -->|No| O3
    
    style Input fill:#1e293b,stroke:#3b82f6,stroke-width:2px
    style Validate fill:#1e293b,stroke:#f59e0b,stroke-width:2px
    style Process fill:#1e293b,stroke:#8b5cf6,stroke-width:2px
    style Output fill:#1e293b,stroke:#22c55e,stroke-width:2px

Error Handling

Common Errors

Error	Cause	Solution
“DataFrame is nil”	Operating on nil DataFrame	Check DataFrame initialization
“‘By’ must contain at least one column”	Empty `By` slice	Provide at least one column name
“column ‘X’ not found”	Invalid column name	Verify column exists
“length of ‘Ascending’ must match ‘By’”	Mismatched lengths	Match lengths or use single value
“NaPosition must be ’last’ or ‘first’”	Invalid NaPosition	Use `NaLast` or `NaFirst`
“type mismatch: cannot compare”	Mixed types in column	Ensure column has consistent types

Error Handling Example

sorted, err := df.SortValues(dataframe.SortOptions{
    By:        []string{"Department", "Salary"},
    Ascending: []bool{true, false},
})
if err != nil {
    switch {
    case strings.Contains(err.Error(), "not found"):
        log.Fatal("Column doesn't exist in DataFrame")
    case strings.Contains(err.Error(), "must match"):
        log.Fatal("Ascending slice length mismatch")
    case strings.Contains(err.Error(), "type mismatch"):
        log.Fatal("Cannot compare values of different types")
    default:
        log.Fatalf("Sort error: %v", err)
    }
}

Thread Safety

Sorting operations are thread-safe:

Method	Lock Type	Description
`SortValues()`	RLock	Read lock during value extraction
`SortIndex()`	RLock	Read lock during index extraction
In-place sort	Lock	Write lock when modifying original

Concurrent Sorting Example

package main

import (
    "fmt"
    "sync"

    "github.com/apoplexi24/gpandas"
    "github.com/apoplexi24/gpandas/dataframe"
)

func main() {
    gp := gpandas.GoPandas{}
    df, _ := gp.Read_csv("data.csv")
    
    var wg sync.WaitGroup
    
    // Multiple goroutines can sort simultaneously
    for i := 0; i < 3; i++ {
        wg.Add(1)
        go func(id int, col string) {
            defer wg.Done()
            
            // Safe concurrent sort (creates new DataFrame)
            sorted, _ := df.SortValues(dataframe.SortOptions{
                By: []string{col},
            })
            fmt.Printf("Goroutine %d sorted by %s: %d rows\n", 
                id, col, len(sorted.Index))
        }(i, []string{"Age", "Salary", "Name"}[i])
    }
    
    wg.Wait()
}

Complete Example: Data Analysis Pipeline

package main

import (
    "fmt"
    "log"

    "github.com/apoplexi24/gpandas"
    "github.com/apoplexi24/gpandas/dataframe"
)

func main() {
    gp := gpandas.GoPandas{}
    
    // Load employee data
    df, err := gp.Read_csv("employees.csv")
    if err != nil {
        log.Fatalf("Failed to load data: %v", err)
    }
    
    fmt.Println("Original Data:")
    fmt.Println(df.String())
    
    // Step 1: Sort by department and salary
    sorted, err := df.SortValues(dataframe.SortOptions{
        By:        []string{"Department", "Salary"},
        Ascending: []bool{true, false},
    })
    if err != nil {
        log.Fatalf("Sort failed: %v", err)
    }
    
    fmt.Println("\nSorted by Department (asc) and Salary (desc):")
    fmt.Println(sorted.String())
    
    // Step 2: Select top earners per department
    topEarners, _ := sorted.Select("Name", "Department", "Salary")
    
    // Step 3: Export results
    _, err = topEarners.ToCSV("top_earners.csv", ",")
    if err != nil {
        log.Printf("Export warning: %v", err)
    }
    
    fmt.Println("\nExported to top_earners.csv")
}

Performance Considerations

Scenario	Performance	Recommendation
Small DataFrames (<1000 rows)	Fast	Use any sort configuration
Large DataFrames (>10000 rows)	Moderate	Consider `Inplace: true` to save memory
Multi-column sort	Slower than single	Minimize number of sort columns
String columns	Slower than numeric	Use numeric keys when possible

Sorting Data

Overview link

SortValues link

Function Signature link

SortOptions link

NaPosition Constants link

Supported Types link

Sample Data link

Employees DataFrame link

Setup Code link

Single Column Sort (Ascending) link

Example link

Output link

Single Column Sort (Descending) link

Output link

Multi-Column Sort link

Output link

Multi-Column Sort Flow link

Ascending Options link

Examples link

Null Handling link

Null Positioning link

In-Place Sorting link

In-Place vs Copy link

Index Reset link

Without IgnoreIndex link

With IgnoreIndex link

SortIndex link

Function Signature link

Parameters link

Example link

Output link

Sort Stability link

Alphabetical Sort link

Output link

Sorting Workflow link

Error Handling link

Common Errors link

Error Handling Example link

Thread Safety link

Concurrent Sorting Example link

Complete Example: Data Analysis Pipeline link

Performance Considerations link

See Also link

Overview

SortValues

Function Signature

SortOptions

NaPosition Constants

Supported Types

Sample Data

Employees DataFrame

Setup Code

Single Column Sort (Ascending)

Example

Output

Single Column Sort (Descending)

Output

Multi-Column Sort

Output

Multi-Column Sort Flow

Ascending Options

Examples

Null Handling

Null Positioning

In-Place Sorting

In-Place vs Copy

Index Reset

Without IgnoreIndex

With IgnoreIndex

SortIndex

Function Signature

Parameters

Example

Output

Sort Stability

Alphabetical Sort

Output

Sorting Workflow

Error Handling

Common Errors

Error Handling Example

Thread Safety

Concurrent Sorting Example

Complete Example: Data Analysis Pipeline

Performance Considerations

See Also