Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix header normalization for xlsx #612

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Binary file added plugins/xlsx-extractor/ref/test-headers.xlsx
Binary file not shown.
129 changes: 129 additions & 0 deletions plugins/xlsx-extractor/src/header.normalization.spec.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,129 @@
import api from '@flatfile/api'
import {
setupListener,
setupSpace,
getEnvironmentId,
} from '@flatfile/utils-testing'
import { ExcelExtractor } from '.'
import fs from 'fs'
import path from 'path'
import { FlatfileEvent } from '@flatfile/listener'

describe('xlsx-extractor plugin', () => {

const listener = setupListener()
let spaceId: string

beforeAll(async () => {
const space = await setupSpace()
spaceId = space.id
})
afterAll(async () => {
await api.spaces.delete(spaceId)
})

beforeEach(async () => {
listener.use(ExcelExtractor())
})

it('Upload file with headers that require normalization', async () => {

listener.on("**", async (event: FlatfileEvent) => {
console.log(event.topic)
})

await api.files.upload(fs.createReadStream(path.join(__dirname,'../ref/test-headers.xlsx')), {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nullify Code Language: TypeScript 🔵 MEDIUM Severity CWE-22

Javascript pathtraversal rule non literal fs filename

The application dynamically constructs file or path information. If the path
information comes from user-supplied input, it could be abused to read sensitive files,
access other users' data, or aid in exploitation to gain further system access.

User input should never be used in constructing paths or files for interacting
with the filesystem. This includes filenames supplied by user uploads or downloads.
If possible, consider hashing user input or using unique values and
use path.normalize to resolve and validate the path information
prior to processing any file functionality.

Example using path.normalize and not allowing direct user input:

// User input, saved only as a reference
// id is a randomly generated UUID to be used as the filename
const userData = {userFilename: userSuppliedFilename, id: crypto.randomUUID()};
// Restrict all file processing to this directory only
const basePath = '/app/restricted/';

// Create the full path, but only use our random generated id as the filename
const joinedPath = path.join(basePath, userData.id);
// Normalize path, removing any '..'
const fullPath = path.normalize(joinedPath);
// Verify the fullPath is contained within our basePath
if (!fullPath.startsWith(basePath)) {
    console.log("Invalid path specified!");
}
// Process / work with file
// ...

For more information on path traversal issues see OWASP:
https://owasp.org/www-community/attacks/Path_Traversal

Here's how you might fix this potential vulnerability

The modified code first constructs the full path of the file, then uses 'path.basename()' to get the filename. This ensures that only the filename is used in the filesystem operation, preventing directory traversal. Even though '__dirname' is not controllable by an attacker, it's a good practice to always validate or sanitize filenames in filesystem operations.

autoFixesExperimental

Use path.basename() to get the filename and prevent directory traversal

Suggested change
await api.files.upload(fs.createReadStream(path.join(__dirname,'../ref/test-headers.xlsx')), {
const filePath = path.join(__dirname, '../ref/test-headers.xlsx');
const fileName = path.basename(filePath);
await api.files.upload(fs.createReadStream(filePath), {

poweredByNullify

Reply with /nullify to interact with me like another developer
(you will need to refresh the page for updates)

environmentId: getEnvironmentId(),
spaceId,
})

const failure = async () => {
await listener.waitFor("job:failed", 1)
return false
}
const success = async () => {
await listener.waitFor("sheet:counts-updated", 3)
return true
}

const ok = await Promise.race([failure(), success()])
if(!ok) {
throw new Error("Job should not fail")
} else {
const { data: workbooks } = await api.workbooks.list({spaceId})
expect(workbooks.length).toBe(1)
const { data: sheets } = await api.sheets.list({workbookId: workbooks[0].id})
expect(sheets.length).toBe(1)
const EXPECTED_FIELDS = [{
"description": "",
"key": "Code",
"label": "Code",
"type": "string",
},
{
"description": "",
"key": "Amount_DOLLAR_",
"label": "Amount_DOLLAR_",
"type": "string",
},
{
"description": "",
"key": "Amount_DOLLAR__1",
"label": "Amount_DOLLAR__1",
"type": "string",
},
{
"description": "",
"key": "Rate_PERCENT_",
"label": "Rate_PERCENT_",
"type": "string",
},
{
"description": "",
"key": "empty",
"label": "empty",
"type": "string",
},
{
"description": "",
"key": "empty_1",
"label": "empty_1",
"type": "string",
}]

expect(sheets[0].config.fields).toEqual(EXPECTED_FIELDS)

const { data: { records } } = await api.records.get(sheets[0].id)
expect(records.length).toBe(2)
const data = records.map((record) =>
EXPECTED_FIELDS.reduce((acc, field) => {
acc[field.key] = record.values[field.key].value
return acc
}, {})
)
expect(data).toEqual([
{
"Amount_DOLLAR_": "100",
"Amount_DOLLAR__1": "300",
"Code": "ABC",
"Rate_PERCENT_": "5%",
"empty": undefined,
"empty_1": undefined,
},
{
"Amount_DOLLAR_": "200",
"Amount_DOLLAR__1": "400",
"Code": "DEF",
"Rate_PERCENT_": "3%",
"empty": undefined,
"empty_1": undefined,
},
])

}

})

})


8 changes: 6 additions & 2 deletions plugins/xlsx-extractor/src/utils.ts
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,8 @@ export function prependNonUniqueHeaderColumns(
const result: Record<string, string> = {}

for (const [key, value] of Object.entries(record)) {
const newValue = value ? value : 'empty'
const cleanValue = newValue.replace('*', '')
const newValue = value || 'empty'
const cleanValue = normalizeKey(newValue.replace('*', ''))

if (cleanValue && counts[cleanValue]) {
result[key] = `${cleanValue}_${counts[cleanValue]}`
Expand All @@ -19,3 +19,7 @@ export function prependNonUniqueHeaderColumns(

return result
}

function normalizeKey(key: string): string {
return key.trim().replace(/%/g, '_PERCENT_').replace(/\$/g, '_DOLLAR_').replace(/[^a-zA-Z0-9]/g, "_")
}
2 changes: 1 addition & 1 deletion utils/extractor/src/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -211,7 +211,7 @@ function getSheetConfig(
}

function normalizeKey(key: string): string {
return key.trim().replace(/%/g, '_PERCENT_').replace(/\$/g, '_DOLLAR_')
return key.trim().replace(/%/g, '_PERCENT_').replace(/\$/g, '_DOLLAR_').replace(/[^a-zA-Z0-9]/g, "_")
}

function normalizeRecordKeys(record: Flatfile.RecordData): Flatfile.RecordData {
Expand Down
Loading