Group by multiple data sources in a declarative way

Thomas Rubattel
2 min read · Apr 1, 2024
Group by country in a bubble map — Credits KOBU Agency on Unsplash

Introduction

Grouping data by a common value (such as a city) or by a range (such as an age or salary bracket) has many use cases: statistics (distributions), data visualization (histograms, pie charts), data analytics (bubble maps), machine learning, cryptanalysis, etc.
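As a quick illustration of grouping by range rather than by exact value, here is a minimal sketch in plain JavaScript. The sample data, the 10-year bucket size, and the names `people`, `ageRange` and `namesByAgeRange` are all assumptions made up for this example.

```javascript
// Hypothetical sample data, invented for illustration.
const people = [
  { name: 'Ana', age: 23 },
  { name: 'Ben', age: 37 },
  { name: 'Cleo', age: 29 },
  { name: 'Dev', age: 41 },
];

// Map an age to the label of its 10-year range, e.g. 23 -> '20-29'.
const ageRange = age => {
  const lower = Math.floor(age / 10) * 10;
  return `${lower}-${lower + 9}`;
};

// Group the names under the label of the range their age falls into.
const namesByAgeRange = people.reduce((groups, { name, age }) => {
  const key = ageRange(age);
  return { ...groups, [key]: [...(groups[key] || []), name] };
}, {});

console.log(namesByAgeRange);
// { '20-29': ['Ana', 'Cleo'], '30-39': ['Ben'], '40-49': ['Dev'] }
```

The only difference from grouping by a common value is the key function: it maps many distinct values to one shared label.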

What we have

Let’s imagine a browser client uploading a batch of files through an API. The API responds with one result per file, in the same order as the items in the payload.

const payload = [
{ name: 'institute.jpeg', blob: 'O<{t³­]Û9Î!hÖ' },
{ name: 'cv.pdf', blob: 'ðT®@cÏ3æ:®ÊÙLà`' },
{ name: 'duckling.jpeg', blob: 'ß&ßPI³Oâ-$x' },
{ name: 'car.jpeg', blob: 'ôfU?0ãh,ß*1' },
{ name: 'railways.jpeg', blob: 'Heš¤2Kµ’E1ª' },
{ name: 'building.jpeg', blob: 'K¤‚žxäPÎ’‚¬¹mØ pyÀ' },
{ name: 'song.mp3', blob: '’»VM+V5Aď)c¹›<ƒœ‚01´xñÖFUÁ5[zˆ' },
{ name: 'dictionary.exe', blob: '‘ÐMþµ«¥‚×Méµ=¿äõ‰' },
];

const response = [
{ ok: false, error: 'ErrorDuplicateNameOutput' },
{ ok: false, error: 'ErrorUnsupporrtedOutput' },
{ ok: false, error: 'ErrorDuplicateNameOutput' },
{ ok: true },
{ ok: false, error: 'ErrorAccessDeniedOutput' },
{ ok: true },
{ ok: false, error: 'ErrorUnsupporrtedOutput' },
{ ok: false, error: 'ErrorUnsupporrtedOutput' },
];

What we want

const groupedByErrors = {
ErrorDuplicateNameOutput: ['institute.jpeg', 'duckling.jpeg'],
ErrorAccessDeniedOutput: ['railways.jpeg'],
ErrorUnsupporrtedOutput: ['cv.pdf', 'song.mp3', 'dictionary.exe']
};

Imperative solution

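const groupedByErrors = response.reduce((acc, curr, i) => {
  // An error occurred for the file at the same position in the payload
  if (payload?.[i] && !curr.ok) {
    // The error key already exists: append the file name to its list
    if (curr.error in acc) {
      return { ...acc, [curr.error]: [...acc[curr.error], payload[i].name] };
    }
    // First occurrence of this error: start a new list
    else {
      return { ...acc, [curr.error]: [payload[i].name] };
    }
  }
  // Successful upload: leave the accumulator untouched
  else {
    return acc;
  }
}, {});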

Declarative solution

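import { flow, filter, groupBy, mapValues, zip } from 'lodash';

const groupedByErrors = flow(
  // Keep only the [file, result] pairs whose upload failed
  combos => filter(combos, ([, result]) => !result.ok),
  // Group the remaining pairs by error type
  combos => groupBy(combos, ([, result]) => result.error),
  // Replace each pair with the name of the file
  combos => mapValues(combos, combo => combo.map(([file]) => file.name))
)(zip(payload, response)); // zip pairs up the elements at the same index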

Takeaway

The declarative solution has a couple of benefits:

  • Easier to read: the code reads as a sequence of steps (pair up the elements at the same position in both arrays, keep only the errors, group by error type, pick the desired field as the value)
  • Higher level: no implementation details (loop, accumulator, index)
  • More expressive: function composition instead of if statements
  • Single responsibility principle: each function is dedicated to one specific task
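
To make the last two points concrete, here is a minimal dependency-free sketch of the same pipeline, where each step is a small named function. The `pipe` helper, the step names (`pairUp`, `keepFailures`, `groupNamesByError`) and the shortened sample data are assumptions made for this illustration. They are not part of lodash or of the original solution.

```javascript
// Hypothetical helper: left-to-right function composition,
// similar in spirit to lodash's flow.
const pipe = (...steps) => input => steps.reduce((value, step) => step(value), input);

// Each step below does exactly one thing (names are illustrative).

// Pair up the elements at the same index of both arrays.
const pairUp = ([files, results]) => files.map((file, i) => [file, results[i]]);

// Keep only the pairs whose upload failed (pairs without a result are dropped).
const keepFailures = pairs => pairs.filter(([, result]) => result && !result.ok);

// Group the file names by error type.
const groupNamesByError = pairs =>
  pairs.reduce(
    (groups, [file, result]) => ({
      ...groups,
      [result.error]: [...(groups[result.error] || []), file.name],
    }),
    {}
  );

const groupFailedUploads = pipe(pairUp, keepFailures, groupNamesByError);

// Shortened sample data, modeled on the payload and response above.
const sampleFiles = [
  { name: 'institute.jpeg' },
  { name: 'cv.pdf' },
  { name: 'car.jpeg' },
  { name: 'duckling.jpeg' },
];
const sampleResults = [
  { ok: false, error: 'ErrorDuplicateNameOutput' },
  { ok: false, error: 'ErrorUnsupporrtedOutput' },
  { ok: true },
  { ok: false, error: 'ErrorDuplicateNameOutput' },
];

const failedNamesByError = groupFailedUploads([sampleFiles, sampleResults]);
console.log(failedNamesByError);
// {
//   ErrorDuplicateNameOutput: ['institute.jpeg', 'duckling.jpeg'],
//   ErrorUnsupporrtedOutput: ['cv.pdf']
// }
```

Because every step is a standalone function, each one can be tested or swapped in isolation, and the accumulator stays hidden inside `groupNamesByError` instead of leaking into the calling code.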
