Group by multiple data sources in a declarative way
2 min read · Apr 1, 2024
Introduction
Grouping data by a common value or within a range, such as age, salary, or city, has a lot of use cases: statistics (distribution), data visualization (histogram, pie chart), data analytics (bubble map), machine learning, cryptanalysis, etc.
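Grouping within a range comes down to grouping by a common value once each item is mapped to the label of its range. The following is only a minimal sketch of that idea in plain JavaScript; the `people` sample data and the `ageRange` helper are hypothetical and exist purely for illustration.

```javascript
// Hypothetical sample data, for illustration only.
const people = [
  { name: 'Ada', age: 23 },
  { name: 'Bob', age: 37 },
  { name: 'Cy', age: 29 },
  { name: 'Di', age: 41 },
];

// Map an age to the label of its 10-year range, e.g. 23 -> '20-29'.
const ageRange = (age) => {
  const start = Math.floor(age / 10) * 10;
  return `${start}-${start + 9}`;
};

// Group the names under the range label computed for each person.
const namesByAgeRange = {};
for (const { name, age } of people) {
  const key = ageRange(age);
  namesByAgeRange[key] = [...(namesByAgeRange[key] ?? []), name];
}

console.log(namesByAgeRange);
// { '20-29': [ 'Ada', 'Cy' ], '30-39': [ 'Bob' ], '40-49': [ 'Di' ] }
```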
What we have
Let’s imagine uploading files in batch from the client side (browser) through an API whose response keeps the items in the same order as in the payload.
const payload = [
{ name: 'institute.jpeg', blob: 'O<{t³]Û9Î!hÖ' },
{ name: 'cv.pdf', blob: 'ðT®@cÏ3æ:®ÊÙLà`' },
{ name: 'duckling.jpeg', blob: 'ß&ßPI³Oâ-$x' },
{ name: 'car.jpeg', blob: 'ôfU?0ãh,Ã*1' },
{ name: 'railways.jpeg', blob: 'He¤2KµE1ª' },
{ name: 'building.jpeg', blob: 'K¤xäPά¹mØ pyÀ' },
{ name: 'song.mp3', blob: '»VM+V5AÄ)c¹<01´xñÖFUÁ5[z' },
{ name: 'dictionary.exe', blob: 'ÐMþµ«¥×Méµ=¿äõ' },
];
const response = [
{ ok: false, error: 'ErrorDuplicateNameOutput' },
{ ok: false, error: 'ErrorUnsupporrtedOutput' },
{ ok: false, error: 'ErrorDuplicateNameOutput' },
{ ok: true },
{ ok: false, error: 'ErrorAccessDeniedOutput' },
{ ok: true },
{ ok: false, error: 'ErrorUnsupporrtedOutput' },
{ ok: false, error: 'ErrorUnsupporrtedOutput' },
];
What we want
const groupedByErrors = {
ErrorDuplicateNameOutput: ['institute.jpeg', 'duckling.jpeg'],
ErrorAccessDeniedOutput: ['railways.jpeg'],
ErrorUnsupporrtedOutput: ['cv.pdf', 'song.mp3', 'dictionary.exe']
};
Imperative solution
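const groupedByErrors = response.reduce((acc, curr, i) => {
  // an error occurred for an item that exists in the payload
  if (payload?.[i] && !curr.ok) {
    if (curr.error in acc) {
      // the error type is already a key: append the file name
      return { ...acc, [curr.error]: [...acc[curr.error], payload[i].name] };
    } else {
      // first occurrence of this error type: create the key
      return { ...acc, [curr.error]: [payload[i].name] };
    }
  } else {
    // successful upload: leave the groups unchanged
    return acc;
  }
}, {});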
Declarative solution
import { flow, filter, groupBy, mapValues, zip } from 'lodash';

const groupedByErrors = flow(
  // keep the failed uploads only
  pairs => filter(pairs, ([, result]) => !result.ok),
  // group the [item, result] pairs by error type
  pairs => groupBy(pairs, ([, result]) => result.error),
  // pick the file name as the value of each group
  groups => mapValues(groups, pairs => pairs.map(([item]) => item.name))
)(zip(payload, response)); // pair up the elements at the same position
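One difference from the imperative version is worth noting: that version skips entries with no matching payload item (`payload?.[i]`), whereas lodash `zip` pads the shorter array with `undefined`, so the pipeline above assumes both arrays have the same length. A possible guard, sketched here in plain JavaScript, is an extra step that drops incomplete pairs. `zipPairs` is only a stand-in for `zip` on two arrays, and `isCompletePair` and the sample data are hypothetical names introduced for this example.

```javascript
// Stand-in for lodash `zip` on two arrays: the shorter one is padded with undefined.
const zipPairs = (a, b) =>
  Array.from({ length: Math.max(a.length, b.length) }, (_, i) => [a[i], b[i]]);

// Hypothetical guard step: keep only the pairs where both sides are present.
const isCompletePair = ([item, result]) => item !== undefined && result !== undefined;

// Hypothetical sample: the response is one entry shorter than the payload.
const samplePayload = [{ name: 'a.jpeg' }, { name: 'b.pdf' }, { name: 'c.mp3' }];
const sampleResponse = [{ ok: false, error: 'ErrorAccessDeniedOutput' }, { ok: true }];

const allPairs = zipPairs(samplePayload, sampleResponse);
const completePairs = allPairs.filter(isCompletePair);

console.log(allPairs.length);      // 3 (the last pair has an undefined result)
console.log(completePairs.length); // 2
```

With lodash, the same guard could be placed as the first function passed to `flow`, for example `pairs => filter(pairs, isCompletePair)`, before the step that reads `ok`.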
Takeaway
The declarative solution has a couple of benefits:
- Easier to read: combine elements at the same position in both arrays into an array, keep the errors only, group by error type, pick the desired field as value
- Higher level: no implementation detail (loop, accumulator, index)
- More expressive: function compositions (no if statement)
- Single responsibility principle: dedicated functions doing a specific task
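To make the last point concrete, each of the four steps can be given its own name and then composed. This is only a dependency-free sketch, not the article's lodash code: `pipe`, `pairUp`, `keepErrors`, `groupByErrorType`, `pickNames` and the sample data are all hypothetical names chosen for this illustration, with `pipe` standing in for `flow`.

```javascript
// Minimal left-to-right composition helper (stand-in for lodash `flow`).
const pipe = (...fns) => (input) => fns.reduce((value, fn) => fn(value), input);

// Step 1: combine the elements at the same position into [item, result] pairs.
const pairUp = (items, results) => items.map((item, i) => [item, results[i]]);

// Step 2: keep the errors only.
const keepErrors = (pairs) => pairs.filter(([, result]) => !result.ok);

// Step 3: group the pairs by error type.
const groupByErrorType = (pairs) => {
  const groups = new Map();
  for (const pair of pairs) {
    const key = pair[1].error;
    groups.set(key, [...(groups.get(key) ?? []), pair]);
  }
  return Object.fromEntries(groups);
};

// Step 4: pick the desired field (the file name) as the value of each group.
const pickNames = (groups) =>
  Object.fromEntries(
    Object.entries(groups).map(([errorType, pairs]) => [
      errorType,
      pairs.map(([item]) => item.name),
    ])
  );

const groupNamesByError = pipe(keepErrors, groupByErrorType, pickNames);

// Hypothetical sample data, shaped like the payload and response above.
const files = [{ name: 'a.jpeg' }, { name: 'b.pdf' }, { name: 'c.jpeg' }, { name: 'd.mp3' }];
const results = [
  { ok: false, error: 'ErrorDuplicateNameOutput' },
  { ok: true },
  { ok: false, error: 'ErrorDuplicateNameOutput' },
  { ok: false, error: 'ErrorAccessDeniedOutput' },
];

console.log(groupNamesByError(pairUp(files, results)));
// { ErrorDuplicateNameOutput: [ 'a.jpeg', 'c.jpeg' ], ErrorAccessDeniedOutput: [ 'd.mp3' ] }
```

Because each step does one thing, a step can be tested or swapped on its own, for instance replacing `pickNames` to collect a different field.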