Make requests to fetch a project's author's info concurrently #6

Closed
opened 2026-01-20 21:04:26 +00:00 by aniram · 4 comments
aniram commented 2026-01-20 21:04:26 +00:00 (Migrated from codeberg.org)

Currently, fetching a project (level 1), its author (level 2), and the author's party (level 3) takes 9.90 seconds (query: São Paulo, results: 15) on Firefox, because all those requests happen sequentially. The requests from level 1 to 2 could all run concurrently.

I know beforehand how many requests I will make from level 1 to 2, since I show at most 15 projects per page.
Therefore I could iterate over the projects and fire off the author request concurrently for each item of the loop. From level 2 to 3 I don't really have an option, since I need to know the author's ID before I can fetch information on their party, but optimizing from level 1 to 2 would still make the page much faster.

Let's examine a simple scenario, leaving level 3 out:
If I have N projects and M authors in total, the page only loads after N + M sequential requests.
But with the M author requests running concurrently, the page loads after the time of roughly N + 1 requests.

I expect the load time to halve. Nevertheless, further optimizations must be made from level 2 to 3.
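The level-1-to-2 fan-out could look something like the sketch below. This is not the project's actual code; `Author`, `fetchAuthor`, and the IDs are hypothetical stand-ins, and the simulated latency replaces the real API call. Since each goroutine sends exactly one value, receiving `len(ids)` times avoids having to close the channel.

```go
package main

import (
	"fmt"
	"time"
)

// Author is a stand-in for the real model; the fields are hypothetical.
type Author struct {
	ID   int
	Name string
}

// fetchAuthor simulates one level-1→2 request (the real code hits the API).
func fetchAuthor(id int) Author {
	time.Sleep(50 * time.Millisecond) // stand-in for network latency
	return Author{ID: id, Name: fmt.Sprintf("author-%d", id)}
}

// fetchAuthorsConcurrently fans out one goroutine per project author.
func fetchAuthorsConcurrently(ids []int) []Author {
	ch := make(chan Author, len(ids)) // buffered: senders never block
	for _, id := range ids {
		go func(id int) { ch <- fetchAuthor(id) }(id)
	}
	// Each goroutine sends exactly one value, so receive len(ids) times
	// instead of ranging over (and having to close) the channel.
	authors := make([]Author, 0, len(ids))
	for range ids {
		authors = append(authors, <-ch)
	}
	return authors
}

func main() {
	start := time.Now()
	authors := fetchAuthorsConcurrently([]int{1, 2, 3, 4, 5})
	fmt.Printf("fetched %d authors in %v\n", len(authors), time.Since(start))
}
```

With 15 projects per page, the total wait for level 2 is roughly one request's latency instead of fifteen.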
There are two issues I have noticed so far:

  1. I need many requests to fetch a decent amount of information on a resource like a project's author.
  2. I often need all data about a certain resource gathered in a single model, but the API has the data scattered. To get around this I have, for example, added the property `CIDAuthors` to the model `Project`, although a project on the API does not contain this property. I'm not sure yet if this is the best solution.
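The aggregation in issue 2 amounts to a local model that embeds what the API returns plus the extra fields. A minimal sketch, with hypothetical field names (only `CIDAuthors` and `Project` come from the thread):

```go
package main

import "fmt"

// APIProject mirrors what the API actually returns (fields hypothetical).
type APIProject struct {
	ID          int
	Description string
}

// Project is the local model: the API's data plus fields the API scatters
// across other endpoints, such as the authors' IDs.
type Project struct {
	APIProject
	CIDAuthors []int // not present on the API's project resource
}

func main() {
	p := Project{
		APIProject: APIProject{ID: 6, Description: "example"},
		CIDAuthors: []int{42, 43}, // filled in by separate author requests
	}
	fmt.Println(p.ID, p.CIDAuthors)
}
```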

I have considered a few things to circumvent them.
For example: download the API's tables, transform them into a SQL database structured the way I need it, and fetch information directly from my own DB. Another possibility: download the tables, create the DB with the structure as-is, and fetch from the DB only the data that is unlikely to change soon or ever, like a member's name, their picture, a project's description, etc.
The former "solution" would probably cause many problems as the API changes, and also as I learn about my misconceptions of the API and the project's domain.

aniram commented 2026-01-24 23:16:05 +00:00 (Migrated from codeberg.org)

The first draft (commit `34079647f4dc02cd`) halved the page load time for queries like 'São Paulo', but it's still about 1 second slower on average than the official website. It also does not use mutexes, nor does it range over the channel's values yet; it does use buffered channels, though. The next draft should contain mutexes and range over the values properly. Ideally I can trim it down by one more second.

aniram commented 2026-01-25 21:32:24 +00:00 (Migrated from codeberg.org)

I thought of iterating over the channel's values, but that wasn't possible because doing so requires closing the channel. If I close the channel manually (infinite range) or automatically (range over channel), the program panics, since some goroutines might still be about to send a value and eventually do so. What I did instead, which decreased the load time by about 44 ms, was to run the goroutine that fetches an author's information inside the goroutine that fetches all authors.

I deliberately decided not to use WaitGroups: although they would make it clearer that I'm waiting for my goroutines to finish, rather than looping through all projects, they would just add code overhead without necessarily improving anything. I would definitely need them if I did not know in advance how many results to expect from all goroutines, but that is not the case here, since amount of results = amount of goroutines = amount of projects, as each goroutine sends exactly one value.

Mutexes are next, but I fear the page load time cannot be decreased any further.

aniram commented 2026-01-25 21:50:31 +00:00 (Migrated from codeberg.org)

Food for thought:
i) Use an in-memory cache (Redis, memcached) for users' favorites.
ii) Use a caching layer: on a dedicated service, retrieve projects, authors, and related data from the API as CSV files*, transform and store them as my own entities with the properties as I need them (aggregating information scattered across requests into one entity) in my DB, and fetch from there. Depending on how fresh the data needs to be, refresh from the API accordingly.

*Their website claims they provide CSV files with whole tables, updated daily. I could use those files instead of making requests timed by a cronjob, which could overload their system unnecessarily.

aniram added reference fix/slow-page-load-due-to-authors-info-requests 2026-01-27 22:24:10 +00:00

@aniram wrote in https://git.marina.sh/aniram/cidadon/issues/6#issuecomment-5:

I thought of iterating over the channel's values, but that wasn't possible because doing so requires closing the channel. If I close the channel manually (infinite range) or automatically (range over channel), the program panics, since some goroutines might still be about to send a value and eventually do so. What I did instead, which decreased the load time by about 44 ms, was to run the goroutine that fetches an author's information inside the goroutine that fetches all authors.

I deliberately decided not to use WaitGroups: although they would make it clearer that I'm waiting for my goroutines to finish, rather than looping through all projects, they would just add code overhead without necessarily improving anything. I would definitely need them if I did not know in advance how many results to expect from all goroutines, but that is not the case here, since amount of results = amount of goroutines = amount of projects, as each goroutine sends exactly one value.

Mutexes are next, but I fear the page load time cannot be decreased any further.

Using a mutex made the code clearer and more intuitive, but it seems to have added a few hundredths of a millisecond to the page's load time. I guess that's the price of readability.
