The Wayback Machine - https://web.archive.org/web/20201013035325/https://github.com/cube-js/cube.js/issues/457
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support GRANULARITY in context variables #457

Open
diogo-andrade opened this issue Feb 28, 2020 · 5 comments
Open

Support GRANULARITY in context variables #457

diogo-andrade opened this issue Feb 28, 2020 · 5 comments
Labels

Comments

@diogo-andrade
Copy link

@diogo-andrade diogo-andrade commented Feb 28, 2020

Currently, roughly speaking we can define schemas that accesses the filter date range, during the SQL generation time, through FILTER_PARAMS.<CUBE_NAME>.<TIME_TYPE_FILTER_NAME>.filter(expression). This allows us to define subqueries filtered by the date range coming from a request. As in the following example:

  segments: {   
    fiveChatsExpert: {
      sql: `${expertId} IN (
        SELECT expert_id
        FROM ${CUBE} INNER JOIN ${Partners} ON ${partnerId} = ${Partners.id}
        WHERE ${FILTER_PARAMS.Partners.externalId.filter(`${Partners.externalId}`)}
        AND ${FILTER_PARAMS.ChatsStats.createdAt.filter(`${createdAtConverted}`)}
        GROUP BY 1
        HAVING count(expert_id) > 4
        )`,
    },
  }

When granularity is present in a coming request the final SELECT it's grouped by that granularity, as expected. However, if we want to use that grouping in a subquery, like the one in the example above, is not possible.
To see what this limitation affects, lets consider the query A, and the following cube:

cube('ChatsStats', {
  sql: 'select * from public.chats_stats',
  joins: {
    ...all joins well defined...
  }
  measures: {
  ...all measures well defined...
  }
  dimensions: {
  ...all dimensions well defined...
  }

  segments: {   
    fiveChatsExpert: {
      sql: `${expertId} IN (
        SELECT expert_id
        FROM ${CUBE} INNER JOIN ${Partners} ON ${partnerId} = ${Partners.id}
        WHERE ${FILTER_PARAMS.Partners.externalId.filter(`${Partners.externalId}`)}
        AND ${FILTER_PARAMS.ChatsStats.createdAt.filter(`${createdAtConverted}`)}
        GROUP BY 1
        HAVING count(expert_id) > 4
        )`,
    },
  }
});

Query A:
Give me the experts Id with 5 or more chats made from Partner X for this year
Chats Stats Expert Id by Chats Stats Five Chats Expert and Partners External Id = X and Chats Stats Created At this Year

Will generate,

SELECT
  "chats_stats".expert_id "chats_stats__expert_id"
FROM
  public.chats_stats AS "chats_stats"
  LEFT JOIN public.partners AS "partners" ON "chats_stats".partner_id = "partners".id
WHERE
  (
    "chats_stats".created_at >= $ 1 :: timestamptz
    AND "chats_stats".created_at <= $ 2 :: timestamptz
  )
  AND (
    "chats_stats".expert_id IN (
      SELECT
        expert_id
      FROM
        "chats_stats"
        INNER JOIN "partners" ON "chats_stats".partner_id = "partners".id
      WHERE
        "partners".external_id = $ 3
        AND (
          "chats_stats".created_at :: timestamptz AT TIME ZONE 'UTC'
        ) >= $ 4 :: timestamptz
        AND (
          "chats_stats".created_at :: timestamptz AT TIME ZONE 'UTC'
        ) <= $ 5 :: timestamptz
      GROUP BY
        1
      HAVING
        count(expert_id) > 4
    )
  )
  AND ("partners".external_id = $ 6)
GROUP BY
  1
ORDER BY
  1 ASC
LIMIT
  10000

which will return the correct output. But, if we add 'month' granularity as example, will generate the following SQL:

SELECT
  "chats_stats".expert_id "chats_stats__expert_id",
  date_trunc(
    'month',
    (
      "chats_stats".created_at :: timestamptz AT TIME ZONE 'UTC'
    )
  ) "chats_stats__created_at_month"
FROM
  public.chats_stats AS "chats_stats"
  LEFT JOIN public.partners AS "partners" ON "chats_stats".partner_id = "partners".id
WHERE
  (
    "chats_stats".created_at >= $ 1 :: timestamptz
    AND "chats_stats".created_at <= $ 2 :: timestamptz
  )
  AND (
    "chats_stats".expert_id IN (
      SELECT
        expert_id
      FROM
        "chats_stats"
        INNER JOIN "partners" ON "chats_stats".partner_id = "partners".id
      WHERE
        "partners".external_id = $ 3
        AND (
          "chats_stats".created_at :: timestamptz AT TIME ZONE 'UTC'
        ) >= $ 4 :: timestamptz
        AND (
          "chats_stats".created_at :: timestamptz AT TIME ZONE 'UTC'
        ) <= $ 5 :: timestamptz
      GROUP BY
        1
      HAVING
        count(expert_id) > 4
    )
  )
  AND ("partners".external_id = $ 6)
GROUP BY
  1,
  2
ORDER BY
  2 ASC
LIMIT
  10000

which will not answer correctly the query "Give me the experts Id with 5 or more chats made from Partner X for this year and grouped by Month".
In order to answer the aforementioned query correctly, it should be possible to write fiveChatsExpert segment in the following way:

    fiveChatsExpert: {
      sql: `(${expertId},  date_trunc(`${GRANULARITY}`, `${createdAtConverted}`) IN (
        SELECT expert_id, date_trunc(`${GRANULARITY}`, `${createdAtConverted}`)
        FROM ${CUBE} INNER JOIN ${Partners} ON ${partnerId} = ${Partners.id}
        WHERE ${FILTER_PARAMS.Partners.externalId.filter(`${Partners.externalId}`)}
        AND ${FILTER_PARAMS.ChatsStats.createdAt.filter(`${createdAtConverted}`)}
        GROUP BY 1, 2
        HAVING count(expert_id) > 4
        )`,
    },

so that the following SQL be generated,

SELECT
  "chats_stats".expert_id "chats_stats__expert_id",
  date_trunc(
    'month',
    (
      "chats_stats".created_at :: timestamptz AT TIME ZONE 'UTC'
    )
  ) "chats_stats__created_at_month"
FROM
  public.chats_stats AS "chats_stats"
  LEFT JOIN public.partners AS "partners" ON "chats_stats".partner_id = "partners".id
WHERE
  (
    "chats_stats".created_at >= $ 1 :: timestamptz
    AND "chats_stats".created_at <= $ 2 :: timestamptz
  )
  AND (
    ("chats_stats".expert_id,
    date_trunc('month',
     (
       "chats_stats".created_at :: timestamptz AT TIME ZONE 'UTC'
     )
  ) 
  IN (
      SELECT
        expert_id, 
        date_trunc('month',
          (
           "chats_stats".created_at :: timestamptz AT TIME ZONE 'UTC'
          )
         ) 
      FROM
        "chats_stats"
        INNER JOIN "partners" ON "chats_stats".partner_id = "partners".id
      WHERE
        "partners".external_id = $ 3
        AND (
          "chats_stats".created_at :: timestamptz AT TIME ZONE 'UTC'
        ) >= $ 4 :: timestamptz
        AND (
          "chats_stats".created_at :: timestamptz AT TIME ZONE 'UTC'
        ) <= $ 5 :: timestamptz
      GROUP BY
        1, 2
      HAVING
        count(expert_id) > 4
    )
  )
  AND ("partners".external_id = $ 6)
GROUP BY
  1,
  2
ORDER BY
  2 ASC
LIMIT
  10000

In this way, my suggestion is to have the granularity access through GRANULARITY context variables, or through FILTER_PARAMS.<CUBE_NAME>.<TIME_TYPE_FILTER_NAME>.granularity.

I believe that there are many other scenarios where access to granularity would be helpful.

@paveltiunov
Copy link
Contributor

@paveltiunov paveltiunov commented Feb 29, 2020

@diogo-andrade Hey Diogo! Thanks for posting this one! Could you please elaborate a little bit on your business use case? From the first sight your case is a good candidate for https://cube.dev/docs/subquery#top.

@diogo-andrade
Copy link
Author

@diogo-andrade diogo-andrade commented Mar 3, 2020

Consider the following Calls table,

 id | user  | created_at              | points
----+-------+-------------------------+-------
  1 | Paul  | 2020-01-27 14:44:36.425 | 52
  2 | Allen | 2020-01-27 09:30:10.000 | 89
  3 | Paul  | 2020-01-28 12:44:36.211 | 40
  4 | Paul  | 2020-01-28 14:34:21.234 | 60
  5 | Paul  | 2020-01-28 15:46:43.123 | 64
  6 | David | 2020-01-28 15:46:43.025 | 22
  7 | David | 2020-01-28 16:44:36.135 | 18
  8 | David | 2020-01-28 16:59:01.415 | 77
  9 | Frank | 2020-01-28 17:32:16.398 | 44
 10 | Allen | 2020-01-29 18:29:27.395 | 64

and Success Ratio = (total users with at least 2 calls performed and average points >= 50 / total users with at least 2 calls performed).

What is expected is that be possible to calculate the Success Ratio for a selected period and granularity, and present the result in a bar chart.

Therefore, if last week of January is selected with day granularity, following the Success Ratio formula, we expect the following result per day:
Jan 27 - 0
Jan 28 - 0.5
Jan 29 - 0
And for the same period but with week granularity selected:
Jan 27 (first day of the week) - 0.67

Let me know if this is enough.

@paveltiunov
Copy link
Contributor

@paveltiunov paveltiunov commented Mar 3, 2020

@diogo-andrade Makes sense. Do you want to control for which period of time you expect to see at least 2 calls? Or you want it to be always aligned with selected interval of time?

@diogo-andrade
Copy link
Author

@diogo-andrade diogo-andrade commented Mar 3, 2020

I want it to be always aligned with the selected date range.

@paveltiunov
Copy link
Contributor

@paveltiunov paveltiunov commented Apr 24, 2020

@diogo-andrade There's propagateFiltersToSubQuery option in preview right now:

cube(`Expert`, {
  // ...

  dimensions: {
    chatsCount: {
      sql: `${Chats.count}`,
      type: `number`,
      subQuery: true,
      propagateFiltersToSubQuery: true
    }
  }
})
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.