Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatch Logs, DynamoDB, EMR, Secrets Manager, PostgreSQL, MySQL, SQL Server and S3 (Parquet, CSV, JSON and Excel).
Topics: mysql, python, emr, aws, data-science, lambda, aws-lambda, athena, etl, pandas, data-engineering, redshift, apache-parquet, amazon-athena, apache-arrow, aws-glue, glue-catalog, amazon-sagemaker-notebook
Updated Oct 14, 2021 - Python
It is not surprising that deep and shallow scans show different results. A shallow scan looks only at column names, while a deep scan looks at a sample of the data. I've even noticed that two runs of a deep scan can show different results, because each run samples different rows. This is the challenge of not scanning all of the data: it's a trade-off between performance/cost and accuracy. There is no right answer.
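To make the trade-off concrete, here is a minimal sketch of a simplified scanner. Everything in it is hypothetical: the pattern dictionaries, `shallow_scan`, `deep_scan`, and the sample rows are illustrative assumptions, not the implementation of any particular tool. The shallow scan matches only column names, so it never reads data and misses PII in an innocuously named column. The deep scan matches values in a random sample of rows, so a small sample can miss a rare PII value, and two unseeded runs can disagree.

```python
import random
import re

# Hypothetical detection rules (assumptions for illustration only):
# column-name patterns for the shallow scan, value patterns for the deep scan.
NAME_PATTERNS = {
    "email": re.compile(r"e-?mail", re.IGNORECASE),
    "phone": re.compile(r"phone|mobile", re.IGNORECASE),
}
VALUE_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$", re.IGNORECASE),
    "phone": re.compile(r"^\+?\d[\d\- ]{7,}\d$"),
}


def shallow_scan(columns):
    """Classify columns by name only; no data is read."""
    found = {}
    for col in columns:
        for label, pattern in NAME_PATTERNS.items():
            if pattern.search(col):
                found[col] = label
    return found


def deep_scan(rows, sample_size, seed=None):
    """Classify columns by matching values in a random sample of rows.

    With seed=None every call draws a different sample, so results can vary.
    """
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    found = {}
    for row in sample:
        for col, value in row.items():
            for label, pattern in VALUE_PATTERNS.items():
                if isinstance(value, str) and pattern.match(value):
                    found[col] = label
    return found


# Hypothetical table: one email address hidden in a free-text column.
columns = ["id", "user_email", "notes"]
rows = [{"id": i, "user_email": f"user{i}@example.com", "notes": "n/a"} for i in range(10)]
rows[7]["notes"] = "bob@example.com"

print(shallow_scan(columns))                    # {'user_email': 'email'}
print(deep_scan(rows, sample_size=len(rows)))   # {'user_email': 'email', 'notes': 'email'}

# Small samples: different seeds can flag different sets of columns.
outcomes = {frozenset(deep_scan(rows, sample_size=2, seed=s)) for s in range(200)}
print(sorted(sorted(o) for o in outcomes))
```

Passing a fixed `seed` makes deep-scan results reproducible between runs, and raising `sample_size` toward the full row count catches rarer values at the cost of reading more data. Neither removes the underlying trade-off.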