Full Code of facebookresearch/personal-timeline for AI

main ff74564b0548 cached

244 files

67.8 MB

7.6M tokens

358 symbols

1 requests

Copy disabled (too large) Download .txt

Showing preview only (30,239K chars total). Download the full file to get everything.

Repository: facebookresearch/personal-timeline
Branch: main
Commit: ff74564b0548
Files: 244
Total size: 67.8 MB

Directory structure:
gitextract_2gnihnj2/

├── .gitignore
├── .gitmodules
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── DATASET.md
├── LICENSE
├── NEW_DATASOURCE.md
├── README.md
├── conf/
│   └── ingest.conf
├── docker-compose.yml
├── notebooks/
│   ├── extract_narration_tutorial.ipynb
│   ├── object_detection_tutorial.ipynb
│   └── ocr_tutorial.ipynb
├── sample_data/
│   ├── books.csv
│   ├── books.sampled.csv
│   ├── books.sampled.json
│   ├── config.ini
│   ├── create_db.sql
│   ├── episodes.csv
│   ├── episodes.json
│   ├── exercise.sampled.csv
│   ├── exercise.sampled.json
│   ├── photos.sampled.csv
│   ├── photos.sampled.json
│   ├── places.sampled.csv
│   ├── places.sampled.json
│   ├── purchase.sampled.csv
│   ├── purchase.sampled.json
│   ├── streaming.sampled.csv
│   ├── streaming.sampled.json
│   ├── trips.sampled.csv
│   ├── trips.sampled.json
│   ├── views_idx.csv
│   └── views_metadata.txt
└── src/
    ├── __init__.py
    ├── common/
    │   ├── __init__.py
    │   ├── bootstrap/
    │   │   └── data_source.json
    │   ├── generate_persona.py
    │   ├── geo_helper.py
    │   ├── objects/
    │   │   ├── EntryTypes.py
    │   │   ├── LLEntry_obj.py
    │   │   ├── __init__.py
    │   │   ├── derive_attributes.py
    │   │   └── import_configs.py
    │   ├── persistence/
    │   │   ├── __init__.py
    │   │   ├── key_value_db.py
    │   │   └── personal_data_db.py
    │   ├── user_info.json
    │   └── util.py
    ├── frontend/
    │   ├── Dockerfile
    │   ├── README.md
    │   ├── __init__.py
    │   ├── package.json
    │   ├── public/
    │   │   ├── index.html
    │   │   ├── manifest.json
    │   │   └── robots.txt
    │   ├── requirements.txt
    │   └── src/
    │       ├── App.css
    │       ├── App.js
    │       ├── App.test.js
    │       ├── Constants.js
    │       ├── index.css
    │       ├── index.js
    │       ├── map/
    │       │   ├── GoogleMapComponent.js
    │       │   └── PlaceInfo.js
    │       ├── reportWebVitals.js
    │       ├── service/
    │       │   └── DigitalDataImportor.js
    │       ├── setupTests.js
    │       └── timeline/
    │           ├── EpiTimeline.js
    │           ├── builders.js
    │           ├── constants.js
    │           └── utils.js
    ├── ingest/
    │   ├── Dockerfile
    │   ├── __init__.py
    │   ├── create_episodes.py
    │   ├── derive_episodes.py
    │   ├── enrichment/
    │   │   ├── __init__.py
    │   │   ├── find_jpegs.py
    │   │   ├── geo_enrichment.py
    │   │   ├── image_deduplication.py
    │   │   ├── image_enrichment.py
    │   │   └── socratic/
    │   │       ├── README.md
    │   │       ├── __init__.py
    │   │       ├── process.ipynb
    │   │       ├── prompts/
    │   │       │   ├── __init__.py
    │   │       │   ├── categories_places365.txt
    │   │       │   ├── extract_text_features.py
    │   │       │   ├── openimage-classnames.csv
    │   │       │   ├── place365-classnames.txt
    │   │       │   ├── tencent-ml-classnames.txt
    │   │       │   └── tencent-ml-images.txt
    │   │       ├── requirements.txt
    │   │       └── socratic.py
    │   ├── export/
    │   │   ├── __init__.py
    │   │   └── export_entities.py
    │   ├── importers/
    │   │   ├── __init__.py
    │   │   ├── create_amazon_LLEntries.py
    │   │   ├── create_apple_health_LLEntries.py
    │   │   ├── create_facebook_LLEntries.py
    │   │   ├── create_google_photo_LLEntries.py
    │   │   ├── create_googlemaps_LLEntries.py
    │   │   ├── generic_importer.py
    │   │   ├── generic_importer_workflow.py
    │   │   └── photo_importer_base.py
    │   ├── ingestion_startup.sh
    │   ├── offline_processing.py
    │   └── workflow.py
    ├── init.py
    ├── init.sh
    ├── qa/
    │   ├── Dockerfile
    │   ├── README.md
    │   ├── chatgpt_engine.py
    │   ├── posttext/
    │   │   ├── README.md
    │   │   ├── __init__.py
    │   │   ├── config.ini
    │   │   ├── data/
    │   │   │   └── TimelineQA/
    │   │   │       ├── dense-100/
    │   │   │       │   ├── annual_medical_care-log.csv
    │   │   │       │   ├── config.ini
    │   │   │       │   ├── create_db.sql
    │   │   │       │   ├── daily_chat-log.csv
    │   │   │       │   ├── daily_exercise-log.csv
    │   │   │       │   ├── daily_meal-log.csv
    │   │   │       │   ├── daily_read-log.csv
    │   │   │       │   ├── daily_watchtv-log.csv
    │   │   │       │   ├── marriages-log.csv
    │   │   │       │   ├── monthly_pet_care-log.csv
    │   │   │       │   ├── moves-log.csv
    │   │   │       │   ├── persona.json
    │   │   │       │   ├── timeline-dense.csv
    │   │   │       │   ├── timeline.json
    │   │   │       │   ├── travel-log.csv
    │   │   │       │   ├── travel_dining-log.csv
    │   │   │       │   ├── travel_places_visited-log.csv
    │   │   │       │   ├── views_idx.csv
    │   │   │       │   ├── views_metadata.txt
    │   │   │       │   ├── weekly_bakeorcook-log.csv
    │   │   │       │   ├── weekly_dating-log.csv
    │   │   │       │   ├── weekly_grocery-log.csv
    │   │   │       │   └── weekly_hobby-log.csv
    │   │   │       ├── medium-100/
    │   │   │       │   ├── annual_medical_care-log.csv
    │   │   │       │   ├── config.ini
    │   │   │       │   ├── create_db.sql
    │   │   │       │   ├── daily_chat-log.csv
    │   │   │       │   ├── daily_exercise-log.csv
    │   │   │       │   ├── daily_meal-log.csv
    │   │   │       │   ├── daily_read-log.csv
    │   │   │       │   ├── daily_watchtv-log.csv
    │   │   │       │   ├── marriages-log.csv
    │   │   │       │   ├── monthly_pet_care-log.csv
    │   │   │       │   ├── moves-log.csv
    │   │   │       │   ├── persona.json
    │   │   │       │   ├── timeline-medium.csv
    │   │   │       │   ├── timeline.json
    │   │   │       │   ├── travel-log.csv
    │   │   │       │   ├── travel_dining-log.csv
    │   │   │       │   ├── travel_places_visited-log.csv
    │   │   │       │   ├── views_idx.csv
    │   │   │       │   ├── views_metadata.txt
    │   │   │       │   ├── weekly_bakeorcook-log.csv
    │   │   │       │   ├── weekly_dating-log.csv
    │   │   │       │   ├── weekly_grocery-log.csv
    │   │   │       │   └── weekly_hobby-log.csv
    │   │   │       └── sparse-100/
    │   │   │           ├── annual_medical_care-log.csv
    │   │   │           ├── config.ini
    │   │   │           ├── create_db.sql
    │   │   │           ├── daily_chat-log.csv
    │   │   │           ├── daily_exercise-log.csv
    │   │   │           ├── daily_meal-log.csv
    │   │   │           ├── daily_read-log.csv
    │   │   │           ├── daily_watchtv-log.csv
    │   │   │           ├── marriages-log.csv
    │   │   │           ├── monthly_pet_care-log.csv
    │   │   │           ├── moves-log.csv
    │   │   │           ├── persona.json
    │   │   │           ├── results/
    │   │   │           │   ├── q1-result.csv
    │   │   │           │   ├── q10-result.csv
    │   │   │           │   ├── q11-result.csv
    │   │   │           │   ├── q12-result.csv
    │   │   │           │   ├── q13-result.csv
    │   │   │           │   ├── q14-result.csv
    │   │   │           │   ├── q15-result.csv
    │   │   │           │   ├── q16-result.csv
    │   │   │           │   ├── q17-result.csv
    │   │   │           │   ├── q18-result.csv
    │   │   │           │   ├── q19-result.csv
    │   │   │           │   ├── q2-result.csv
    │   │   │           │   ├── q20-result.csv
    │   │   │           │   ├── q21-result.csv
    │   │   │           │   ├── q22-result.csv
    │   │   │           │   ├── q23-result.csv
    │   │   │           │   ├── q24-result.csv
    │   │   │           │   ├── q25-result.csv
    │   │   │           │   ├── q26-result.csv
    │   │   │           │   ├── q27-result.csv
    │   │   │           │   ├── q28-result.csv
    │   │   │           │   ├── q29-result.csv
    │   │   │           │   ├── q3-result.csv
    │   │   │           │   ├── q30-result.csv
    │   │   │           │   ├── q31-result.csv
    │   │   │           │   ├── q32-result.csv
    │   │   │           │   ├── q33-result.csv
    │   │   │           │   ├── q34-result.csv
    │   │   │           │   ├── q35-result.csv
    │   │   │           │   ├── q36-result.csv
    │   │   │           │   ├── q37-result.csv
    │   │   │           │   ├── q38-result.csv
    │   │   │           │   ├── q39-result.csv
    │   │   │           │   ├── q4-result.csv
    │   │   │           │   ├── q40-result.csv
    │   │   │           │   ├── q41-result.csv
    │   │   │           │   ├── q42-result.csv
    │   │   │           │   ├── q5-result.csv
    │   │   │           │   ├── q6-result.csv
    │   │   │           │   ├── q7-result.csv
    │   │   │           │   ├── q8-result.csv
    │   │   │           │   ├── q9-result.csv
    │   │   │           │   └── queries.csv
    │   │   │           ├── timeline.csv.txt
    │   │   │           ├── timeline.json
    │   │   │           ├── travel-log.csv
    │   │   │           ├── travel_dining-log.csv
    │   │   │           ├── travel_places_visited-log.csv
    │   │   │           ├── views_idx.csv
    │   │   │           ├── views_metadata.txt
    │   │   │           ├── weekly_bakeorcook-log.csv
    │   │   │           ├── weekly_dating-log.csv
    │   │   │           ├── weekly_grocery-log.csv
    │   │   │           └── weekly_hobby-log.csv
    │   │   ├── newqueryfile.csv
    │   │   ├── requirements.txt
    │   │   ├── server.py
    │   │   ├── src/
    │   │   │   ├── posttext.py
    │   │   │   ├── retrieval_qa.py
    │   │   │   ├── views_qa.py
    │   │   │   └── views_util.py
    │   │   └── util/
    │   │       ├── create_metadata_idx.py
    │   │       ├── data2vectorstore.py
    │   │       ├── digital_data2vectorstore.py
    │   │       ├── jsontimeline2csv.py
    │   │       ├── setup.py
    │   │       └── table2text.py
    │   ├── qa_engine.py
    │   ├── server.py
    │   └── view_engine.py
    └── requirements.txt

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
.idea/*
*__pycache__*
*.pkl
*.pt
*.db
*.pyc
static/*
personal-data
env/


================================================
FILE: .gitmodules
================================================
[submodule "BLIP"]
	path = BLIP
	url = https://github.com/salesforce/BLIP.git


================================================
FILE: CODE_OF_CONDUCT.md
================================================
# Code of Conduct

## Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to make participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment
include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
  advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
  address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
  professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

## Scope

This Code of Conduct applies within all project spaces, and it also applies when
an individual is representing the project or its community in public spaces.
Examples of representing a project or community include using an official
project e-mail address, posting via an official social media account, or acting
as an appointed representative at an online or offline event. Representation of
a project may be further defined and clarified by project maintainers.

This Code of Conduct also applies outside the project spaces when there is a
reasonable belief that an individual's behavior may have a negative impact on
the project or its community.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at <opensource-conduct@fb.com>. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq


================================================
FILE: CONTRIBUTING.md
================================================
# Contributing to personal-timeline
We want to make contributing to this project as easy and transparent as
possible.

## Pull Requests
We actively welcome your pull requests.

1. Fork the repo and create your branch from `main`.
2. If you've added code that should be tested, add tests.
3. If you've changed APIs, update the documentation.
4. Ensure the test suite passes.
5. Make sure your code lints.
6. If you haven't already, complete the Contributor License Agreement ("CLA").

## Contributor License Agreement ("CLA")
In order to accept your pull request, we need you to submit a CLA. You only need
to do this once to work on any of Meta's open source projects.

Complete your CLA here: <https://code.facebook.com/cla>

## Issues
We use GitHub issues to track public bugs. Please ensure your description is
clear and has sufficient instructions to be able to reproduce the issue.

Meta has a [bounty program](https://www.facebook.com/whitehat/) for the safe
disclosure of security bugs. In those cases, please go through the process
outlined on that page and do not file a public issue.

## License
By contributing to TimelineBuilder, you agree that your contributions will be licensed
under the LICENSE file in the root directory of this source tree.


================================================
FILE: DATASET.md
================================================
# Personal timeline dataset

This dataset is a sample of ~2 months of one of our own member’s personal digital services data self-collected and anonymized. The digital services include:

* Books: Amazon Kindle and Libby, 93 records in total
* Purchase: Amazon, 95 records in total
* Streaming: Spotify, 111 records in total
* Exercise: Apple Watch, 33 records in total
* Photos: Google Photos, 325 records in total
* Places: Geo locations (lat/long and addresses) from the Google photos metadata, 467 records in total

All raw data were downloaded from the service providers following our data importer instructions. 

## How we anonymize the data

* *Books, purchase, streaming*: we reviewed each individual records
* *Places (all location, address data)*: We anonymize near-home location data by a distance-preserving random project to a high-dimensional space then project back the points to 2D. We use reverse geo-coding to label the addresses of those points. For locations that are not near-home, we verified that all addresses are public space.
* *Images*: We anonymize photos by replacing them with images generated using an AI-generation tool, DALL-E. We use object and place detection to generate captions of the raw images, and use the caption as the image generation prompt (e.g., “A realistic photo for egg tart in the kitchen”). We manually removed all images with people from the output.

## How this dataset can be used

We intend to use this dataset to demostrate question-answer systems over digital service data. With the underlying data, the QA system, such as a personalized version of ChatGPT, should be able to answer questions such as:
* “When was the last time I visited Japan”,
* “Show me some photos of plants in my neighborhood”,
* “How many times I exercise during the month X”, etc.

## Example records

Books:

| time               | book_name           | img_url                                                                                       | id       |
|--------------------|---------------------|-----------------------------------------------------------------------------------------------|----------|
| 2022-12-19 4:37:00 | I Am a Strange Loop | https://img1.od-cdn.com/ImageType-100/0887-1/{A6AA81F6-9242-4793-8AB0-A5C8B5DBDB66}Img100.jpg | books_26 |

Exercise: 

| start_time                | end_time                  | textDescription           | duration    | distance | calories | outdoor | temperature | id          |
|---------------------------|---------------------------|---------------------------|-------------|----------|----------|---------|-------------|-------------|
| 2019-03-02 08:00:34-08:00 | 2019-03-02 08:39:59 -0800 | 08:00: running 39 minutes | 39.40743217 | 0        | 0        | 1       |             | exercise_35 |

Photos:

| start_time                | end_time                  | textDescription   | address                                                                   | lat     | long     | details                                                                                                                                                                                                                                    | img_url                                                                                              |
|---------------------------|---------------------------|-------------------|---------------------------------------------------------------------------|---------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|
| 2019-03-28 20:38:13+09:00 | 2019-03-28 20:38:13+09:00 | from Google Photo | Asahi Shokudo, 龍土町美術館通り, Roppongi, Minato, Tokyo, 106-0033, Japan | 35.6644 | 139.7301 | {'objects': ['Tsukudani', 'Scroll', 'Document', 'Receipt', 'Homework', 'paper', 'menu', 'date', 'sheet, flat solid', 'shoji'], 'places': ['restaurant', 'hotel room', 'archive', 'sushi bar', 'restaurant kitchen'], 'tags': ['document']} | digital_data/images/google_photos/part 2/Google Photos/Photos from 2019/IMG_6955.HEIC.compressed.jpg |

Purchase:

| time                | purchase_id         | productName                                             | productPrice | productQuantity | id         |
|---------------------|---------------------|---------------------------------------------------------|--------------|-----------------|------------|
| 2022-07-26 16:29:16 | 114-9774413-4401831 | Dr. Earth 713 Organic 9 Fruit Tree Fertilizer, 12-Pound | 22.53        | 1               | purchase_0 |
| 2022-07-26 16:28:27 | 114-9230659-7782623 | Miracle-Gro Citrus, Avocado, & Mango Food, 20 lb.       | 15.4         | 1               | purchase_1 |

Streaming:

| start_time          | end_time            | artist              | track                                                              | playtimeMs | spotify_link | id          |
|---------------------|---------------------|---------------------|--------------------------------------------------------------------|------------|--------------|-------------|
| 2022-05-31 11:35:00 | 2022-05-31 11:35:00 | Lex Fridman Podcast | #282 – David Buss: Sex, Dating, Relationships, and Sex Differences | 18000      |              | streaming_0 |

Places:

| start_time | end_time | TextDescription | start_address | start_lat | start_long | end_address | end_lat | end_long | id |
|------------|----------|-----------------|---------------|-----------|------------|-------------|---------|----------|----|
| 2019-04-20 13:02:58-07:00 | 2019-04-20 13:02:58-07:00 | Texas Home on Greg Street | 2966, Greg Street, Randall County, Texas, 79015, United States | 35.03744471122455 | -101.90857274320028 | 2966, Greg Street, Randall County, Texas, 79015, United States | 35.03744471122455 | -101.90857274320028 | places_27422 |


================================================
FILE: LICENSE
================================================
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: NEW_DATASOURCE.md
================================================
# Adding a New Data Source
There are two ways to add a new data source depending on the complexity of input.
If you have a simple non-nested CSV or JSON file, it's quite straightforward and can be done by updating some configurations.
For more complex cases, writing a custom importer is required.
Information on both scenarios is detailed below.

## Adding Data Source Configs
[data_source.json](src/common/bootstrap/data_source.json) keeps track of all sources and should be updated 
irrespective of the path you choose to add the new source.
This file contains a list of all data sources that are processed by the importer. 
In code, these configs are deserialized to class structure defined in [import_configs.py](src/common/objects/import_configs.py).
- Refer to [Class Hierarchy](src/common/objects/import_configs.py) to properly add an entry to 
[data_source.json](src/common/bootstrap/data_source.json). Give a unique source ID to your data source.


## Simple Non-nested CSV and JSONs
If you have a simple non-nested CSV or JSON file, you can simply use the generic CSV/JSON importer provided once the new data source is configured properly.
Steps to follow:
- Follow steps in "Adding Data Source Configs" section to update [data_source.json](src/common/bootstrap/data_source.json) file.
- Rerun [init.py](src/init.py) to create source directory for new sources.
- Add data files to source directory.
- Make sure `ingest_new_data` is set to True in [ingest.conf](conf/ingest.conf)
- Re-run the backend docker image to ingest data from new sources.

## More complex Inputs
Write a custom importer following these guidelines 
- Follow steps in "Adding Data Source Configs" section to update [data_source.json](src/common/bootstrap/data_source.json) file.
- Extend your custom importer from [GenericImporter](src/ingest/importers/generic_importer.py) class; 
    implement the function `import_data` and use the configurations as appropriate to create LLEntry objects(Refer to available custom importers)
- Store class file under [importers](src/ingest/importers)
- Update `generic_importer_workflow.py`; add call to your class in the `if...else` loop of the `start_import` function. Alternatively, contribute to fix this class using configs.

When writing new importer for Photos, you may find it useful to extend `PhotoImporter` instead of `GenericImporter`. 
We started with simple Photo imports so `PhotoImporter` was written to reduce code duplication.
Creating `GenericImporter` was an afterthought and there is room for improving this model of extension to refactor, or even combine
these two importer classes in the future. 


================================================
FILE: README.md
================================================
<!-- This file explains how to create LifeLog entries from several data sources. -->

# TimelineBuilder

## Table of Content

- [Setup](#general-setup): how to set up for this repo
- [Importers](#digital-data-importers): how to create LifeLog entries from several data sources.
  - [Downloading Digital Data](#downloading-your-personal-data)
  - [Running the importers](#running-the-code)
- [Sample Dataset](DATASET.md): a sampled set of anonymized data for testing
- [Data Visualization](#visualization-of-the-personal-timeline): a ReactJS-based visualization frontend of the personal timeline
- [Question Answering](#question-answer-over-the-personal-timeline): a LLM-based QA engine over the personal timeline
- [TimelineQA](#timelineqa-a-benchmark-for-question-answer-over-the-personal-timeline): a synthetic benchmark for evaluating personal timeline QA systems

## General Setup

## Step 0: Create environment

1. Install Docker Desktop from [this link](https://docs.docker.com/desktop/).

2. Follow install steps and use the Desktop app to start the docker engine.

3. Install `git-lfs` and clone the repo. You may need a conda env to do that:
```
conda create -n personal-timeline python=3.10
conda activate personal-timeline

conda install -c conda-forge git-lfs
git lfs install

git clone https://github.com/facebookresearch/personal-timeline
cd personal-timeline
```

4. Run init script (needs python)
```
sh src/init.sh
```
This will create a bunch of files/folders/symlinks needed for running the app.
This will also create a new directory under your home folder `~/personal-data`, the directory where your personal data will reside.

## Step 1: Setting up


## For Data Ingestion

Ingestion configs are controlled via parameters in `conf/ingest.conf` file. The configurations
are defaulted for optimized processing and don't need to be changed. 
You can adjust values for these parameters to run importer with a different configuration.



## For Data visualization

1. To set up a Google Map API (free), follow these [instructions](https://developers.google.com/maps/documentation/embed/quickstart#create-project).

Copy the following lines to `env/frontend.env.list`:
```
GOOGLE_MAP_API=<the API key goes here>
```

2. To embed Spotify, you need to set up a Spotify API (free) following [here](https://developer.spotify.com/dashboard/applications). You need to log in with a Spotify account, create a project, and show the `secret`.

Copy the following lines to `env/frontend.env.list`:
```
SPOTIFY_TOKEN=<the token goes here>
SPOTIFY_SECRET=<the secret goes here>
```

## For Question-Answering

Set up an OpenAI API following these [instructions](https://openai.com/api/).

Copy the following line to `env/frontend.env.list`:
```
OPENAI_API_KEY=<the API key goes here>
```

## Digital Data Importers


## Downloading your personal data

We currently support 9 data sources. Here is a summary table:

| Digital Services | Instructions                                                                        | Destinations                                                             | Use cases                                              |
|------------------|-------------------------------------------------------------------------------------|--------------------------------------------------------------------------|--------------------------------------------------------|
| Apple Health     | [Link](https://github.com/facebookresearch/personal-timeline#apple-health)  | personal-data/apple-health                                               | Exercise patterns, calorie counts                      |
| Amazon           | [Link](https://github.com/facebookresearch/personal-timeline#amazon)        | personal-data/amazon                                                     | Product recommendation, purchase history summarization |
| Amazon Kindle    | [Link](https://github.com/facebookresearch/personal-timeline#amazon)        | personal-data/amazon-kindle                                              | Book recommendation                                    |
| Spotify          | [Link](https://github.com/facebookresearch/personal-timeline#spotify)       | personal-data/spotify                                                    | Music / streaming recommendation                       |
| Venmo            | [Link](https://github.com/facebookresearch/personal-timeline#venmo)         | personal-data/venmo                                                      | Monthly spend summarization                            |
| Libby            | [Link](https://github.com/facebookresearch/personal-timeline#libby)         | personal-data/libby                                                      | Book recommendation                                    |
| Google Photos    | [Link](https://github.com/facebookresearch/personal-timeline#google-photos) | personal-data/google_photos                                              | Food recommendation, Object detections, and more               |
| Google Location  | [Link](https://github.com/facebookresearch/personal-timeline#google-photos) | personal-data/google-timeline/Location History/Semantic Location History | Location tracking / visualization                      |
| Facebook posts   | [Link](https://github.com/facebookresearch/personal-timeline#facebook-data) | personal-data/facebook                                                   | Question-Answering over FB posts / photos              |

If you have a different data source not listed above, follow the instructions [here](NEW_DATASOURCE.md)
to add this data source to the importer.

### GOOGLE PHOTOS and GOOGLE TIMELINE
<!--1. You need to download your Google photos from [Google Takeout](https://takeout.google.com/).  
The download from Google Takeout would be in multiple zip files. Unzip all the files.

2. It may be the case that some of your photo files are .HEIC. In that case follow the steps below to convert them to .jpeg  
The easiest way to do this on a Mac is:

     -- Select the .HEIC files you want to convert.   
     -- Right click and choose "quick actions" and then you'll have an option to convert the image.  
     -- If you're converting many photos, this may take a few minutes. 

2. Move all the unzipped folders inside `~/personal-data/google_photos/`. There can be any number of sub-folders under `google_photos`.-->

1. You can download your Google photos and location (also Gmail, map and google calendar) data from [Google Takeout](https://takeout.google.com/).
2. The download from Google Takeout would be in multiple zip files. Unzip all the files.
3. For Google photos, move all the unzipped folders inside `~/personal-data/google_photos/`. There can be any number of sub-folders under `google_photos`.
4. For Google locations, move the unzipped files to `personal-data/google-timeline/Location History/Semantic Location History`.

### FACEBOOK DATA
1. Go to [Facebook Settings](https://www.facebook.com/settings?tab=your_facebook_information) 
2. Click on <b>Download your information</b> and download FB data in JSON format
3. Unzip the downloaded file and copy the directory `posts` sub-folder to `~/personal-data/facebook`. The `posts` folder would sit directly under the Facebook folder.

### APPLE HEALTH
1. Go to the Apple Health app on your phone and ask to export your data. This will create a file called iwatch.xml and that's the input file to the importer.
2. Move the downloaded file to this `~/personal-data/apple-health`

### AMAZON
1. Request your data from Amazon here: https://www.amazon.com/gp/help/customer/display.html?nodeId=GXPU3YPMBZQRWZK2
They say it can take up to 30 days, but it took about 2 days. They'll email you when it's ready.

They separate Amazon purchases from Kindle purchases into two different directories.

The file you need for Amazon purchases is Retail.OrderHistory.1.csv
The file you need for Kindle purchases is Digital Items.csv

2. Move data for Amazon purchases to `~/personal-data/amazon` folder and of kindle downloads to `~/personal-data/amazon-kindle` folder

### VENMO
1. Download your data from Venmo here -- https://help.venmo.com/hc/en-us/articles/360016096974-Transaction-History

2. Move the data into `~/personal-data/venmo` folder.

### LIBBY
1. Download your data from Libby here -- https://libbyapp.com/timeline/activities. Click on `Actions` then `Export Timeline`

2. Move the data into `~/personal-data/libby` folder.


### SPOTIFY

1. Download your data from Spotify here -- https://support.spotify.com/us/article/data-rights-and-privacy-settings/
They say it can take up to 30 days, but it took about 2 days. They'll email you when it's ready.

2. Move the data into `~/personal-data/spotify` folder.

# Running the code
Now that we have all the data and setting in place, we can either run individual steps or the end-to-end system.
This will import your photo data to SQLite (this is what will go into the episodic database), build summaries
and make data available for visualization and search.


Running the Ingestion container will add two types of file to `~/personal-data/app_data` folder
 - Import your data to an SQLite DB named `raw_data.db`
 - Export your personal data into csv files such as `books.csv`, `exercise.csv`, etc.

### Option 1:
To run the pipeline end-to-end (with frontend and QA backend), simply run 
```
docker-compose up -d --build
```

### Option 2:
You can also run ingestion, visualization, and the QA engine separately.
To start data ingestion, use  
```
docker-compose up -d backend --build
```

## Check progress
Once the docker command is run, you can see running containers for backend and frontend in the docker for Mac UI.
Copy the container Id for ingest and see logs by running the following command:  
```
docker logs -f <container_id>
```

<!-- # Step 5: Visualization and Question Answering -->

## Visualization of the personal timeline

To start the visualization frontend:
```
docker-compose up -d frontend --build
```

Running the Frontend will start a ReactJS UI at `http://localhost:3000`. See [here](src/frontend/) for more details.

We provide an anonymized digital data [dataset](sample_data/) for testing the UI and QA system, see [here](DATASET.md) for more details.

![Timeline Visualization](ui.png)


## Question Answer over the personal timeline

The QA engine is based on PostText, a QA system for answering queries that require computing aggregates over personal data.

PostText Reference ---  [https://arxiv.org/abs/2306.01061](https://arxiv.org/abs/2306.01061):
```
@article{tan2023posttext,
      title={Reimagining Retrieval Augmented Language Models for Answering Queries},
      author={Wang-Chiew Tan and Yuliang Li and Pedro Rodriguez and Richard James and Xi Victoria Lin and Alon Halevy and Scott Yih},
      journal={arXiv preprint:2306.01061},
      year={2023},
}
```

To start the QA engine, run:
```
docker-compose up -d qa --build
```
The QA engine will be running on a flask server inside a docker container at `http://localhost:8085`. 

See [here](src/qa) for more details.

![QA Engine](qa.png)

There are 3 options for the QA engine.
* *ChatGPT*: uses OpenAI's gpt-3.5-turbo [API](https://platform.openai.com/docs/models/overview) without the personal timeline as context. It answers world knowledge question such as `what is the GDP of US in 2021` but not personal questions.
* *Retrieval-based*: answers question by retrieving the top-k most relevant episodes from the personal timeline as the LLM's context. It can answer questions over the personal timeline such as `show me some plants in my neighborhood`.
* *View-based*: translates the input question to a (customized) SQL query over tabular views (e.g., books, exercise, etc.) of the personal timeline. This QA engine is good at answering aggregate queries (`how many books did I purchase?`) and min/max queries (`when was the last time I travel to Japan`).


Example questions you may try:
* `Show me some photos of plants in my neighborhood`
* `Which cities did I visit when I traveled to Japan?`
* `How many books did I purchase in April?`

## TimelineQA: a benchmark for Question Answer over the personal timeline

TimelineQA is a synthetic benchmark for accelerating progress on querying personal timelines. 
TimelineQA generates lifelogs of imaginary people. The episodes in the lifelog range from major life episodes such as high
school graduation to those that occur on a daily basis such as going for a run. We have evaluated SOTA models for atomic and multi-hop QA on the benchmark. 

Please check out the TimelineQA github [repo](https://github.com/facebookresearch/TimelineQA) and the TimelineQA paper ---  [https://arxiv.org/abs/2306.01061](https://arxiv.org/abs/2306.01061):
```
@article{tan2023timelineqa,
  title={TimelineQA: A Benchmark for Question Answering over Timelines},
  author={Tan, Wang-Chiew and Dwivedi-Yu, Jane and Li, Yuliang and Mathias, Lambert and Saeidi, Marzieh and Yan, Jing Nathan and Halevy, Alon Y},
  journal={arXiv preprint arXiv:2306.01069},
  year={2023}
}
```

## License

The codebase is licensed under the [Apache 2.0 license](LICENSE).

## Contributing

See [contributing](CONTRIBUTING.md) and the [code of conduct](CODE_OF_CONDUCT.md).

## Contributor Attribution

We'd like to thank the following contributors for their contributions to this project:
- [Tripti Singh](https://github.com/tripti-singh)
  - Design and implementation of the sqlite DB backend
  - Designing a pluggable data import and enrichment layer and building the pipeline orchestrator.
  - Importers for all six [data sources](https://github.com/facebookresearch/personal-timeline#digital-data-importers)
  - Generic csv and json data sources importer with [instructions](https://github.com/facebookresearch/personal-timeline/blob/main/NEW_DATASOURCE.md)
  - Dockerization
  - Contributing in Documentation
- [Wang-Chiew Tan](https://github.com/wangchiew)
  - Implementation of the [PostText](https://arxiv.org/abs/2306.01061) query engine
- [Pierre Moulon](https://github.com/SeaOtocinclus) for providing open-sourcing guidelines and suggestions


================================================
FILE: conf/ingest.conf
================================================
# incremental_* if True, will only process data that was not processed previously.
# When false, it will re-process all data. Useful when there is new logic that requires re-compute of everything from beginning.
ingest_new_data=True
incremental_geo_enrich=True
incremental_image_enrich=True
incremental_export=False
# export_enriched_data_to_json=True
enriched_data_to_json=True

================================================
FILE: docker-compose.yml
================================================
version: "3.9"

services:
  frontend:
    build:
      context: .
      dockerfile: src/frontend/Dockerfile
    ports:
      - "3000:3000"
    volumes:
      - ./personal-data/:/app/personal-data/
    environment:
      - APP_DATA_DIR=/app/personal-data/app_data
    env_file:
      - env/frontend.env.list

  qa:
    build:
      context: .
      dockerfile: src/qa/Dockerfile
    ports:
      - "8085:8085"
    volumes:
      - ./personal-data/:/app/personal-data/
    environment:
      - APP_DATA_DIR=/app/personal-data/app_data
    env_file:
      - env/frontend.env.list

  backend:
    build:
      context: .
      dockerfile: src/ingest/Dockerfile
    volumes:
      - ./personal-data/:/app/personal-data/
    environment:
      - APP_DATA_DIR=/app/personal-data/app_data
    env_file:
      - env/frontend.env.list
      - conf/ingest.conf


================================================
FILE: notebooks/extract_narration_tutorial.ipynb
================================================
{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "fb140897",
      "metadata": {
        "id": "fb140897"
      },
      "source": [
        "# Extract Narrations and Audio Captions Tutorial\n",
        "\n",
        "In this tutorial, you will be extracting video narrations through an auto-narration model, LaViLa, as well as audio captions through speech-to-text model, WhisperX. Finally, you will be able to interact with the extracted narrations and captions using langchain.\n",
        "\n",
        "### Notebook stuck?\n",
        "Note that because of Jupyter issues, sometimes the code may stuck at visualization. We recommend **restart the kernels** and try again to see if the issue is resolved."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "l5LlycOs75bf",
      "metadata": {
        "id": "l5LlycOs75bf"
      },
      "source": [
        "## Step 1. Install Project Aria Tools\n",
        "Run the following cell to install Project Aria Tools for reading Aria recordings in .vrs format"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "Yt5CoQlp8Cxw",
      "metadata": {
        "id": "Yt5CoQlp8Cxw"
      },
      "outputs": [],
      "source": [
        "# Specifics for Google Colab\n",
        "google_colab_env = 'google.colab' in str(get_ipython())\n",
        "print(\"Running from Google Colab, installing projectaria_tools\")\n",
        "!pip install projectaria-tools"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "Tv0I3ajm7TyH",
      "metadata": {
        "id": "Tv0I3ajm7TyH"
      },
      "source": [
        "## Step 2. Prepare an Aria recording\n",
        "\n",
        "\n",
        "### Prepare your collected Aria recording\n",
        "We will set the vrsfile path to your collected Aria recording.\n",
        "\n",
        "Upload your Aria recording in your Google Drive before running the cell.\n",
        "\n",
        "Here, we assume it is uploaded to **`My Drive/aria/recording.vrs`**\n",
        "\n",
        "*(You can check the content of the mounted drive by running `!ls \"/content/drive/My Drive/\"` in a cell.)*\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "WKyjF8fF7tA_",
      "metadata": {
        "id": "WKyjF8fF7tA_"
      },
      "outputs": [],
      "source": [
        "from google.colab import drive\n",
        "import os\n",
        "drive.flush_and_unmount()\n",
        "drive.mount('/content/drive/')\n",
        "my_vrs_file_path = 'aria/recording.vrs'\n",
        "vrsfile = \"/content/drive/My Drive/\" + my_vrs_file_path\n",
        "print(f\"INFO: vrsfile set to {vrsfile}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "vZLboooc7QLQ",
      "metadata": {
        "id": "vZLboooc7QLQ"
      },
      "source": [
        "## Tip: Avoid re-installation of packages\n",
        "Follow the below steps to avoid re-installation of package due to Colab shutting off or restarts later when running the scripts.\n",
        "\n",
        "(1) Create a folder called “ColabNotebooks” manually in your Drive (under \"My Drive\").\n",
        "\n",
        "(2) Then, run the below cell to add the symlink path to the system path."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "vnjjSGXlc4-i",
      "metadata": {
        "id": "vnjjSGXlc4-i"
      },
      "outputs": [],
      "source": [
        "# This step is to save package\n",
        "nb_path = '/content/notebooks'\n",
        "os.symlink('/content/drive/My Drive/ColabNotebooks', nb_path)\n",
        "sys.path.insert(0,nb_path)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "dHN4k57-nGBC",
      "metadata": {
        "id": "dHN4k57-nGBC"
      },
      "source": [
        "\n",
        "#### (Optional) Download a sample data for debugging\n",
        "Use this small scale sample data for testing out the dependencies."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "2aJEwVNJnDsL",
      "metadata": {
        "id": "2aJEwVNJnDsL"
      },
      "outputs": [],
      "source": [
        "# !curl -O -J -L \"https://github.com/facebookresearch/projectaria_tools/raw/main/data/mps_sample/sample.vrs\"\n",
        "# vrsfile = \"sample.vrs\"\n",
        "# print(f\"INFO: vrsfile set to {vrsfile}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "8196ad05",
      "metadata": {
        "id": "8196ad05"
      },
      "source": [
        "## Step 3. Create data provider\n",
        "\n",
        "Create projectaria data_provider so you can load the content of the vrs file."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "fb04b53b",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "fb04b53b",
        "outputId": "72defac0-8eae-41be-afc4-44f132e39e88"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Creating data provider from /content/drive/My Drive/aria/recording_long.vrs\n"
          ]
        }
      ],
      "source": [
        "from projectaria_tools.core import data_provider, calibration\n",
        "from projectaria_tools.core.sensor_data import TimeDomain, TimeQueryOptions\n",
        "from projectaria_tools.core.stream_id import RecordableTypeId, StreamId\n",
        "import numpy as np\n",
        "from matplotlib import pyplot as plt\n",
        "\n",
        "print(f\"Creating data provider from {vrsfile}\")\n",
        "provider = data_provider.create_vrs_data_provider(vrsfile)\n",
        "if not provider:\n",
        "    print(\"Invalid vrs data provider\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "c5033225",
      "metadata": {
        "id": "c5033225"
      },
      "source": [
        "## Step 4. Display VRS rgb content in thumbnail images\n",
        "\n",
        "Goals:\n",
        "- Summarize a VRS using 10 image side by side, to visually inspect the collected data.\n",
        "\n",
        "Key learnings:\n",
        "- Image streams are identified with a Unique Identifier: stream_id\n",
        "- Image frames are identified with timestamps\n",
        "- PIL images can be created from Numpy array\n",
        "\n",
        "Customization\n",
        "- To change the number of sampled images, change the variable `sample_count` to a desired number.\n",
        "- To change the thumbnail size, change the variable `resize_ratio` to a desired value."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "933725b6",
      "metadata": {
        "id": "933725b6"
      },
      "outputs": [],
      "source": [
        "from PIL import Image\n",
        "from tqdm import tqdm\n",
        "\n",
        "sample_count = 10\n",
        "resize_ratio = 10\n",
        "\n",
        "rgb_stream_id = StreamId(\"214-1\")\n",
        "\n",
        "# Retrieve image size for the RGB stream\n",
        "time_domain = TimeDomain.DEVICE_TIME  # query data based on host time\n",
        "option = TimeQueryOptions.CLOSEST # get data whose time [in TimeDomain] is CLOSEST to query time\n",
        "\n",
        "# Retrieve Start and End time for the given Sensor Stream Id\n",
        "start_time = provider.get_first_time_ns(rgb_stream_id, time_domain)\n",
        "end_time = provider.get_last_time_ns(rgb_stream_id, time_domain)\n",
        "\n",
        "image_config = provider.get_image_configuration(rgb_stream_id)\n",
        "width = image_config.image_width\n",
        "height = image_config.image_height\n",
        "\n",
        "thumbnail = newImage = Image.new(\n",
        "    \"RGB\", (int(width * sample_count / resize_ratio), int(height / resize_ratio))\n",
        ")\n",
        "current_width = 0\n",
        "\n",
        "\n",
        "# Samples 10 timestamps\n",
        "sample_timestamps = np.linspace(start_time, end_time, sample_count)\n",
        "for sample in tqdm(sample_timestamps):\n",
        "    image_tuple = provider.get_image_data_by_time_ns(rgb_stream_id, int(sample), time_domain, option)\n",
        "    image_array = image_tuple[0].to_numpy_array()\n",
        "    image = Image.fromarray(image_array)\n",
        "    new_size = (\n",
        "        int(image.size[0] / resize_ratio),\n",
        "        int(image.size[1] / resize_ratio),\n",
        "    )\n",
        "    image = image.resize(new_size).rotate(-90)\n",
        "    thumbnail.paste(image, (current_width, 0))\n",
        "    current_width = int(current_width + width / resize_ratio)\n",
        "\n",
        "from IPython.display import Image\n",
        "display(thumbnail)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "nFkss4vlaWxh",
      "metadata": {
        "id": "nFkss4vlaWxh"
      },
      "source": [
        "## Step 5. Prepare Pytorch Data Loader for Auto-Narration\n",
        "\n",
        "Here, we will be creating a pytorch data loader that outputs batches of video snippets in order to run the LaViLa auto-narration model.\n",
        "\n",
        "A **snippet** consists of a series of frames captured over a brief time span, which we will refer to as **snippet duration**.\n",
        "\n",
        "#### Step 5-1. Define Dataset"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "Kee0HuCgKUy_",
      "metadata": {
        "id": "Kee0HuCgKUy_"
      },
      "outputs": [],
      "source": [
        "import torch\n",
        "from torch.utils.data import Dataset, DataLoader\n",
        "from PIL import Image\n",
        "import torchvision.transforms as transforms\n",
        "import torchvision.transforms._transforms_video as transforms_video\n",
        "import torch.nn as nn\n",
        "\n",
        "class RGBSnippetDataset(Dataset):\n",
        "    def __init__(self,\n",
        "                 start_time: float, # start time in the video for sampling data\n",
        "                 end_time: float, # end time in the video for sampling data\n",
        "                 snippet_dur_sec: int, # snippet duration\n",
        "                 frames_per_snippet: int, # number of frames per snippet\n",
        "                 transform=None\n",
        "    ):\n",
        "        self.start_time = start_time\n",
        "        self.end_time = end_time\n",
        "        self.snippet_dur = snippet_dur_sec * 1000000000 # duration of a snippet in nano seconds\n",
        "        self.frames_per_snippet = frames_per_snippet # number of frames per snippet\n",
        "        self.stride_ns = int(self.snippet_dur//frames_per_snippet)\n",
        "        self.num_snippets = int((end_time - start_time) // self.snippet_dur)\n",
        "        self.snippet_starts = np.arange(start_time, start_time + self.snippet_dur * self.num_snippets, self.snippet_dur)\n",
        "\n",
        "        # Precompute timestamps for each snippet\n",
        "        self.all_frame_timestamps = [np.arange(snippet_start, snippet_start + self.snippet_dur, self.stride_ns) for snippet_start in self.snippet_starts]\n",
        "\n",
        "        self.rgb_stream_id = rgb_stream_id\n",
        "        self.time_domain = time_domain\n",
        "        self.option = option\n",
        "        self.transform = transform\n",
        "\n",
        "    def __len__(self):\n",
        "        return self.num_snippets\n",
        "\n",
        "    def __getitem__(self, idx):\n",
        "        # returns a snippet\n",
        "\n",
        "        # get timestamps of frames that belong to the current snippet idx\n",
        "        frame_timestamps = self.all_frame_timestamps[idx]\n",
        "\n",
        "        # read frames from the data provider and append to frame_list\n",
        "        frame_list = []\n",
        "        for timestamp in frame_timestamps:\n",
        "            image_tuple = provider.get_image_data_by_time_ns(self.rgb_stream_id, int(timestamp), self.time_domain, self.option)\n",
        "            image_array = image_tuple[0].to_numpy_array()\n",
        "            frame_list.append(image_array)\n",
        "\n",
        "        # append a set of images to a snippet\n",
        "        frames = [torch.tensor(frame, dtype=torch.float32) for frame in frame_list]\n",
        "        frames = torch.stack(frames, dim=0)\n",
        "\n",
        "        if self.transform:\n",
        "          frames = self.transform(frames)\n",
        "\n",
        "        # return snippet start time and end time\n",
        "        snippet_start = self.snippet_starts[idx]\n",
        "        snippet_end = snippet_start + self.snippet_dur\n",
        "\n",
        "        return frames, snippet_start, snippet_end\n",
        "\n",
        "class Permute(nn.Module):\n",
        "    \"\"\"\n",
        "    Permutation as an op\n",
        "    \"\"\"\n",
        "    def __init__(self, ordering):\n",
        "        super().__init__()\n",
        "        self.ordering = ordering\n",
        "\n",
        "    def forward(self, frames):\n",
        "        \"\"\"\n",
        "        Args:\n",
        "            frames in some ordering, by default (C, T, H, W)\n",
        "        Returns:\n",
        "            frames in the ordering that was specified\n",
        "        \"\"\"\n",
        "        return frames.permute(self.ordering)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "N7p7uCJZKPNm",
      "metadata": {
        "id": "N7p7uCJZKPNm"
      },
      "source": [
        "#### Step 5-2. Construct Data Loader\n",
        "Here you can set batch size (`batch_size`) as well as customize start time, end_time for running auto-narration."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "gWvmKM_nS0Ri",
      "metadata": {
        "id": "gWvmKM_nS0Ri"
      },
      "outputs": [],
      "source": [
        "# Retrieve Start and End time for the given Sensor Stream Id\n",
        "start_time = provider.get_first_time_ns(rgb_stream_id, time_domain)\n",
        "end_time = provider.get_last_time_ns(rgb_stream_id, time_domain)\n",
        "\n",
        "batch_size = 2 # batch size in dataloader (Decrease/increase based on the GPU memory)\n",
        "image_size = 224  # image size after resizing (Do not change for LaViLa)\n",
        "snippet_dur_sec = 2  # duration of a snippet (We recommend values between 1-10.)\n",
        "frames_per_snippet = 4  # number of frames per snippet (Do not change for LaViLa)\n",
        "\n",
        "val_transform = transforms.Compose([\n",
        "    Permute([3, 0, 1, 2]),  # T H W C -> C T H W\n",
        "    transforms.Resize(image_size),\n",
        "    transforms_video.NormalizeVideo(mean=[108.3272985, 116.7460125, 104.09373615000001], std=[68.5005327, 66.6321579, 70.32316305]),\n",
        "])\n",
        "rgb_snippet_dataset = RGBSnippetDataset(start_time, end_time, snippet_dur_sec=snippet_dur_sec, frames_per_snippet=frames_per_snippet, transform=val_transform)\n",
        "snippet_dataloader = DataLoader(rgb_snippet_dataset, batch_size=batch_size, shuffle=False)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "ixrGS0J0afaE",
      "metadata": {
        "id": "ixrGS0J0afaE"
      },
      "source": [
        "## Step 6. Install LaViLa auto-narration library\n",
        "Now that the data is prepared, let's install LaViLa library.\n",
        "\n",
        "LaViLa (Language augmented Video Language Pretraining) is a video narration model that is trained on Ego4D.\n",
        "It is used for generating text descriptions for your captured recordings in this tutorial.\n",
        "- Paper: https://arxiv.org/abs/2212.04501\n",
        "- Code: https://github.com/facebookresearch/LaViLa"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "7vLd1utsaqAr",
      "metadata": {
        "id": "7vLd1utsaqAr"
      },
      "outputs": [],
      "source": [
        "# install LaViLa as dependency\n",
        "!pip install git+https://github.com/zhaoyang-lv/LaViLa"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "9xPZVr6eMIsp",
      "metadata": {
        "id": "9xPZVr6eMIsp"
      },
      "source": [
        "## Step 7. Define helper functions for LaViLa\n",
        "\n",
        "Run the following cell for defining helper functions for (1) loading pre-trained models and tokenizers, (2) decoding generated tokens, and (3) run model on a batch of snippets."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "qFnuvPo1b3Dk",
      "metadata": {
        "id": "qFnuvPo1b3Dk"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "import urllib.request\n",
        "from collections import OrderedDict\n",
        "import torch\n",
        "\n",
        "from lavila.models.models import VCLM_OPENAI_TIMESFORMER_LARGE_336PX_GPT2_XL, VCLM_OPENAI_TIMESFORMER_BASE_GPT2\n",
        "from lavila.models.tokenizer import MyGPT2Tokenizer\n",
        "\n",
        "DEFAULT_CHECKPOINT = 'vclm_openai_timesformer_base_gpt2_base.pt_ego4d.jobid_319630.ep_0002.md5sum_68a71f.pth'\n",
        "# DEFAULT_CHECKPOINT = 'vclm_openai_timesformer_large_336px_gpt2_xl.pt_ego4d.jobid_246897.ep_0003.md5sum_443263.pth'\n",
        "\n",
        "def load_models_and_transforms(num_frames=4, ckpt_name=DEFAULT_CHECKPOINT, device='cpu'):\n",
        "    '''\n",
        "    Helper function for loading oading pre-trained models and tokenizers\n",
        "    '''\n",
        "    ckpt_path = os.path.join('lavila/modelzoo/', ckpt_name)\n",
        "    print(f\"ckpt_path: {os.path.abspath(ckpt_path)}\")\n",
        "    os.makedirs('lavila/modelzoo/', exist_ok=True)\n",
        "    if not os.path.exists(ckpt_path):\n",
        "        print('downloading model to {}'.format(ckpt_path))\n",
        "        urllib.request.urlretrieve('https://dl.fbaipublicfiles.com/lavila/checkpoints/narrator/{}'.format(ckpt_name), ckpt_path)\n",
        "    ckpt = torch.load(ckpt_path, map_location='cpu')\n",
        "    state_dict = OrderedDict()\n",
        "    for k, v in ckpt['state_dict'].items():\n",
        "        state_dict[k.replace('module.', '')] = v\n",
        "\n",
        "    # instantiate the model, and load the pre-trained weights\n",
        "    # model = VCLM_OPENAI_TIMESFORMER_LARGE_336PX_GPT2_XL(\n",
        "    model = VCLM_OPENAI_TIMESFORMER_BASE_GPT2(\n",
        "        text_use_cls_token=False,\n",
        "        project_embed_dim=256,\n",
        "        gated_xattn=True,\n",
        "        timesformer_gated_xattn=False,\n",
        "        freeze_lm_vclm=False,      # we use model.eval() anyway\n",
        "        freeze_visual_vclm=False,  # we use model.eval() anyway\n",
        "        num_frames=num_frames,\n",
        "        drop_path_rate=0.\n",
        "    )\n",
        "    model.load_state_dict(state_dict, strict=True)\n",
        "\n",
        "    device_type_str = device.type if isinstance(device, torch.device) else device\n",
        "    if device_type_str != 'cpu':\n",
        "        model = model.to(device)\n",
        "    model.eval()\n",
        "\n",
        "    tokenizer = MyGPT2Tokenizer('gpt2', add_bos=True)\n",
        "    #tokenizer = MyGPT2Tokenizer('gpt2-xl', add_bos=True)\n",
        "\n",
        "    return model, tokenizer\n",
        "\n",
        "\n",
        "def decode_one(generated_ids, tokenizer):\n",
        "    '''\n",
        "    Helper function for decoding generated tokens.\n",
        "    '''\n",
        "    # get the index of <EOS>\n",
        "    if tokenizer.eos_token_id == tokenizer.bos_token_id:\n",
        "        if tokenizer.eos_token_id in generated_ids[1:].tolist():\n",
        "            eos_id = generated_ids[1:].tolist().index(tokenizer.eos_token_id) + 1\n",
        "        else:\n",
        "            eos_id = len(generated_ids.tolist()) - 1\n",
        "    elif tokenizer.eos_token_id in generated_ids.tolist():\n",
        "        eos_id = generated_ids.tolist().index(tokenizer.eos_token_id)\n",
        "    else:\n",
        "        eos_id = len(generated_ids.tolist()) - 1\n",
        "    generated_text_str = tokenizer.tokenizer.decode(generated_ids[1:eos_id].tolist())\n",
        "    return generated_text_str\n",
        "\n",
        "\n",
        "def run_model_on_snippets(\n",
        "    frames, model, tokenizer, device=\"cpu\", narration_max_sentences=5\n",
        "):\n",
        "    '''\n",
        "    Function for running the LaViLa model on batches of snippets.\n",
        "    '''\n",
        "    with torch.no_grad():\n",
        "        image_features = model.encode_image(frames)\n",
        "        generated_text_ids, ppls = model.generate(\n",
        "            image_features,\n",
        "            tokenizer,\n",
        "            target=None,  # free-form generation\n",
        "            max_text_length=77,\n",
        "            top_k=None,\n",
        "            top_p=0.95,   # nucleus sampling\n",
        "            num_return_sequences=narration_max_sentences,  # number of candidates: 10\n",
        "            temperature=0.7,\n",
        "            early_stopping=True,\n",
        "        )\n",
        "    output_narration = []\n",
        "    for j in range(generated_text_ids.shape[0] // narration_max_sentences):\n",
        "        cur_output_narration = []\n",
        "        for k in range(narration_max_sentences):\n",
        "            jj = j * narration_max_sentences + k\n",
        "            generated_text_str = decode_one(generated_text_ids[jj], tokenizer)\n",
        "            generated_text_str = generated_text_str.strip()\n",
        "            generated_text_str = generated_text_str.replace(\"#c c\", \"#C C\")\n",
        "            if generated_text_str in cur_output_narration:\n",
        "                continue\n",
        "            if generated_text_str.endswith('the'):\n",
        "                # skip incomplete sentences\n",
        "                continue\n",
        "            cur_output_narration.append(generated_text_str)\n",
        "        output_narration.append(cur_output_narration) # list of size B (batch size)\n",
        "    return output_narration"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "mJ7qlnBevC06",
      "metadata": {
        "id": "mJ7qlnBevC06"
      },
      "source": [
        "## Step 8. Run LaViLa inference over vrs file\n",
        "Let's load the pre-traiend model and tokenizer\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "d7C99NViqb_a",
      "metadata": {
        "id": "d7C99NViqb_a"
      },
      "outputs": [],
      "source": [
        "# load the pre-traiend model and tokenizer.\n",
        "model, tokenizer = load_models_and_transforms(num_frames=4)\n",
        "\n",
        "# this is where the generated narration will be stored\n",
        "narrations_dict = {\n",
        "    'start_time_ns': [],\n",
        "    'end_time_ns': [],\n",
        "    'narration': [],\n",
        "}\n",
        "\n",
        "# use gpu if available\n",
        "if torch.cuda.is_available():\n",
        "  model = model.cuda()\n",
        "\n",
        "for idx, (frames, st_ns, ed_ns) in enumerate(snippet_dataloader):\n",
        "  if torch.cuda.is_available():\n",
        "    frames = frames.cuda()\n",
        "  # run inference over a batch of snippet\n",
        "  output_narration = run_model_on_snippets(frames, model, tokenizer)\n",
        "  # store results\n",
        "  narrations_dict['start_time_ns'].extend(st_ns.numpy().tolist())\n",
        "  narrations_dict['end_time_ns'].extend(ed_ns.numpy().tolist())\n",
        "  narrations_dict['narration'].extend(output_narration)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "z3pRVNe5vPIL",
      "metadata": {
        "id": "z3pRVNe5vPIL"
      },
      "source": [
        "## Step 9. Display the auto-narration results and save to csv file\n",
        "Make sure to change `narration_save_path` to your desired location!"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "E6nUZYKurDpO",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 238
        },
        "id": "E6nUZYKurDpO",
        "outputId": "8164a676-94d8-4c90-8d34-2b9674d60947"
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-709b9be8-de5d-4fa3-b612-f5fb6a346ef4\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>start_time_ns</th>\n",
              "      <th>end_time_ns</th>\n",
              "      <th>narration</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>148502526450</td>\n",
              "      <td>150502526450</td>\n",
              "      <td>[#C C stares at the ceiling, #C C looks around...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>150502526450</td>\n",
              "      <td>152502526450</td>\n",
              "      <td>[#C C looks around the house, #C C looks aroun...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>152502526450</td>\n",
              "      <td>154502526450</td>\n",
              "      <td>[#C C adjusts the camera, #C C looks around]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>154502526450</td>\n",
              "      <td>156502526450</td>\n",
              "      <td>[#C C looks around]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>156502526450</td>\n",
              "      <td>158502526450</td>\n",
              "      <td>[#C C stands beside the door, #C C looks around]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>5</th>\n",
              "      <td>158502526450</td>\n",
              "      <td>160502526450</td>\n",
              "      <td>[#C C looks at the wall, #C C looks around, #C...</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-709b9be8-de5d-4fa3-b612-f5fb6a346ef4')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-709b9be8-de5d-4fa3-b612-f5fb6a346ef4 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-709b9be8-de5d-4fa3-b612-f5fb6a346ef4');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-f9b5fcf7-00b2-428b-b18a-610e592c61d4\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-f9b5fcf7-00b2-428b-b18a-610e592c61d4')\"\n",
              "            title=\"Suggest charts.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-f9b5fcf7-00b2-428b-b18a-610e592c61d4 button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "text/plain": [
              "   start_time_ns   end_time_ns  \\\n",
              "0   148502526450  150502526450   \n",
              "1   150502526450  152502526450   \n",
              "2   152502526450  154502526450   \n",
              "3   154502526450  156502526450   \n",
              "4   156502526450  158502526450   \n",
              "5   158502526450  160502526450   \n",
              "\n",
              "                                           narration  \n",
              "0  [#C C stares at the ceiling, #C C looks around...  \n",
              "1  [#C C looks around the house, #C C looks aroun...  \n",
              "2       [#C C adjusts the camera, #C C looks around]  \n",
              "3                                [#C C looks around]  \n",
              "4   [#C C stands beside the door, #C C looks around]  \n",
              "5  [#C C looks at the wall, #C C looks around, #C...  "
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "narration_save_path = os.path.join(os.path.dirname(vrsfile), 'auto_narration.csv')\n",
        "\n",
        "import pandas as pd\n",
        "df = pd.DataFrame(narrations_dict)\n",
        "df.to_csv(narration_save_path)\n",
        "display(df)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "BjF3Ms0l3igG",
      "metadata": {
        "id": "BjF3Ms0l3igG"
      },
      "source": [
        "# Optional Steps for Speech2Text\n",
        "Proceed Step 10-14, if you have speech in your recording and would like to use it."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "upn5Is4AhRpm",
      "metadata": {
        "id": "upn5Is4AhRpm"
      },
      "source": [
        "## Step 10. Build VRS Tool to extract .wav file for audio captioning\n",
        "Whisper X can be run on .wav file. We need to install VRSTool for extracting .wav file from .vrs file."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "N5gLce1Py09s",
      "metadata": {
        "id": "N5gLce1Py09s"
      },
      "outputs": [],
      "source": [
        "!sudo apt-get update\n",
        "# Install VRS dependencies\n",
        "!sudo apt-get install cmake git ninja-build ccache libgtest-dev libfmt-dev libturbojpeg-dev libpng-dev\n",
        "!sudo apt-get install liblz4-dev libzstd-dev libxxhash-dev\n",
        "!sudo apt-get install libboost-system-dev libboost-filesystem-dev libboost-thread-dev libboost-chrono-dev libboost-date-time-dev\n",
        "# Install build dependencies\n",
        "!sudo apt-get install -y cmake ninja-build\n",
        "\n",
        "#clone and build\n",
        "!git clone https://github.com/facebookresearch/vrs.git\n",
        "!cmake -S vrs -B vrs/build -G Ninja\n",
        "!cd vrs/build; ninja vrs\n"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "jXN0HNZ5iGOy",
      "metadata": {
        "id": "jXN0HNZ5iGOy"
      },
      "source": [
        "## Step 11. Extract .wav file from VRS file\n",
        "Now that VRSTool is installed, let's extract .wav file from .vrs file.\n",
        "\n",
        "Here, the extracted .wav file is saved to the current working directory.\n",
        "\n",
        "If you anticipate this file to be re-used, change the output path using the argument `--to <google_drive_path>`\n",
        "\n",
        "*(Ignore error '[AudioExtractor][ERROR]: os::makeDirectories(folderPath_) failed: 22, Invalid argument')*"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "lSSdMm3Tz31_",
      "metadata": {
        "id": "lSSdMm3Tz31_"
      },
      "outputs": [],
      "source": [
        "!./vrs/build/tools/vrs/vrs extract-audio \"{vrsfile}\" --to ."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "Wleqb4JLkNNE",
      "metadata": {
        "id": "Wleqb4JLkNNE"
      },
      "source": [
        "## Step 12. Install Whisper X\n",
        "We have input data ready for Whisper X. Let's install the library.\n",
        "\n",
        "Whisper X is an automatic speech recognition method that provides word-level timestamps and speaker diarization.\n",
        "- Paper: https://arxiv.org/abs/2303.00747\n",
        "- Code: https://github.com/m-bain/whisperX"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "0Su4GYTFkPKU",
      "metadata": {
        "id": "0Su4GYTFkPKU"
      },
      "outputs": [],
      "source": [
        "!pip install git+https://github.com/m-bain/whisperx.git"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "1a2IoR3CiMxl",
      "metadata": {
        "id": "1a2IoR3CiMxl"
      },
      "source": [
        "## Step 13. Define helper functions for Whisper X\n",
        "Let's define some helper functions for Whisper X, this include a postprocessing function and a function to align the output to the timestamps."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "_wavV43tiSwM",
      "metadata": {
        "id": "_wavV43tiSwM"
      },
      "outputs": [],
      "source": [
        "import logging\n",
        "import os.path as osp\n",
        "import numpy as np\n",
        "import glob\n",
        "import os\n",
        "import pandas as pd\n",
        "import whisperx\n",
        "import tqdm\n",
        "\n",
        "logger = logging.getLogger(__name__)\n",
        "logging.basicConfig(level=logging.INFO)\n",
        "\n",
        "device = \"cuda\"\n",
        "\n",
        "def asr_tokens_to_csv(\n",
        "    word_segments,\n",
        "    token_csv_folder: str,\n",
        "    starting_timestamp_s: float = 0.0,\n",
        "):\n",
        "    # post process the output asr file to extract only the minimal needed content\n",
        "\n",
        "    df = pd.DataFrame.from_dict(word_segments)\n",
        "    os.makedirs(token_csv_folder, exist_ok=True)\n",
        "\n",
        "    # write to wav domain:\n",
        "    s_to_ms = int(1e3)\n",
        "    df = df.fillna(-1)\n",
        "    df[\"start\"] = (df[\"start\"] * s_to_ms).astype(\"int64\")\n",
        "    df[\"end\"] = (df[\"end\"] * s_to_ms).astype(\"int64\")\n",
        "    df_speech_wav = df.rename(\n",
        "        columns={\"start\": \"startTime_ms\", \"end\": \"endTime_ms\", \"text\": \"written\"},\n",
        "    )\n",
        "    df_speech_wav.to_csv(\n",
        "        osp.join(token_csv_folder, \"speech.csv\"), index=False, header=True\n",
        "    )\n",
        "\n",
        "    # Update ASR ms time to Aria ns time\n",
        "    s_to_ns = int(1e9)\n",
        "    ms_to_ns = int(1e6)\n",
        "    df[\"start\"] = (df[\"start\"] * ms_to_ns + starting_timestamp_s * s_to_ns).astype(\n",
        "        \"int64\"\n",
        "    )\n",
        "    df[\"end\"] = (df[\"end\"] * ms_to_ns + starting_timestamp_s * s_to_ns).astype(\"int64\")\n",
        "\n",
        "    df_aria_domain = df.rename(\n",
        "        columns={\"start\": \"startTime_ms\", \"end\": \"endTime_ms\", \"text\": \"written\"},\n",
        "    )\n",
        "    df_aria_domain.to_csv(\n",
        "        osp.join(token_csv_folder, \"speech_aria_domain.csv\"), index=False, header=True\n",
        "    )\n",
        "\n",
        "    logging.info(f\"Generate speech.csv & speech_aria_domain.csv to {token_csv_folder}\")\n",
        "\n",
        "\n",
        "def run_whisperx_aria_wav(\n",
        "    model,\n",
        "    file_path: str,\n",
        "    output_folder: str = \"\",\n",
        "    batch_size = None,\n",
        "):\n",
        "    \"\"\"\n",
        "    Run whisperx model on .wav file extracted from VRS file\n",
        "    \"\"\"\n",
        "    starting_timestamp = file_path.split(\"-\")[-1].replace(\".wav\", \"\")\n",
        "    starting_timestamp = float(starting_timestamp)\n",
        "    logging.info(\"Aria Starting timestamp: {:0.3f}\".format(starting_timestamp))\n",
        "\n",
        "    logging.info(f\"Transcribe the speech from wav file {file_path}.\")\n",
        "    result = model.transcribe(file_path, batch_size=batch_size)\n",
        "    print(f\"Transcription done.\")\n",
        "\n",
        "\n",
        "    model_a, metadata = whisperx.load_align_model(language_code=result[\"language\"], device=device)\n",
        "    logging.info(f\"Transcription done.\")\n",
        "    result_aligned = whisperx.align(\n",
        "        result[\"segments\"], model_a, metadata, file_path, device\n",
        "    )\n",
        "    print(f\"Alignment done.\")\n",
        "\n",
        "    try:\n",
        "        asr_tokens_to_csv(\n",
        "            word_segments=result_aligned[\"word_segments\"],\n",
        "            token_csv_folder=output_folder,\n",
        "            starting_timestamp_s=starting_timestamp,\n",
        "        )\n",
        "    except Exception as err:\n",
        "        logging.warning(f\"Cannot process {file_path} because {err}. Skip this recording.\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "C3q0o21rRIbm",
      "metadata": {
        "id": "C3q0o21rRIbm"
      },
      "source": [
        "\n",
        "## Step 14. Run Whisper X\n",
        "Finally, let's run Whisper X on the .wav file that we extracted.\n",
        "\n",
        "Make sure to\n",
        "- Change the `audio_file` to the .wav file that we extracted in Step 12.\n",
        "- Set the `whisper_x_output_folder` to desired location. The resulting file name is `speech_aria_domain.csv`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "ZODeWa-ilKMJ",
      "metadata": {
        "id": "ZODeWa-ilKMJ"
      },
      "outputs": [],
      "source": [
        "audio_file = '231-1-0000-743.444.wav'\n",
        "whisper_x_output_folder = \".\"\n",
        "compute_type = \"float16\" # change to \"int8\" if low on GPU mem (may reduce accuracy)\n",
        "model = whisperx.load_model(\"large-v2\", device, compute_type=compute_type,) #  language='en'\n",
        "batch_size = 16 # reduce if low on GPU mem, or keep it None\n",
        "provider = run_whisperx_aria_wav(model, audio_file, output_folder=whisper_x_output_folder, batch_size=batch_size)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "670FaxtH1OGS",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 424
        },
        "id": "670FaxtH1OGS",
        "outputId": "dbc76cb8-2b62-4c96-e3da-2c3d42c63731"
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-36794bf5-9be7-4403-a6a7-b3f62cfc8ad2\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>word</th>\n",
              "      <th>startTime_ms</th>\n",
              "      <th>endTime_ms</th>\n",
              "      <th>score</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>Awesome.</td>\n",
              "      <td>744915000000</td>\n",
              "      <td>745275000000</td>\n",
              "      <td>0.535</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>Would</td>\n",
              "      <td>745836000000</td>\n",
              "      <td>745936000000</td>\n",
              "      <td>0.451</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>you</td>\n",
              "      <td>745956000000</td>\n",
              "      <td>746016000000</td>\n",
              "      <td>0.958</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>like</td>\n",
              "      <td>746056000000</td>\n",
              "      <td>746156000000</td>\n",
              "      <td>0.932</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>some</td>\n",
              "      <td>746176000000</td>\n",
              "      <td>746296000000</td>\n",
              "      <td>0.759</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>...</th>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>222</th>\n",
              "      <td>Thank</td>\n",
              "      <td>927758000000</td>\n",
              "      <td>927978000000</td>\n",
              "      <td>0.609</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>223</th>\n",
              "      <td>you.</td>\n",
              "      <td>927998000000</td>\n",
              "      <td>928159000000</td>\n",
              "      <td>0.912</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>224</th>\n",
              "      <td>Big</td>\n",
              "      <td>934170000000</td>\n",
              "      <td>934330000000</td>\n",
              "      <td>0.652</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>225</th>\n",
              "      <td>long</td>\n",
              "      <td>934370000000</td>\n",
              "      <td>934671000000</td>\n",
              "      <td>0.950</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>226</th>\n",
              "      <td>sip.</td>\n",
              "      <td>934691000000</td>\n",
              "      <td>934931000000</td>\n",
              "      <td>0.780</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>227 rows × 4 columns</p>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-36794bf5-9be7-4403-a6a7-b3f62cfc8ad2')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-36794bf5-9be7-4403-a6a7-b3f62cfc8ad2 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-36794bf5-9be7-4403-a6a7-b3f62cfc8ad2');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-36a0f26e-0d64-4f0a-860c-987ea512536c\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-36a0f26e-0d64-4f0a-860c-987ea512536c')\"\n",
              "            title=\"Suggest charts.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-36a0f26e-0d64-4f0a-860c-987ea512536c button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "text/plain": [
              "         word  startTime_ms    endTime_ms  score\n",
              "0    Awesome.  744915000000  745275000000  0.535\n",
              "1       Would  745836000000  745936000000  0.451\n",
              "2         you  745956000000  746016000000  0.958\n",
              "3        like  746056000000  746156000000  0.932\n",
              "4        some  746176000000  746296000000  0.759\n",
              "..        ...           ...           ...    ...\n",
              "222     Thank  927758000000  927978000000  0.609\n",
              "223      you.  927998000000  928159000000  0.912\n",
              "224       Big  934170000000  934330000000  0.652\n",
              "225      long  934370000000  934671000000  0.950\n",
              "226      sip.  934691000000  934931000000  0.780\n",
              "\n",
              "[227 rows x 4 columns]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "asr_df = pd.read_csv(\"speech_aria_domain.csv\")\n",
        "display(asr_df)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "ihTkYWNb4BAG",
      "metadata": {
        "id": "ihTkYWNb4BAG"
      },
      "source": [
        "# Optional steps for summarization example\n",
        "Proceed Step 15-17, if you would like to try out summarization of the narration using llm (via langchain)."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "SBANXKSPR1eP",
      "metadata": {
        "id": "SBANXKSPR1eP"
      },
      "source": [
        "## Step 15. Install Langchain"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "t25suH5HR5jb",
      "metadata": {
        "id": "t25suH5HR5jb"
      },
      "outputs": [],
      "source": [
        "!pip install langchain"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "temVxjVVWBw3",
      "metadata": {
        "id": "temVxjVVWBw3"
      },
      "source": [
        "## Step 16. Install OpenAI to use with Langchain"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "r6QEuPO0WFgC",
      "metadata": {
        "id": "r6QEuPO0WFgC"
      },
      "outputs": [],
      "source": [
        "import locale\n",
        "locale.getpreferredencoding = lambda: \"UTF-8\"\n",
        "!pip install openai"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "Nb6m7nHzSBy9",
      "metadata": {
        "id": "Nb6m7nHzSBy9"
      },
      "source": [
        "## Step 17. Summaraize the narration result\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "igRz39kKSBES",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 87
        },
        "id": "igRz39kKSBES",
        "outputId": "c4f09b38-7d0e-42a1-c451-a50c88f250c7"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            },
            "text/plain": [
              "'The summary describes a series of events with timestamps and corresponding narrations. The events involve a character named C C who is observed looking around, adjusting the camera, staring at the ceiling, standing beside the door, and looking at various objects in the room and house.'"
            ]
          },
          "execution_count": 56,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "from langchain.chat_models import ChatOpenAI\n",
        "from langchain.chains.summarize import load_summarize_chain\n",
        "from langchain.docstore.document import Document\n",
        "from langchain.document_loaders.csv_loader import CSVLoader\n",
        "from langchain import PromptTemplate\n",
        "\n",
        "\n",
        "prompt_template = \"\"\" Write a concise summary (between 5 to 10 sentences) of the following text.\n",
        "The text is about my exhaustive timeline, where I am referred to as '#C C' or 'C C' or 'C'.\n",
        "Please use first person pronoun (I) in the summary, instead of 'C' or 'C C'.\n",
        "Please keep in mind that some observations maybe incorrect as the timeline was machine-generated.\n",
        "\n",
        "Timeline:\n",
        "\n",
        "{text}\n",
        "\n",
        "\n",
        "TL'DR: \"\"\"\n",
        "\n",
        "#os.environ[\"OPENAI_API_KEY\"] = \"sk-your-key\"\n",
        "PROMPT = PromptTemplate(template=prompt_template, input_variables=[\"text\"])\n",
        "llm = ChatOpenAI(temperature=0, model_name=\"gpt-3.5-turbo\")\n",
        "chain = load_summarize_chain(llm, chain_type=\"stuff\", prompt=PROMPT)\n",
        "\n",
        "loader = CSVLoader(file_path=narration_save_path)\n",
        "docs = loader.load()\n",
        "\n",
        "chain.run(docs)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "oV6Tn_WACKMF",
      "metadata": {
        "id": "oV6Tn_WACKMF"
      },
      "source": [
        "## (Optional) RGBDataset\n",
        "\n",
        "The RGBDataset is a simple image pytorch dataset designed for image-based models that operate on individual frames rather than snippet inputs. Use this dataset that process single frames."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "kvAqkUzdCLPt",
      "metadata": {
        "id": "kvAqkUzdCLPt"
      },
      "outputs": [],
      "source": [
        "# import torch\n",
        "# from torch.utils.data import Dataset, DataLoader\n",
        "# from PIL import Image\n",
        "# import torchvision.transforms as transforms\n",
        "\n",
        "# class RGBDataset(Dataset):\n",
        "#     def __init__(self, start_time, end_time, sample_count, transform=None):\n",
        "#         self.timestamps = np.linspace(start_time, end_time, sample_count)\n",
        "#         self.rgb_stream_id = StreamId(\"214-1\")\n",
        "#         self.time_domain = TimeDomain.DEVICE_TIME\n",
        "#         self.option = TimeQueryOptions.CLOSEST\n",
        "#         self.transform = transform\n",
        "\n",
        "#     def __len__(self):\n",
        "#         return len(self.timestamps)\n",
        "\n",
        "#     def __getitem__(self, idx):\n",
        "#         timestamp = self.timestamps[idx]\n",
        "#         image_tuple = provider.get_image_data_by_time_ns(self.rgb_stream_id, int(timestamp), self.time_domain, self.option)\n",
        "#         image_array = image_tuple[0].to_numpy_array()\n",
        "#         image = Image.fromarray(image_array).rotate(-90)\n",
        "#         if self.transform:\n",
        "#           image = self.transform(image)\n",
        "#         return timestamp, image\n",
        "\n",
        "# val_transform = transforms.Compose([\n",
        "#     transforms.Resize(224),\n",
        "#     transforms.ToTensor(),\n",
        "#   ])\n",
        "\n",
        "# rgb_dataset = RGBDataset(start_time, end_time, sample_count, transform=val_transform)\n",
        "# image_dataloader = DataLoader(rgb_dataset, batch_size=2, shuffle=False)\n",
        "# # Get the next batch of data\n",
        "# timestamp, image = next(iter(image_dataloader))"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "gpuType": "T4",
      "provenance": []
    },
    "custom": {
      "cells": [],
      "metadata": {
        "fileHeader": "",
        "kernelspec": {
          "display_name": "Python 3 (ipykernel)",
          "language": "python",
          "name": "python3"
        },
        "language_info": {
          "codemirror_mode": {
            "name": "ipython",
            "version": 3
          },
          "file_extension": ".py",
          "mimetype": "text/x-python",
          "name": "python",
          "nbconvert_exporter": "python",
          "pygments_lexer": "ipython3",
          "version": "3.10.11"
        }
      },
      "nbformat": 4,
      "nbformat_minor": 5
    },
    "indentAmount": 2,
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.11.4"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}


================================================
FILE: notebooks/object_detection_tutorial.ipynb
================================================
[File too large to display: 20.5 MB]

================================================
FILE: notebooks/ocr_tutorial.ipynb
================================================
{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "fb140897",
      "metadata": {
        "id": "fb140897"
      },
      "source": [
        "# OCR Tutorial\n",
        "\n",
        "In this tutorial, you will learn how to use OCR (EasyOCR) to detect text from Aria frames.\n",
        "\n",
        "\n",
        "### Notebook stuck?\n",
        "Note that because of Jupyter issues, sometimes the code may stuck at visualization. We recommend **restart the kernels** and try again to see if the issue is resolved."
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Step 1. Install Project Aria Tools\n",
        "Run the following cell to install Project Aria Tools for reading Aria recordings in .vrs format"
      ],
      "metadata": {
        "id": "l5LlycOs75bf"
      },
      "id": "l5LlycOs75bf"
    },
    {
      "cell_type": "code",
      "source": [
        "# Specifics for Google Colab\n",
        "google_colab_env = 'google.colab' in str(get_ipython())\n",
        "print(\"Running from Google Colab, installing projectaria_tools\")\n",
        "!pip install projectaria-tools"
      ],
      "metadata": {
        "id": "Yt5CoQlp8Cxw"
      },
      "id": "Yt5CoQlp8Cxw",
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Step 2. Prepare an Aria recording\n",
        "\n",
        "We will set the vrsfile path to your collected Aria recording.\n",
        "\n",
        "Upload your Aria recording in your Google Drive before running the cell.\n",
        "\n",
        "Here, we assume it is uploaded to **`My Drive/Fridge/sample.vrs`**\n",
        "\n",
        "*(You can check the content of the mounted drive by running `!ls \"/content/drive/My Drive/\"` in a cell.)*\n",
        "\n"
      ],
      "metadata": {
        "id": "Tv0I3ajm7TyH"
      },
      "id": "Tv0I3ajm7TyH"
    },
    {
      "cell_type": "code",
      "source": [
        "from google.colab import drive\n",
        "import os\n",
        "drive.flush_and_unmount()\n",
        "drive.mount('/content/drive/')\n",
        "my_vrs_file_path = 'Fridge/sample.vrs'\n",
        "vrsfile = \"/content/drive/My Drive/\" + my_vrs_file_path\n",
        "print(f\"INFO: vrsfile set to {vrsfile}\")"
      ],
      "metadata": {
        "id": "vnjjSGXlc4-i"
      },
      "id": "vnjjSGXlc4-i",
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "id": "8196ad05",
      "metadata": {
        "id": "8196ad05"
      },
      "source": [
        "## Step 3. Create data provider\n",
        "\n",
        "Create projectaria data_provider so you can load the content of the vrs file."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "fb04b53b",
      "metadata": {
        "id": "fb04b53b"
      },
      "outputs": [],
      "source": [
        "from projectaria_tools.core import data_provider, calibration\n",
        "from projectaria_tools.core.sensor_data import TimeDomain, TimeQueryOptions\n",
        "from projectaria_tools.core.stream_id import RecordableTypeId, StreamId\n",
        "import numpy as np\n",
        "from matplotlib import pyplot as plt\n",
        "\n",
        "print(f\"Creating data provider from {vrsfile}\")\n",
        "provider = data_provider.create_vrs_data_provider(vrsfile)\n",
        "if not provider:\n",
        "    print(\"Invalid vrs data provider\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "c5033225",
      "metadata": {
        "id": "c5033225"
      },
      "source": [
        "## Step 4. Display VRS rgb content in thumbnail images\n",
        "\n",
        "Goals:\n",
        "- Summarize a VRS using 20 image side by side, to visually inspect the collected data.\n",
        "\n",
        "Key learnings:\n",
        "- Image streams are identified with a Unique Identifier: stream_id\n",
        "- Image frames are identified with timestamps\n",
        "- PIL images can be created from Numpy array\n",
        "\n",
        "Customization\n",
        "- To change the number of sampled images, change the variable `sample_count` to a desired number.\n",
        "- To change the thumbnail size, change the variable `resize_ratio` to a desired value."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "933725b6",
      "metadata": {
        "id": "933725b6"
      },
      "outputs": [],
      "source": [
        "from PIL import Image\n",
        "from tqdm import tqdm\n",
        "\n",
        "sample_count = 30\n",
        "resize_ratio = 10\n",
        "\n",
        "rgb_stream_id = StreamId(\"214-1\")\n",
        "\n",
        "# Retrieve image size for the RGB stream\n",
        "time_domain = TimeDomain.DEVICE_TIME  # query data based on host time\n",
        "option = TimeQueryOptions.CLOSEST # get data whose time [in TimeDomain] is CLOSEST to query time\n",
        "\n",
        "# Retrieve Start and End time for the given Sensor Stream Id\n",
        "start_time = provider.get_first_time_ns(rgb_stream_id, time_domain)\n",
        "end_time = provider.get_last_time_ns(rgb_stream_id, time_domain)\n",
        "\n",
        "image_config = provider.get_image_configuration(rgb_stream_id)\n",
        "width = image_config.image_width\n",
        "height = image_config.image_height\n",
        "\n",
        "thumbnail = newImage = Image.new(\n",
        "    \"RGB\", (int(width * sample_count / resize_ratio), int(height / resize_ratio))\n",
        ")\n",
        "current_width = 0\n",
        "\n",
        "\n",
        "# Samples 10 timestamps\n",
        "sample_timestamps = np.linspace(start_time, end_time, sample_count)\n",
        "for sample in tqdm(sample_timestamps):\n",
        "    image_tuple = provider.get_image_data_by_time_ns(rgb_stream_id, int(sample), time_domain, option)\n",
        "    image_array = image_tuple[0].to_numpy_array()\n",
        "    image = Image.fromarray(image_array)\n",
        "    new_size = (\n",
        "        int(image.size[0] / resize_ratio),\n",
        "        int(image.size[1] / resize_ratio),\n",
        "    )\n",
        "    image = image.resize(new_size).rotate(-90)\n",
        "    thumbnail.paste(image, (current_width, 0))\n",
        "    current_width = int(current_width + width / resize_ratio)\n",
        "\n",
        "from IPython.display import Image\n",
        "display(thumbnail)"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Step 5. Install EasyOCR\n"
      ],
      "metadata": {
        "id": "eEPHeoB8kWvR"
      },
      "id": "eEPHeoB8kWvR"
    },
    {
      "cell_type": "code",
      "source": [
        "# Install detectron2\n",
        "!pip install easyocr"
      ],
      "metadata": {
        "id": "OAkVc2HvkWO9"
      },
      "id": "OAkVc2HvkWO9",
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Step 6. Run OCR\n",
        "\n",
        "Run OCR for each sampled timestamps in Step 4.\n",
        "\n",
        "- The detected text will be stored in `ocr_dict`.\n",
        "\n",
        "- You can set the image size using the `imsize` variable.\n",
        "\n",
        "- You can add list of languages to be parsed as follows:\n",
        "```\n",
        "reader = easyocr.Reader(['en', 'fr', 'ch_sim'])\n",
        "```\n",
        "For all supported languages in EasyOCR, see https://www.jaided.ai/easyocr/.\n",
        "\n",
        "- You can set the `confidence_thres` to only keep the texts that have confidences above the threshold.\n",
        "\n",
        "- The output will be in a list format, each item represents a bounding box, the text detected and confident level, respectively.\n",
        "```\n",
        "[ ([[226, 170], [414, 170], [414, 220], [226, 220]], 'Yuyuan Rd.', 0.8261902332305908),\n",
        " ([[79, 173], [125, 173], [125, 213], [79, 213]], 'W', 0.9848111271858215),\n",
        " ([[529, 173], [569, 173], [569, 213], [529, 213]], 'E', 0.8405593633651733)]\n",
        " ```\n"
      ],
      "metadata": {
        "id": "yB51eG_v72kg"
      },
      "id": "yB51eG_v72kg"
    },
    {
      "cell_type": "code",
      "source": [
        "import easyocr\n",
        "from PIL import Image\n",
        "\n",
        "imsize = 3072\n",
        "confidence_thres = 0.2\n",
        "\n",
        "ocr_dict = {\n",
        "    'timestamps': [],\n",
        "    'texts': [],\n",
        "    'bboxes': [],\n",
        "}\n",
        "\n",
        "reader = easyocr.Reader(['en',])  # Load EasyOCR model. Only need to be called once.\n",
        "\n",
        "for sample in tqdm(sample_timestamps):\n",
        "\n",
        "    # Fetch image\n",
        "    image_tuple = provider.get_image_data_by_time_ns(rgb_stream_id, int(sample), time_domain, option)\n",
        "    image_array = image_tuple[0].to_numpy_array()\n",
        "    image = Image.fromarray(image_array)\n",
        "    new_size = (imsize, imsize)\n",
        "    image = np.asarray(image.resize(new_size).rotate(-90))\n",
        "    result = reader.readtext(image)\n",
        "    print(f\"result: {result}\")\n",
        "\n",
        "    if result is not None:\n",
        "      ocr_dict['timestamps'].append(sample)\n",
        "      ocr_dict['bboxes'].append([res[0] for res in result if res[2]> confidence_thres])\n",
        "      ocr_dict['texts'].append([res[1] for res in result if res[2]> confidence_thres])\n"
      ],
      "metadata": {
        "id": "Y8M3YzzW2HIz"
      },
      "id": "Y8M3YzzW2HIz",
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Step 8. Display and save detected text lists\n",
        "\n",
        "### We can get ocr results for each timestamps."
      ],
      "metadata": {
        "id": "Aplmxisd8Hla"
      },
      "id": "Aplmxisd8Hla"
    },
    {
      "cell_type": "code",
      "source": [
        "ocr_save_path = '/content/ocr_results.json'\n",
        "\n",
        "import pandas as pd\n",
        "df = pd.DataFrame(ocr_dict)\n",
        "df.to_json(ocr_save_path)\n",
        "# Set the maximum width of each column\n",
        "pd.set_option('display.max_colwidth', None)  # Replace None with a number if needed\n",
        "display(df[['timestamps', 'texts']])"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 990
        },
        "id": "Eryl0PCE6Axr",
        "outputId": "3336af77-4942-4c34-e965-753c4b549728"
      },
      "id": "Eryl0PCE6Axr",
      "execution_count": null,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "      timestamps  \\\n",
              "0   1.641293e+12   \n",
              "1   1.642534e+12   \n",
              "2   1.643776e+12   \n",
              "3   1.645017e+12   \n",
              "4   1.646258e+12   \n",
              "5   1.647500e+12   \n",
              "6   1.648741e+12   \n",
              "7   1.649982e+12   \n",
              "8   1.651224e+12   \n",
              "9   1.652465e+12   \n",
              "10  1.653707e+12   \n",
              "11  1.654948e+12   \n",
              "12  1.656189e+12   \n",
              "13  1.657431e+12   \n",
              "14  1.658672e+12   \n",
              "15  1.659913e+12   \n",
              "16  1.661155e+12   \n",
              "17  1.662396e+12   \n",
              "18  1.663638e+12   \n",
              "19  1.664879e+12   \n",
              "20  1.666120e+12   \n",
              "21  1.667362e+12   \n",
              "22  1.668603e+12   \n",
              "23  1.669845e+12   \n",
              "24  1.671086e+12   \n",
              "25  1.672327e+12   \n",
              "26  1.673569e+12   \n",
              "27  1.674810e+12   \n",
              "28  1.676051e+12   \n",
              "29  1.677293e+12   \n",
              "\n",
              "                                                                                                                                                                              texts  \n",
              "0                                                                                                                                                                                []  \n",
              "1                                                                                                                                                                                []  \n",
              "2                                                                                                                                                                                []  \n",
              "3                                                                                                                                                                                []  \n",
              "4                                                                                                                                                             [AcTiVESMART, %67, 0]  \n",
              "5                                                                                                                     [ActiveshaRT, TECHNOLOGY, %7, Stghattng;, CAMPARI', Tomatoes]  \n",
              "6                                                                                                                                                               [ActineskaRt, 9, 3]  \n",
              "7                                                                              [Harn, Guan, Baby Bella Mushrooms, bebe bella cham, ACTIvESMART, Technology, ladelphia, Reproductte]  \n",
              "8                                                                                                         [WA, Han, and, Cancer, %7, ActiveshaRT, TECHNOLOGT, RNING:, Reproducute ]  \n",
              "9                                                                                                                                                [MdELpha, ACTIvESMART, TECHNOLOGY]  \n",
              "10                                                                                                                                                                 [%7, &, 3, 1, 1]  \n",
              "11                                                                                                                                                     [hadelphia, %at, TECHNOLOGY]  \n",
              "12                                                                                                                                   [Qhdelpha, %2, ActiveshART, Technology, ming']  \n",
              "13                                                                                                                                                                              [4]  \n",
              "14                                                                                                                [WARNING:, Cancer and Reproductive Harm, AcTiveshaRT, Technology]  \n",
              "15                                              [Cancer and Reproductive, WWWP6Swarnings ca , 90v, 9, 3, NET WT 14 02 (3969), WARNING:, Harm, IRADER JOES\", ORCANIC, tuw, Tk\", TCs]  \n",
              "16                                                                                                              [WARNING:, Cancer and Reproductlve Harm, ACTIvEShART, phhladelphia]  \n",
              "17                                                                                                                     [Cancer , AcTIVESHART, WARNING:, Herm, Reproductive [, Yand]  \n",
              "18                                                                                              [WARNING:, Cencer, Reproductive Ham, 07, PHULAdeLphia, 81102, Jnoido, 2310), otics]  \n",
              "19                                                                   [Cancer ond, PhHLAdElphia, WARNING:, Reproductive /, Harm, WwW pGSwarnings ca gov, FR, 0, 8, 1, 8, probiotics]  \n",
              "20  [%, WaRNING, Cancer and[, Reprodrte, TRADER JoES\", rgani s, Choices, BIOLOGIQUR, Biologiquev_, thy, Cov, ELUA, Yanio, erics, TLB, des, Produit, 0z), (16 , of USA, US:, roduce]  \n",
              "21                                                                                       [WARNING:, Cancer and Reproductive }, %, Kyrm, WW p6Swarnngs cagov, TRADER JoES\", abn, FR]  \n",
              "22                                                                                                                                                    [WARNNG, Cancet and[, QH, FR]  \n",
              "23                                                                                                                                          [Du, %2, phuladelphia, 1, 2iotics, FIR]  \n",
              "24                                                                                             [On, WARNING;, Tand Reproductive Harm, Cancer, ca gov, Www p6Swarnings c, %7, otics]  \n",
              "25                                                                                 [PhlAdelphia, chion, gov, %7, WARNING:, Harm, Reproductive [, Tand, Cancer, p6Swarningsca , Www]  \n",
              "26                                                                                   [Ohion, WARNING:, Cancer and Reproductive Harm, wwwp6Swarnings cagov, %7, phLAdELphIa, hve, 1]  \n",
              "27                                                                                                                                                           [P7, Har, Reproductre]  \n",
              "28                                                                                                                                                                [MhLAdELphia, %2]  \n",
              "29                                                                                                                                                     [Activeshart, Technology, 8]  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-1205c51e-199c-48a4-9b4b-ee740ef8c192\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>timestamps</th>\n",
              "      <th>texts</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>1.641293e+12</td>\n",
              "      <td>[]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>1.642534e+12</td>\n",
              "      <td>[]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>1.643776e+12</td>\n",
              "      <td>[]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>1.645017e+12</td>\n",
              "      <td>[]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>1.646258e+12</td>\n",
              "      <td>[AcTiVESMART, %67, 0]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>5</th>\n",
              "      <td>1.647500e+12</td>\n",
              "      <td>[ActiveshaRT, TECHNOLOGY, %7, Stghattng;, CAMPARI', Tomatoes]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>6</th>\n",
              "      <td>1.648741e+12</td>\n",
              "      <td>[ActineskaRt, 9, 3]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>7</th>\n",
              "      <td>1.649982e+12</td>\n",
              "      <td>[Harn, Guan, Baby Bella Mushrooms, bebe bella cham, ACTIvESMART, Technology, ladelphia, Reproductte]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>8</th>\n",
              "      <td>1.651224e+12</td>\n",
              "      <td>[WA, Han, and, Cancer, %7, ActiveshaRT, TECHNOLOGT, RNING:, Reproducute ]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>9</th>\n",
              "      <td>1.652465e+12</td>\n",
              "      <td>[MdELpha, ACTIvESMART, TECHNOLOGY]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>10</th>\n",
              "      <td>1.653707e+12</td>\n",
              "      <td>[%7, &amp;, 3, 1, 1]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>11</th>\n",
              "      <td>1.654948e+12</td>\n",
              "      <td>[hadelphia, %at, TECHNOLOGY]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>12</th>\n",
              "      <td>1.656189e+12</td>\n",
              "      <td>[Qhdelpha, %2, ActiveshART, Technology, ming']</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>13</th>\n",
              "      <td>1.657431e+12</td>\n",
              "      <td>[4]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>14</th>\n",
              "      <td>1.658672e+12</td>\n",
              "      <td>[WARNING:, Cancer and Reproductive Harm, AcTiveshaRT, Technology]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>15</th>\n",
              "      <td>1.659913e+12</td>\n",
              "      <td>[Cancer and Reproductive, WWWP6Swarnings ca , 90v, 9, 3, NET WT 14 02 (3969), WARNING:, Harm, IRADER JOES\", ORCANIC, tuw, Tk\", TCs]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>16</th>\n",
              "      <td>1.661155e+12</td>\n",
              "      <td>[WARNING:, Cancer and Reproductlve Harm, ACTIvEShART, phhladelphia]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>17</th>\n",
              "      <td>1.662396e+12</td>\n",
              "      <td>[Cancer , AcTIVESHART, WARNING:, Herm, Reproductive [, Yand]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>18</th>\n",
              "      <td>1.663638e+12</td>\n",
              "      <td>[WARNING:, Cencer, Reproductive Ham, 07, PHULAdeLphia, 81102, Jnoido, 2310), otics]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>19</th>\n",
              "      <td>1.664879e+12</td>\n",
              "      <td>[Cancer ond, PhHLAdElphia, WARNING:, Reproductive /, Harm, WwW pGSwarnings ca gov, FR, 0, 8, 1, 8, probiotics]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>20</th>\n",
              "      <td>1.666120e+12</td>\n",
              "      <td>[%, WaRNING, Cancer and[, Reprodrte, TRADER JoES\", rgani s, Choices, BIOLOGIQUR, Biologiquev_, thy, Cov, ELUA, Yanio, erics, TLB, des, Produit, 0z), (16 , of USA, US:, roduce]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>21</th>\n",
              "      <td>1.667362e+12</td>\n",
              "      <td>[WARNING:, Cancer and Reproductive }, %, Kyrm, WW p6Swarnngs cagov, TRADER JoES\", abn, FR]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>22</th>\n",
              "      <td>1.668603e+12</td>\n",
              "      <td>[WARNNG, Cancet and[, QH, FR]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>23</th>\n",
              "      <td>1.669845e+12</td>\n",
              "      <td>[Du, %2, phuladelphia, 1, 2iotics, FIR]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>24</th>\n",
              "      <td>1.671086e+12</td>\n",
              "      <td>[On, WARNING;, Tand Reproductive Harm, Cancer, ca gov, Www p6Swarnings c, %7, otics]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>25</th>\n",
              "      <td>1.672327e+12</td>\n",
              "      <td>[PhlAdelphia, chion, gov, %7, WARNING:, Harm, Reproductive [, Tand, Cancer, p6Swarningsca , Www]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>26</th>\n",
              "      <td>1.673569e+12</td>\n",
              "      <td>[Ohion, WARNING:, Cancer and Reproductive Harm, wwwp6Swarnings cagov, %7, phLAdELphIa, hve, 1]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>27</th>\n",
              "      <td>1.674810e+12</td>\n",
              "      <td>[P7, Har, Reproductre]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>28</th>\n",
              "      <td>1.676051e+12</td>\n",
              "      <td>[MhLAdELphia, %2]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>29</th>\n",
              "      <td>1.677293e+12</td>\n",
              "      <td>[Activeshart, Technology, 8]</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-1205c51e-199c-48a4-9b4b-ee740ef8c192')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-1205c51e-199c-48a4-9b4b-ee740ef8c192 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-1205c51e-199c-48a4-9b4b-ee740ef8c192');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-60423242-be0d-4fd6-a2ac-494cfa5f5698\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-60423242-be0d-4fd6-a2ac-494cfa5f5698')\"\n",
              "            title=\"Suggest charts\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-60423242-be0d-4fd6-a2ac-494cfa5f5698 button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {}
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [],
      "metadata": {
        "id": "J0aUJiHj0DtZ"
      },
      "id": "J0aUJiHj0DtZ",
      "execution_count": null,
      "outputs": []
    }
  ],
  "metadata": {
    "custom": {
      "cells": [],
      "metadata": {
        "fileHeader": "",
        "kernelspec": {
          "display_name": "Python 3 (ipykernel)",
          "language": "python",
          "name": "python3"
        },
        "language_info": {
          "codemirror_mode": {
            "name": "ipython",
            "version": 3
          },
          "file_extension": ".py",
          "mimetype": "text/x-python",
          "name": "python",
          "nbconvert_exporter": "python",
          "pygments_lexer": "ipython3",
          "version": "3.10.11"
        }
      },
      "nbformat": 4,
      "nbformat_minor": 5
    },
    "indentAmount": 2,
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.11.4"
    },
    "colab": {
      "provenance": [],
      "gpuType": "T4"
    },
    "accelerator": "GPU"
  },
  "nbformat": 4,
  "nbformat_minor": 5
}

================================================
FILE: sample_data/books.csv
================================================
book_name,img_url,id,date,time
I Am a Strange Loop,https://img1.od-cdn.com/ImageType-100/0887-1/{A6AA81F6-9242-4793-8AB0-A5C8B5DBDB66}Img100.jpg,books_26,2019/04/19,04:37:00
I Am a Strange Loop,https://img1.od-cdn.com/ImageType-100/0887-1/{A6AA81F6-9242-4793-8AB0-A5C8B5DBDB66}Img100.jpg,books_27,2019/04/17,04:50:00
War and Peace,https://img1.od-cdn.com/ImageType-100/0887-1/{97ABE6CD-7D83-475B-8278-210FB71B35DC}IMG100.JPG,books_28,2019/04/15,03:54:00
War and Peace,https://img1.od-cdn.com/ImageType-100/0887-1/{97ABE6CD-7D83-475B-8278-210FB71B35DC}IMG100.JPG,books_29,2019/04/05,19:23:00
War and Peace,https://img1.od-cdn.com/ImageType-100/0887-1/{97ABE6CD-7D83-475B-8278-210FB71B35DC}IMG100.JPG,books_30,2019/04/05,16:26:00
The Culture Map,https://img1.od-cdn.com/ImageType-100/1088-1/{19D66D38-7086-4689-A5CE-A63649FEEF96}Img100.jpg,books_31,2019/03/28,15:06:00
The Culture Map,https://img1.od-cdn.com/ImageType-100/1088-1/{19D66D38-7086-4689-A5CE-A63649FEEF96}Img100.jpg,books_32,2019/03/23,05:02:00
War and Peace,https://img1.od-cdn.com/ImageType-100/0887-1/{97ABE6CD-7D83-475B-8278-210FB71B35DC}IMG100.JPG,books_33,2019/03/14,16:26:00
Neuromancer,https://img1.od-cdn.com/ImageType-100/1191-1/{7BA87BED-B3A1-4AD0-998A-CC84D7F542DA}Img100.jpg,books_34,2019/03/14,16:22:00
Horse,https://img1.od-cdn.com/ImageType-100/1191-1/{C907B706-54A2-42AA-B199-16923A150BC4}Img100.jpg,books_35,2019/03/03,23:17:00
Exhalation,https://img1.od-cdn.com/ImageType-100/1191-1/{BBA7984E-7A18-4ADF-A1D5-FA1BE72BD5AA}Img100.jpg,books_36,2019/03/03,23:16:00
Neuromancer,https://img1.od-cdn.com/ImageType-100/1191-1/{7BA87BED-B3A1-4AD0-998A-CC84D7F542DA}Img100.jpg,books_37,2019/04/30,16:51:00
Exhalation,https://img1.od-cdn.com/ImageType-100/1191-1/{BBA7984E-7A18-4ADF-A1D5-FA1BE72BD5AA}Img100.jpg,books_38,2019/04/28,04:25:00
Give and Take,https://img1.od-cdn.com/ImageType-100/1191-1/{D4B1BE9F-E613-4495-9EC6-4380FBBCDC91}Img100.jpg,books_39,2019/04/27,13:27:00
Hallucinations,https://img1.od-cdn.com/ImageType-100/1191-1/{84AE575E-576F-4590-A6E6-8460DE966A95}Img100.jpg,books_40,2019/04/18,11:22:00
Horse,https://img1.od-cdn.com/ImageType-100/1191-1/{C907B706-54A2-42AA-B199-16923A150BC4}Img100.jpg,books_41,2019/04/16,00:11:00
Think Again,https://img1.od-cdn.com/ImageType-100/1191-1/{E53B8C11-8116-46D4-9930-66B3A1F970EB}Img100.jpg,books_42,2019/04/13,23:11:00
Give and Take,https://img1.od-cdn.com/ImageType-100/1191-1/{D4B1BE9F-E613-4495-9EC6-4380FBBCDC91}Img100.jpg,books_43,2019/04/10,02:24:00
Into the Magic Shop,https://img1.od-cdn.com/ImageType-100/0887-1/{3FD140F9-B7BE-42CF-AF9C-1D1E95503B30}Img100.jpg,books_44,2019/04/10,02:23:00
Good Anxiety,https://img1.od-cdn.com/ImageType-100/5054-1/{B38EC863-9C53-4816-B0A3-70960F5EE87E}Img100.jpg,books_45,2019/04/06,14:35:00
Hallucinations,https://img1.od-cdn.com/ImageType-100/1191-1/{84AE575E-576F-4590-A6E6-8460DE966A95}Img100.jpg,books_46,2019/04/05,23:52:00
Think Again,https://img1.od-cdn.com/ImageType-100/1191-1/{E53B8C11-8116-46D4-9930-66B3A1F970EB}Img100.jpg,books_47,2019/04/05,23:51:00
Neuromancer,https://img1.od-cdn.com/ImageType-100/1191-1/{7BA87BED-B3A1-4AD0-998A-CC84D7F542DA}Img100.jpg,books_48,2019/04/04,03:49:00
Into the Magic Shop,https://img1.od-cdn.com/ImageType-100/0887-1/{3FD140F9-B7BE-42CF-AF9C-1D1E95503B30}Img100.jpg,books_49,2019/04/03,20:17:00
"Healthy Brain, Happy Life",https://img1.od-cdn.com/ImageType-100/0293-1/{C964EFD7-0622-43C1-BFEC-42DB9C82A827}Img100.jpg,books_50,2019/04/03,04:23:00
Good Anxiety,https://img1.od-cdn.com/ImageType-100/5054-1/{B38EC863-9C53-4816-B0A3-70960F5EE87E}Img100.jpg,books_51,2019/04/03,04:23:00
Year of Wonders,https://img1.od-cdn.com/ImageType-100/0887-1/{C231E269-04B3-46AA-9410-49493BE8415E}Img100.jpg,books_52,2019/03/29,20:50:00
"Healthy Brain, Happy Life",https://img1.od-cdn.com/ImageType-100/0293-1/{C964EFD7-0622-43C1-BFEC-42DB9C82A827}Img100.jpg,books_53,2019/03/28,04:08:00
Year of Wonders,https://img1.od-cdn.com/ImageType-100/0887-1/{C231E269-04B3-46AA-9410-49493BE8415E}Img100.jpg,books_54,2019/03/24,20:24:00
Horse,https://img1.od-cdn.com/ImageType-100/1191-1/{C907B706-54A2-42AA-B199-16923A150BC4}Img100.jpg,books_55,2019/03/24,20:22:00
Hallucinations,https://img1.od-cdn.com/ImageType-100/1191-1/{84AE575E-576F-4590-A6E6-8460DE966A95}Img100.jpg,books_56,2019/03/24,20:19:00
The River of Consciousness,https://img1.od-cdn.com/ImageType-100/1191-1/{0FA8ED18-A311-4DDB-A67C-2EAAB5AD90EE}Img100.jpg,books_57,2019/03/24,20:15:00
The River of Consciousness,https://img1.od-cdn.com/ImageType-100/1191-1/{0FA8ED18-A311-4DDB-A67C-2EAAB5AD90EE}Img100.jpg,books_58,2019/03/23,14:26:00
Originals,https://img1.od-cdn.com/ImageType-100/1191-1/{659784EB-40E0-4061-904D-17606BAD9980}Img100.jpg,books_59,2019/03/23,14:25:00
The River of Consciousness,https://img1.od-cdn.com/ImageType-100/1191-1/{0FA8ED18-A311-4DDB-A67C-2EAAB5AD90EE}Img100.jpg,books_60,2019/03/23,14:24:00
The River of Consciousness,https://img1.od-cdn.com/ImageType-100/1191-1/{0FA8ED18-A311-4DDB-A67C-2EAAB5AD90EE}Img100.jpg,books_61,2019/03/23,14:23:00
Originals,https://img1.od-cdn.com/ImageType-100/1191-1/{659784EB-40E0-4061-904D-17606BAD9980}Img100.jpg,books_62,2019/03/20,23:47:00
Give and Take,https://img1.od-cdn.com/ImageType-100/1191-1/{D4B1BE9F-E613-4495-9EC6-4380FBBCDC91}Img100.jpg,books_63,2019/03/20,23:46:00
Think Again,https://img1.od-cdn.com/ImageType-100/1191-1/{E53B8C11-8116-46D4-9930-66B3A1F970EB}Img100.jpg,books_64,2019/03/20,23:46:00
Stories of Your Life and Others,https://img1.od-cdn.com/ImageType-100/1219-1/{E5634854-4A0C-4971-95FB-9C38A1A8C2DE}Img100.jpg,books_65,2019/03/20,00:21:00
"The Psychology of Money: Timeless lessons on wealth, greed, and happiness",https://img1.od-cdn.com/ImageType-100/6852-1/{B1E07D01-EFD7-4767-88E7-5DE22657FE51}IMG100.JPG,books_66,2019/03/20,00:20:00
Stories of Your Life and Others,https://img1.od-cdn.com/ImageType-100/1219-1/{E5634854-4A0C-4971-95FB-9C38A1A8C2DE}Img100.jpg,books_67,2019/03/16,22:10:00
Einstein,https://img1.od-cdn.com/ImageType-100/5054-1/{82F07E3E-0184-4FD8-AE8D-7C359F454586}Img100.jpg,books_68,2019/03/16,22:09:00
How to Win Friends & Influence People in the Digital Age,https://img1.od-cdn.com/ImageType-100/1294-1/{6FE70C4B-92E2-43A3-82CE-C44565A042A3}Img100.jpg,books_69,2019/03/11,15:40:00
The Culture Map,https://img1.od-cdn.com/ImageType-100/1088-1/{19D66D38-7086-4689-A5CE-A63649FEEF96}Img100.jpg,books_70,2019/03/11,00:07:00
Einstein,https://img1.od-cdn.com/ImageType-100/5054-1/{82F07E3E-0184-4FD8-AE8D-7C359F454586}Img100.jpg,books_71,2019/03/05,17:07:00
1491,https://img1.od-cdn.com/ImageType-100/1191-1/{A7EC6CC4-107C-4DDE-BC7E-516A916C462C}Img100.jpg,books_72,2019/03/05,17:04:00
"The Psychology of Money: Timeless lessons on wealth, greed, and happiness",https://img1.od-cdn.com/ImageType-100/6852-1/{B1E07D01-EFD7-4767-88E7-5DE22657FE51}IMG100.JPG,books_73,2019/03/05,09:45:00
"The Psychology of Money: Timeless lessons on wealth, greed, and happiness",https://img1.od-cdn.com/ImageType-100/6852-1/{B1E07D01-EFD7-4767-88E7-5DE22657FE51}IMG100.JPG,books_74,2019/04/29,18:04:00
The Psychology of Money,https://img1.od-cdn.com/ImageType-100/6645-1/{6662B07C-039F-4CEA-A4AD-08F037CC2AE3}Img100.jpg,books_75,2019/04/27,03:18:00
1491,https://img1.od-cdn.com/ImageType-100/1191-1/{A7EC6CC4-107C-4DDE-BC7E-516A916C462C}Img100.jpg,books_76,2019/04/26,00:41:00
The Boys,https://img1.od-cdn.com/ImageType-100/7552-1/{C1FC45F2-9E2D-4FA8-9EBE-72DF14121401}IMG100.JPG,books_77,2019/04/25,14:39:00
The Boys,https://img1.od-cdn.com/ImageType-100/7552-1/{C1FC45F2-9E2D-4FA8-9EBE-72DF14121401}IMG100.JPG,books_78,2019/04/22,17:47:00
How to Win Friends & Influence People in the Digital Age,https://img1.od-cdn.com/ImageType-100/1294-1/{6FE70C4B-92E2-43A3-82CE-C44565A042A3}Img100.jpg,books_79,2019/04/21,15:40:00
How the World Really Works,https://img1.od-cdn.com/ImageType-100/1191-1/{2F5C2C49-A53D-4B42-8DCF-1AD588DF5476}Img100.jpg,books_80,2019/04/20,22:06:00
From Strength to Strength,https://img1.od-cdn.com/ImageType-100/1191-1/{0D269BBE-2AEA-4557-8009-11ADDBEF9E10}Img100.jpg,books_81,2019/04/18,03:50:00
How the World Really Works,https://img1.od-cdn.com/ImageType-100/1191-1/{2F5C2C49-A53D-4B42-8DCF-1AD588DF5476}Img100.jpg,books_82,2019/04/14,17:30:00
1491,https://img1.od-cdn.com/ImageType-100/1191-1/{A7EC6CC4-107C-4DDE-BC7E-516A916C462C}Img100.jpg,books_83,2019/04/07,23:17:00
The Boys,https://img1.od-cdn.com/ImageType-100/7552-1/{C1FC45F2-9E2D-4FA8-9EBE-72DF14121401}IMG100.JPG,books_84,2019/04/03,20:41:00
The Devil's Star,https://img1.od-cdn.com/ImageType-100/1191-1/{082BCD72-EA9B-450E-9784-F978EC90DE73}Img100.jpg,books_85,2019/03/28,17:36:00
The Bitcoin Standard,https://img1.od-cdn.com/ImageType-100/0128-1/{0A84F8C2-77CC-4E4D-B2AA-002F7FC1C3F7}Img100.jpg,books_86,2019/03/24,22:41:00
The Devil's Star,https://img1.od-cdn.com/ImageType-100/1191-1/{082BCD72-EA9B-450E-9784-F978EC90DE73}Img100.jpg,books_87,2019/03/07,17:36:00
Mating in Captivity,https://img1.od-cdn.com/ImageType-100/0293-1/{09F22BD0-4235-4A59-A677-82A0AF31CB3F}Img100.jpg,books_88,2019/03/06,03:51:00
How the World Really Works,https://img1.od-cdn.com/ImageType-100/1191-1/{2F5C2C49-A53D-4B42-8DCF-1AD588DF5476}Img100.jpg,books_89,2019/03/06,03:05:00
The Lincoln Highway,https://img1.od-cdn.com/ImageType-100/1191-1/{C1399BF6-5678-4BCA-8520-1DEE2527E29A}Img100.jpg,books_90,2019/03/06,03:04:00
The Three-Body Problem,https://img1.od-cdn.com/ImageType-100/1493-1/{E865FBC4-6E05-4F87-B780-728E64158A2C}Img100.jpg,books_91,2019/03/04,03:07:00
Mating in Captivity,https://img1.od-cdn.com/ImageType-100/0293-1/{09F22BD0-4235-4A59-A677-82A0AF31CB3F}Img100.jpg,books_92,2019/03/04,03:07:00
The Culture Map,https://img1.od-cdn.com/ImageType-100/1088-1/{19D66D38-7086-4689-A5CE-A63649FEEF96}Img100.jpg,books_93,2019/03/04,03:06:00
The Bitcoin Standard,https://img1.od-cdn.com/ImageType-100/0128-1/{0A84F8C2-77CC-4E4D-B2AA-002F7FC1C3F7}Img100.jpg,books_94,2019/03/03,22:41:00
The Three-Body Problem,https://img1.od-cdn.com/ImageType-100/1493-1/{E865FBC4-6E05-4F87-B780-728E64158A2C}Img100.jpg,books_95,2019/04/21,19:17:00
The Ministry for the Future,https://img1.od-cdn.com/ImageType-100/0887-1/{81F5A348-EEB3-4546-BCEA-1FCFCB709C52}Img100.jpg,books_96,2019/04/05,18:59:00
Benjamin Franklin,https://img1.od-cdn.com/ImageType-100/5054-1/{0B379166-D254-4592-B4D5-FFC2254119FB}Img100.jpg,books_97,2019/03/28,21:42:00
The Curious Case of Benjamin Button and Other Jazz Age Tales,https://img1.od-cdn.com/ImageType-100/1219-1/{146935CF-D64E-46E9-B63D-CFD6182F1BE4}Img100.jpg,books_98,2019/03/18,23:42:00
The House of the Spirits,https://img1.od-cdn.com/ImageType-100/5054-1/{BC0222FF-419B-4FB4-B844-CBC6128B6E7D}Img100.jpg,books_99,2019/03/18,23:41:00
The Ministry for the Future,https://img1.od-cdn.com/ImageType-100/0887-1/{81F5A348-EEB3-4546-BCEA-1FCFCB709C52}Img100.jpg,books_100,2019/03/15,18:59:00
The Body,https://img1.od-cdn.com/ImageType-100/1191-1/{38DDDC42-8411-4192-96F7-56BE613A6F27}Img100.jpg,books_101,2019/03/08,17:18:00
Benjamin Franklin,https://img1.od-cdn.com/ImageType-100/5054-1/{0B379166-D254-4592-B4D5-FFC2254119FB}Img100.jpg,books_102,2019/03/07,21:42:00
The Curious Case of Benjamin Button and Other Jazz Age Tales,https://img1.od-cdn.com/ImageType-100/1219-1/{146935CF-D64E-46E9-B63D-CFD6182F1BE4}Img100.jpg,books_103,2019/04/27,23:42:00
The House of the Spirits,https://img1.od-cdn.com/ImageType-100/5054-1/{BC0222FF-419B-4FB4-B844-CBC6128B6E7D}Img100.jpg,books_104,2019/04/27,23:41:00
The Ministry for the Future,https://img1.od-cdn.com/ImageType-100/0887-1/{81F5A348-EEB3-4546-BCEA-1FCFCB709C52}Img100.jpg,books_105,2019/04/24,22:57:00
Leonardo da Vinci,https://img1.od-cdn.com/ImageType-100/5054-1/{8BD6950E-7348-44E2-8EB8-C40021DED5CE}Img100.jpg,books_106,2019/04/24,18:43:00
The Body,https://img1.od-cdn.com/ImageType-100/1191-1/{38DDDC42-8411-4192-96F7-56BE613A6F27}Img100.jpg,books_107,2019/04/17,17:18:00
Benjamin Franklin,https://img1.od-cdn.com/ImageType-100/5054-1/{0B379166-D254-4592-B4D5-FFC2254119FB}Img100.jpg,books_108,2019/04/17,17:16:00
Leonardo da Vinci,https://img1.od-cdn.com/ImageType-100/5054-1/{8BD6950E-7348-44E2-8EB8-C40021DED5CE}Img100.jpg,books_109,2019/04/03,18:43:00
A Thousand Brains: A New Theory of Intelligence,https://ws-na.amazon-adsystem.com/widgets/q?_encoding=UTF8&MarketPlace=US&ASIN=B08VWV2WDK&ServiceVersion=20070822&ID=AsinImage&WS=1&Format=SL250,books_111,2019/04/28,16:51:51
50 Greatest Positive Psychology Quotes: A Beautiful Photo Book of The Most Inspiring Positive Psychological Quotes,https://ws-na.amazon-adsystem.com/widgets/q?_encoding=UTF8&MarketPlace=US&ASIN=B08JFTX8D1&ServiceVersion=20070822&ID=AsinImage&WS=1&Format=SL250,books_112,2019/04/20,16:51:51
Before the Coffee Gets Cold: A Novel (Before the Coffee Gets Cold Series Book 1),https://ws-na.amazon-adsystem.com/widgets/q?_encoding=UTF8&MarketPlace=US&ASIN=B084B6VFHG&ServiceVersion=20070822&ID=AsinImage&WS=1&Format=SL250,books_113,2019/03/27,16:51:51
The Enigma of Reason,https://ws-na.amazon-adsystem.com/widgets/q?_encoding=UTF8&MarketPlace=US&ASIN=B06XWFM3PP&ServiceVersion=20070822&ID=AsinImage&WS=1&Format=SL250,books_114,2019/03/24,16:51:51
Snow Crash: A Novel,https://ws-na.amazon-adsystem.com/widgets/q?_encoding=UTF8&MarketPlace=US&ASIN=B000FBJCJE&ServiceVersion=20070822&ID=AsinImage&WS=1&Format=SL250,books_115,2019/03/12,16:51:51
The Innovators,https://ws-na.amazon-adsystem.com/widgets/q?_encoding=UTF8&MarketPlace=US&ASIN=B00M9KA2ZM&ServiceVersion=20070822&ID=AsinImage&WS=1&Format=SL250,books_267,2019/03/08,16:51:51
Life 3.0,https://ws-na.amazon-adsystem.com/widgets/q?_encoding=UTF8&MarketPlace=US&ASIN=B0742K1G4Q&ServiceVersion=20070822&ID=AsinImage&WS=1&Format=SL250,books_268,2019/03/08,16:51:51
Total Recall,https://ws-na.amazon-adsystem.com/widgets/q?_encoding=UTF8&MarketPlace=US&ASIN=B003DKVAB2&ServiceVersion=20070822&ID=AsinImage&WS=1&Format=SL250,books_269,2019/03/17,16:51:51
Discovery Trial,https://ws-na.amazon-adsystem.com/widgets/q?_encoding=UTF8&MarketPlace=US&ASIN=B0821SC3ZP&ServiceVersion=20070822&ID=AsinImage&WS=1&Format=SL250,books_270,2019/04/28,16:51:51


================================================
FILE: sample_data/books.sampled.csv
================================================
,Unnamed: 0,time,book_name,img_url,id
0,0,2019-04-19T04:37:00,I Am a Strange Loop,https://img1.od-cdn.com/ImageType-100/0887-1/{A6AA81F6-9242-4793-8AB0-A5C8B5DBDB66}Img100.jpg,books_26
1,1,2019-04-17T04:50:00,I Am a Strange Loop,https://img1.od-cdn.com/ImageType-100/0887-1/{A6AA81F6-9242-4793-8AB0-A5C8B5DBDB66}Img100.jpg,books_27
2,2,2019-04-15T03:54:00,War and Peace,https://img1.od-cdn.com/ImageType-100/0887-1/{97ABE6CD-7D83-475B-8278-210FB71B35DC}IMG100.JPG,books_28
3,3,2019-04-05T19:23:00,War and Peace,https://img1.od-cdn.com/ImageType-100/0887-1/{97ABE6CD-7D83-475B-8278-210FB71B35DC}IMG100.JPG,books_29
4,4,2019-04-05T16:26:00,War and Peace,https://img1.od-cdn.com/ImageType-100/0887-1/{97ABE6CD-7D83-475B-8278-210FB71B35DC}IMG100.JPG,books_30
5,5,2019-03-28T15:06:00,The Culture Map,https://img1.od-cdn.com/ImageType-100/1088-1/{19D66D38-7086-4689-A5CE-A63649FEEF96}Img100.jpg,books_31
6,6,2019-03-23T05:02:00,The Culture Map,https://img1.od-cdn.com/ImageType-100/1088-1/{19D66D38-7086-4689-A5CE-A63649FEEF96}Img100.jpg,books_32
7,7,2019-03-14T16:26:00,War and Peace,https://img1.od-cdn.com/ImageType-100/0887-1/{97ABE6CD-7D83-475B-8278-210FB71B35DC}IMG100.JPG,books_33
8,8,2019-03-14T16:22:00,Neuromancer,https://img1.od-cdn.com/ImageType-100/1191-1/{7BA87BED-B3A1-4AD0-998A-CC84D7F542DA}Img100.jpg,books_34
9,9,2019-03-03T23:17:00,Horse,https://img1.od-cdn.com/ImageType-100/1191-1/{C907B706-54A2-42AA-B199-16923A150BC4}Img100.jpg,books_35
10,10,2019-03-03T23:16:00,Exhalation,https://img1.od-cdn.com/ImageType-100/1191-1/{BBA7984E-7A18-4ADF-A1D5-FA1BE72BD5AA}Img100.jpg,books_36
11,11,2019-04-30T16:51:00,Neuromancer,https://img1.od-cdn.com/ImageType-100/1191-1/{7BA87BED-B3A1-4AD0-998A-CC84D7F542DA}Img100.jpg,books_37
12,12,2019-04-28T04:25:00,Exhalation,https://img1.od-cdn.com/ImageType-100/1191-1/{BBA7984E-7A18-4ADF-A1D5-FA1BE72BD5AA}Img100.jpg,books_38
13,13,2019-04-27T13:27:00,Give and Take,https://img1.od-cdn.com/ImageType-100/1191-1/{D4B1BE9F-E613-4495-9EC6-4380FBBCDC91}Img100.jpg,books_39
14,14,2019-04-18T11:22:00,Hallucinations,https://img1.od-cdn.com/ImageType-100/1191-1/{84AE575E-576F-4590-A6E6-8460DE966A95}Img100.jpg,books_40
15,15,2019-04-16T00:11:00,Horse,https://img1.od-cdn.com/ImageType-100/1191-1/{C907B706-54A2-42AA-B199-16923A150BC4}Img100.jpg,books_41
16,16,2019-04-13T23:11:00,Think Again,https://img1.od-cdn.com/ImageType-100/1191-1/{E53B8C11-8116-46D4-9930-66B3A1F970EB}Img100.jpg,books_42
17,17,2019-04-10T02:24:00,Give and Take,https://img1.od-cdn.com/ImageType-100/1191-1/{D4B1BE9F-E613-4495-9EC6-4380FBBCDC91}Img100.jpg,books_43
18,18,2019-04-10T02:23:00,Into the Magic Shop,https://img1.od-cdn.com/ImageType-100/0887-1/{3FD140F9-B7BE-42CF-AF9C-1D1E95503B30}Img100.jpg,books_44
19,19,2019-04-06T14:35:00,Good Anxiety,https://img1.od-cdn.com/ImageType-100/5054-1/{B38EC863-9C53-4816-B0A3-70960F5EE87E}Img100.jpg,books_45
20,20,2019-04-05T23:52:00,Hallucinations,https://img1.od-cdn.com/ImageType-100/1191-1/{84AE575E-576F-4590-A6E6-8460DE966A95}Img100.jpg,books_46
21,21,2019-04-05T23:51:00,Think Again,https://img1.od-cdn.com/ImageType-100/1191-1/{E53B8C11-8116-46D4-9930-66B3A1F970EB}Img100.jpg,books_47
22,22,2019-04-04T03:49:00,Neuromancer,https://img1.od-cdn.com/ImageType-100/1191-1/{7BA87BED-B3A1-4AD0-998A-CC84D7F542DA}Img100.jpg,books_48
23,23,2019-04-03T20:17:00,Into the Magic Shop,https://img1.od-cdn.com/ImageType-100/0887-1/{3FD140F9-B7BE-42CF-AF9C-1D1E95503B30}Img100.jpg,books_49
24,24,2019-04-03T04:23:00,"Healthy Brain, Happy Life",https://img1.od-cdn.com/ImageType-100/0293-1/{C964EFD7-0622-43C1-BFEC-42DB9C82A827}Img100.jpg,books_50
25,25,2019-04-03T04:23:00,Good Anxiety,https://img1.od-cdn.com/ImageType-100/5054-1/{B38EC863-9C53-4816-B0A3-70960F5EE87E}Img100.jpg,books_51
26,26,2019-03-29T20:50:00,Year of Wonders,https://img1.od-cdn.com/ImageType-100/0887-1/{C231E269-04B3-46AA-9410-49493BE8415E}Img100.jpg,books_52
27,27,2019-03-28T04:08:00,"Healthy Brain, Happy Life",https://img1.od-cdn.com/ImageType-100/0293-1/{C964EFD7-0622-43C1-BFEC-42DB9C82A827}Img100.jpg,books_53
28,28,2019-03-24T20:24:00,Year of Wonders,https://img1.od-cdn.com/ImageType-100/0887-1/{C231E269-04B3-46AA-9410-49493BE8415E}Img100.jpg,books_54
29,29,2019-03-24T20:22:00,Horse,https://img1.od-cdn.com/ImageType-100/1191-1/{C907B706-54A2-42AA-B199-16923A150BC4}Img100.jpg,books_55
30,30,2019-03-24T20:19:00,Hallucinations,https://img1.od-cdn.com/ImageType-100/1191-1/{84AE575E-576F-4590-A6E6-8460DE966A95}Img100.jpg,books_56
31,31,2019-03-24T20:15:00,The River of Consciousness,https://img1.od-cdn.com/ImageType-100/1191-1/{0FA8ED18-A311-4DDB-A67C-2EAAB5AD90EE}Img100.jpg,books_57
32,32,2019-03-23T14:26:00,The River of Consciousness,https://img1.od-cdn.com/ImageType-100/1191-1/{0FA8ED18-A311-4DDB-A67C-2EAAB5AD90EE}Img100.jpg,books_58
33,33,2019-03-23T14:25:00,Originals,https://img1.od-cdn.com/ImageType-100/1191-1/{659784EB-40E0-4061-904D-17606BAD9980}Img100.jpg,books_59
34,34,2019-03-23T14:24:00,The River of Consciousness,https://img1.od-cdn.com/ImageType-100/1191-1/{0FA8ED18-A311-4DDB-A67C-2EAAB5AD90EE}Img100.jpg,books_60
35,35,2019-03-23T14:23:00,The River of Consciousness,https://img1.od-cdn.com/ImageType-100/1191-1/{0FA8ED18-A311-4DDB-A67C-2EAAB5AD90EE}Img100.jpg,books_61
36,36,2019-03-20T23:47:00,Originals,https://img1.od-cdn.com/ImageType-100/1191-1/{659784EB-40E0-4061-904D-17606BAD9980}Img100.jpg,books_62
37,37,2019-03-20T23:46:00,Give and Take,https://img1.od-cdn.com/ImageType-100/1191-1/{D4B1BE9F-E613-4495-9EC6-4380FBBCDC91}Img100.jpg,books_63
38,38,2019-03-20T23:46:00,Think Again,https://img1.od-cdn.com/ImageType-100/1191-1/{E53B8C11-8116-46D4-9930-66B3A1F970EB}Img100.jpg,books_64
39,39,2019-03-20T00:21:00,Stories of Your Life and Others,https://img1.od-cdn.com/ImageType-100/1219-1/{E5634854-4A0C-4971-95FB-9C38A1A8C2DE}Img100.jpg,books_65
40,40,2019-03-20T00:20:00,"The Psychology of Money: Timeless lessons on wealth, greed, and happiness",https://img1.od-cdn.com/ImageType-100/6852-1/{B1E07D01-EFD7-4767-88E7-5DE22657FE51}IMG100.JPG,books_66
41,41,2019-03-16T22:10:00,Stories of Your Life and Others,https://img1.od-cdn.com/ImageType-100/1219-1/{E5634854-4A0C-4971-95FB-9C38A1A8C2DE}Img100.jpg,books_67
42,42,2019-03-16T22:09:00,Einstein,https://img1.od-cdn.com/ImageType-100/5054-1/{82F07E3E-0184-4FD8-AE8D-7C359F454586}Img100.jpg,books_68
43,43,2019-03-11T15:40:00,How to Win Friends & Influence People in the Digital Age,https://img1.od-cdn.com/ImageType-100/1294-1/{6FE70C4B-92E2-43A3-82CE-C44565A042A3}Img100.jpg,books_69
44,44,2019-03-11T00:07:00,The Culture Map,https://img1.od-cdn.com/ImageType-100/1088-1/{19D66D38-7086-4689-A5CE-A63649FEEF96}Img100.jpg,books_70
45,45,2019-03-05T17:07:00,Einstein,https://img1.od-cdn.com/ImageType-100/5054-1/{82F07E3E-0184-4FD8-AE8D-7C359F454586}Img100.jpg,books_71
46,46,2019-03-05T17:04:00,1491,https://img1.od-cdn.com/ImageType-100/1191-1/{A7EC6CC4-107C-4DDE-BC7E-516A916C462C}Img100.jpg,books_72
47,47,2019-03-05T09:45:00,"The Psychology of Money: Timeless lessons on wealth, greed, and happiness",https://img1.od-cdn.com/ImageType-100/6852-1/{B1E07D01-EFD7-4767-88E7-5DE22657FE51}IMG100.JPG,books_73
48,48,2019-04-29T18:04:00,"The Psychology of Money: Timeless lessons on wealth, greed, and happiness",https://img1.od-cdn.com/ImageType-100/6852-1/{B1E07D01-EFD7-4767-88E7-5DE22657FE51}IMG100.JPG,books_74
49,49,2019-04-27T03:18:00,The Psychology of Money,https://img1.od-cdn.com/ImageType-100/6645-1/{6662B07C-039F-4CEA-A4AD-08F037CC2AE3}Img100.jpg,books_75
50,50,2019-04-26T00:41:00,1491,https://img1.od-cdn.com/ImageType-100/1191-1/{A7EC6CC4-107C-4DDE-BC7E-516A916C462C}Img100.jpg,books_76
51,51,2019-04-25T14:39:00,The Boys,https://img1.od-cdn.com/ImageType-100/7552-1/{C1FC45F2-9E2D-4FA8-9EBE-72DF14121401}IMG100.JPG,books_77
52,52,2019-04-22T17:47:00,The Boys,https://img1.od-cdn.com/ImageType-100/7552-1/{C1FC45F2-9E2D-4FA8-9EBE-72DF14121401}IMG100.JPG,books_78
53,53,2019-04-21T15:40:00,How to Win Friends & Influence People in the Digital Age,https://img1.od-cdn.com/ImageType-100/1294-1/{6FE70C4B-92E2-43A3-82CE-C44565A042A3}Img100.jpg,books_79
54,54,2019-04-20T22:06:00,How the World Really Works,https://img1.od-cdn.com/ImageType-100/1191-1/{2F5C2C49-A53D-4B42-8DCF-1AD588DF5476}Img100.jpg,books_80
55,55,2019-04-18T03:50:00,From Strength to Strength,https://img1.od-cdn.com/ImageType-100/1191-1/{0D269BBE-2AEA-4557-8009-11ADDBEF9E10}Img100.jpg,books_81
56,56,2019-04-14T17:30:00,How the World Really Works,https://img1.od-cdn.com/ImageType-100/1191-1/{2F5C2C49-A53D-4B42-8DCF-1AD588DF5476}Img100.jpg,books_82
57,57,2019-04-07T23:17:00,1491,https://img1.od-cdn.com/ImageType-100/1191-1/{A7EC6CC4-107C-4DDE-BC7E-516A916C462C}Img100.jpg,books_83
58,58,2019-04-03T20:41:00,The Boys,https://img1.od-cdn.com/ImageType-100/7552-1/{C1FC45F2-9E2D-4FA8-9EBE-72DF14121401}IMG100.JPG,books_84
59,59,2019-03-28T17:36:00,The Devil's Star,https://img1.od-cdn.com/ImageType-100/1191-1/{082BCD72-EA9B-450E-9784-F978EC90DE73}Img100.jpg,books_85
60,60,2019-03-24T22:41:00,The Bitcoin Standard,https://img1.od-cdn.com/ImageType-100/0128-1/{0A84F8C2-77CC-4E4D-B2AA-002F7FC1C3F7}Img100.jpg,books_86
61,61,2019-03-07T17:36:00,The Devil's Star,https://img1.od-cdn.com/ImageType-100/1191-1/{082BCD72-EA9B-450E-9784-F978EC90DE73}Img100.jpg,books_87
62,62,2019-03-06T03:51:00,Mating in Captivity,https://img1.od-cdn.com/ImageType-100/0293-1/{09F22BD0-4235-4A59-A677-82A0AF31CB3F}Img100.jpg,books_88
63,63,2019-03-06T03:05:00,How the World Really Works,https://img1.od-cdn.com/ImageType-100/1191-1/{2F5C2C49-A53D-4B42-8DCF-1AD588DF5476}Img100.jpg,books_89
64,64,2019-03-06T03:04:00,The Lincoln Highway,https://img1.od-cdn.com/ImageType-100/1191-1/{C1399BF6-5678-4BCA-8520-1DEE2527E29A}Img100.jpg,books_90
65,65,2019-03-04T03:07:00,The Three-Body Problem,https://img1.od-cdn.com/ImageType-100/1493-1/{E865FBC4-6E05-4F87-B780-728E64158A2C}Img100.jpg,books_91
66,66,2019-03-04T03:07:00,Mating in Captivity,https://img1.od-cdn.com/ImageType-100/0293-1/{09F22BD0-4235-4A59-A677-82A0AF31CB3F}Img100.jpg,books_92
67,67,2019-03-04T03:06:00,The Culture Map,https://img1.od-cdn.com/ImageType-100/1088-1/{19D66D38-7086-4689-A5CE-A63649FEEF96}Img100.jpg,books_93
68,68,2019-03-03T22:41:00,The Bitcoin Standard,https://img1.od-cdn.com/ImageType-100/0128-1/{0A84F8C2-77CC-4E4D-B2AA-002F7FC1C3F7}Img100.jpg,books_94
69,69,2019-04-21T19:17:00,The Three-Body Problem,https://img1.od-cdn.com/ImageType-100/1493-1/{E865FBC4-6E05-4F87-B780-728E64158A2C}Img100.jpg,books_95
70,70,2019-04-05T18:59:00,The Ministry for the Future,https://img1.od-cdn.com/ImageType-100/0887-1/{81F5A348-EEB3-4546-BCEA-1FCFCB709C52}Img100.jpg,books_96
71,71,2019-03-28T21:42:00,Benjamin Franklin,https://img1.od-cdn.com/ImageType-100/5054-1/{0B379166-D254-4592-B4D5-FFC2254119FB}Img100.jpg,books_97
72,72,2019-03-18T23:42:00,The Curious Case of Benjamin Button and Other Jazz Age Tales,https://img1.od-cdn.com/ImageType-100/1219-1/{146935CF-D64E-46E9-B63D-CFD6182F1BE4}Img100.jpg,books_98
73,73,2019-03-18T23:41:00,The House of the Spirits,https://img1.od-cdn.com/ImageType-100/5054-1/{BC0222FF-419B-4FB4-B844-CBC6128B6E7D}Img100.jpg,books_99
74,74,2019-03-15T18:59:00,The Ministry for the Future,https://img1.od-cdn.com/ImageType-100/0887-1/{81F5A348-EEB3-4546-BCEA-1FCFCB709C52}Img100.jpg,books_100
75,75,2019-03-08T17:18:00,The Body,https://img1.od-cdn.com/ImageType-100/1191-1/{38DDDC42-8411-4192-96F7-56BE613A6F27}Img100.jpg,books_101
76,76,2019-03-07T21:42:00,Benjamin Franklin,https://img1.od-cdn.com/ImageType-100/5054-1/{0B379166-D254-4592-B4D5-FFC2254119FB}Img100.jpg,books_102
77,77,2019-04-27T23:42:00,The Curious Case of Benjamin Button and Other Jazz Age Tales,https://img1.od-cdn.com/ImageType-100/1219-1/{146935CF-D64E-46E9-B63D-CFD6182F1BE4}Img100.jpg,books_103
78,78,2019-04-27T23:41:00,The House of the Spirits,https://img1.od-cdn.com/ImageType-100/5054-1/{BC0222FF-419B-4FB4-B844-CBC6128B6E7D}Img100.jpg,books_104
79,79,2019-04-24T22:57:00,The Ministry for the Future,https://img1.od-cdn.com/ImageType-100/0887-1/{81F5A348-EEB3-4546-BCEA-1FCFCB709C52}Img100.jpg,books_105
80,80,2019-04-24T18:43:00,Leonardo da Vinci,https://img1.od-cdn.com/ImageType-100/5054-1/{8BD6950E-7348-44E2-8EB8-C40021DED5CE}Img100.jpg,books_106
81,81,2019-04-17T17:18:00,The Body,https://img1.od-cdn.com/ImageType-100/1191-1/{38DDDC42-8411-4192-96F7-56BE613A6F27}Img100.jpg,books_107
82,82,2019-04-17T17:16:00,Benjamin Franklin,https://img1.od-cdn.com/ImageType-100/5054-1/{0B379166-D254-4592-B4D5-FFC2254119FB}Img100.jpg,books_108
83,83,2019-04-03T18:43:00,Leonardo da Vinci,https://img1.od-cdn.com/ImageType-100/5054-1/{8BD6950E-7348-44E2-8EB8-C40021DED5CE}Img100.jpg,books_109
84,84,2019-04-28T16:51:51,A Thousand Brains: A New Theory of Intelligence,https://ws-na.amazon-adsystem.com/widgets/q?_encoding=UTF8&MarketPlace=US&ASIN=B08VWV2WDK&ServiceVersion=20070822&ID=AsinImage&WS=1&Format=SL250,books_111
85,85,2019-04-20T16:51:51,50 Greatest Positive Psychology Quotes: A Beautiful Photo Book of The Most Inspiring Positive Psychological Quotes,https://ws-na.amazon-adsystem.com/widgets/q?_encoding=UTF8&MarketPlace=US&ASIN=B08JFTX8D1&ServiceVersion=20070822&ID=AsinImage&WS=1&Format=SL250,books_112
86,86,2019-03-27T16:51:51,Before the Coffee Gets Cold: A Novel (Before the Coffee Gets Cold Series Book 1),https://ws-na.amazon-adsystem.com/widgets/q?_encoding=UTF8&MarketPlace=US&ASIN=B084B6VFHG&ServiceVersion=20070822&ID=AsinImage&WS=1&Format=SL250,books_113
87,87,2019-03-24T16:51:51,The Enigma of Reason,https://ws-na.amazon-adsystem.com/widgets/q?_encoding=UTF8&MarketPlace=US&ASIN=B06XWFM3PP&ServiceVersion=20070822&ID=AsinImage&WS=1&Format=SL250,books_114
88,88,2019-03-12T16:51:51,Snow Crash: A Novel,https://ws-na.amazon-adsystem.com/widgets/q?_encoding=UTF8&MarketPlace=US&ASIN=B000FBJCJE&ServiceVersion=20070822&ID=AsinImage&WS=1&Format=SL250,books_115
89,89,2019-03-08T16:51:51,The Innovators,https://ws-na.amazon-adsystem.com/widgets/q?_encoding=UTF8&MarketPlace=US&ASIN=B00M9KA2ZM&ServiceVersion=20070822&ID=AsinImage&WS=1&Format=SL250,books_267
90,90,2019-03-08T16:51:51,Life 3.0,https://ws-na.amazon-adsystem.com/widgets/q?_encoding=UTF8&MarketPlace=US&ASIN=B0742K1G4Q&ServiceVersion=20070822&ID=AsinImage&WS=1&Format=SL250,books_268
91,91,2019-03-17T16:51:51,Total Recall,https://ws-na.amazon-adsystem.com/widgets/q?_encoding=UTF8&MarketPlace=US&ASIN=B003DKVAB2&ServiceVersion=20070822&ID=AsinImage&WS=1&Format=SL250,books_269
92,92,2019-04-28T16:51:51,Discovery Trial,https://ws-na.amazon-adsystem.com/widgets/q?_encoding=UTF8&MarketPlace=US&ASIN=B0821SC3ZP&ServiceVersion=20070822&ID=AsinImage&WS=1&Format=SL250,books_270


================================================
FILE: sample_data/books.sampled.json
================================================
[
  {
    "Unnamed: 0":0,
    "time":"2019-04-19T04:37:00",
    "book_name":"I Am a Strange Loop",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/0887-1\/{A6AA81F6-9242-4793-8AB0-A5C8B5DBDB66}Img100.jpg",
    "id":"books_26"
  },
  {
    "Unnamed: 0":1,
    "time":"2019-04-17T04:50:00",
    "book_name":"I Am a Strange Loop",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/0887-1\/{A6AA81F6-9242-4793-8AB0-A5C8B5DBDB66}Img100.jpg",
    "id":"books_27"
  },
  {
    "Unnamed: 0":2,
    "time":"2019-04-15T03:54:00",
    "book_name":"War and Peace",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/0887-1\/{97ABE6CD-7D83-475B-8278-210FB71B35DC}IMG100.JPG",
    "id":"books_28"
  },
  {
    "Unnamed: 0":3,
    "time":"2019-04-05T19:23:00",
    "book_name":"War and Peace",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/0887-1\/{97ABE6CD-7D83-475B-8278-210FB71B35DC}IMG100.JPG",
    "id":"books_29"
  },
  {
    "Unnamed: 0":4,
    "time":"2019-04-05T16:26:00",
    "book_name":"War and Peace",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/0887-1\/{97ABE6CD-7D83-475B-8278-210FB71B35DC}IMG100.JPG",
    "id":"books_30"
  },
  {
    "Unnamed: 0":5,
    "time":"2019-03-28T15:06:00",
    "book_name":"The Culture Map",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1088-1\/{19D66D38-7086-4689-A5CE-A63649FEEF96}Img100.jpg",
    "id":"books_31"
  },
  {
    "Unnamed: 0":6,
    "time":"2019-03-23T05:02:00",
    "book_name":"The Culture Map",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1088-1\/{19D66D38-7086-4689-A5CE-A63649FEEF96}Img100.jpg",
    "id":"books_32"
  },
  {
    "Unnamed: 0":7,
    "time":"2019-03-14T16:26:00",
    "book_name":"War and Peace",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/0887-1\/{97ABE6CD-7D83-475B-8278-210FB71B35DC}IMG100.JPG",
    "id":"books_33"
  },
  {
    "Unnamed: 0":8,
    "time":"2019-03-14T16:22:00",
    "book_name":"Neuromancer",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{7BA87BED-B3A1-4AD0-998A-CC84D7F542DA}Img100.jpg",
    "id":"books_34"
  },
  {
    "Unnamed: 0":9,
    "time":"2019-03-03T23:17:00",
    "book_name":"Horse",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{C907B706-54A2-42AA-B199-16923A150BC4}Img100.jpg",
    "id":"books_35"
  },
  {
    "Unnamed: 0":10,
    "time":"2019-03-03T23:16:00",
    "book_name":"Exhalation",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{BBA7984E-7A18-4ADF-A1D5-FA1BE72BD5AA}Img100.jpg",
    "id":"books_36"
  },
  {
    "Unnamed: 0":11,
    "time":"2019-04-30T16:51:00",
    "book_name":"Neuromancer",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{7BA87BED-B3A1-4AD0-998A-CC84D7F542DA}Img100.jpg",
    "id":"books_37"
  },
  {
    "Unnamed: 0":12,
    "time":"2019-04-28T04:25:00",
    "book_name":"Exhalation",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{BBA7984E-7A18-4ADF-A1D5-FA1BE72BD5AA}Img100.jpg",
    "id":"books_38"
  },
  {
    "Unnamed: 0":13,
    "time":"2019-04-27T13:27:00",
    "book_name":"Give and Take",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{D4B1BE9F-E613-4495-9EC6-4380FBBCDC91}Img100.jpg",
    "id":"books_39"
  },
  {
    "Unnamed: 0":14,
    "time":"2019-04-18T11:22:00",
    "book_name":"Hallucinations",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{84AE575E-576F-4590-A6E6-8460DE966A95}Img100.jpg",
    "id":"books_40"
  },
  {
    "Unnamed: 0":15,
    "time":"2019-04-16T00:11:00",
    "book_name":"Horse",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{C907B706-54A2-42AA-B199-16923A150BC4}Img100.jpg",
    "id":"books_41"
  },
  {
    "Unnamed: 0":16,
    "time":"2019-04-13T23:11:00",
    "book_name":"Think Again",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{E53B8C11-8116-46D4-9930-66B3A1F970EB}Img100.jpg",
    "id":"books_42"
  },
  {
    "Unnamed: 0":17,
    "time":"2019-04-10T02:24:00",
    "book_name":"Give and Take",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{D4B1BE9F-E613-4495-9EC6-4380FBBCDC91}Img100.jpg",
    "id":"books_43"
  },
  {
    "Unnamed: 0":18,
    "time":"2019-04-10T02:23:00",
    "book_name":"Into the Magic Shop",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/0887-1\/{3FD140F9-B7BE-42CF-AF9C-1D1E95503B30}Img100.jpg",
    "id":"books_44"
  },
  {
    "Unnamed: 0":19,
    "time":"2019-04-06T14:35:00",
    "book_name":"Good Anxiety",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/5054-1\/{B38EC863-9C53-4816-B0A3-70960F5EE87E}Img100.jpg",
    "id":"books_45"
  },
  {
    "Unnamed: 0":20,
    "time":"2019-04-05T23:52:00",
    "book_name":"Hallucinations",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{84AE575E-576F-4590-A6E6-8460DE966A95}Img100.jpg",
    "id":"books_46"
  },
  {
    "Unnamed: 0":21,
    "time":"2019-04-05T23:51:00",
    "book_name":"Think Again",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{E53B8C11-8116-46D4-9930-66B3A1F970EB}Img100.jpg",
    "id":"books_47"
  },
  {
    "Unnamed: 0":22,
    "time":"2019-04-04T03:49:00",
    "book_name":"Neuromancer",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{7BA87BED-B3A1-4AD0-998A-CC84D7F542DA}Img100.jpg",
    "id":"books_48"
  },
  {
    "Unnamed: 0":23,
    "time":"2019-04-03T20:17:00",
    "book_name":"Into the Magic Shop",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/0887-1\/{3FD140F9-B7BE-42CF-AF9C-1D1E95503B30}Img100.jpg",
    "id":"books_49"
  },
  {
    "Unnamed: 0":24,
    "time":"2019-04-03T04:23:00",
    "book_name":"Healthy Brain, Happy Life",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/0293-1\/{C964EFD7-0622-43C1-BFEC-42DB9C82A827}Img100.jpg",
    "id":"books_50"
  },
  {
    "Unnamed: 0":25,
    "time":"2019-04-03T04:23:00",
    "book_name":"Good Anxiety",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/5054-1\/{B38EC863-9C53-4816-B0A3-70960F5EE87E}Img100.jpg",
    "id":"books_51"
  },
  {
    "Unnamed: 0":26,
    "time":"2019-03-29T20:50:00",
    "book_name":"Year of Wonders",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/0887-1\/{C231E269-04B3-46AA-9410-49493BE8415E}Img100.jpg",
    "id":"books_52"
  },
  {
    "Unnamed: 0":27,
    "time":"2019-03-28T04:08:00",
    "book_name":"Healthy Brain, Happy Life",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/0293-1\/{C964EFD7-0622-43C1-BFEC-42DB9C82A827}Img100.jpg",
    "id":"books_53"
  },
  {
    "Unnamed: 0":28,
    "time":"2019-03-24T20:24:00",
    "book_name":"Year of Wonders",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/0887-1\/{C231E269-04B3-46AA-9410-49493BE8415E}Img100.jpg",
    "id":"books_54"
  },
  {
    "Unnamed: 0":29,
    "time":"2019-03-24T20:22:00",
    "book_name":"Horse",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{C907B706-54A2-42AA-B199-16923A150BC4}Img100.jpg",
    "id":"books_55"
  },
  {
    "Unnamed: 0":30,
    "time":"2019-03-24T20:19:00",
    "book_name":"Hallucinations",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{84AE575E-576F-4590-A6E6-8460DE966A95}Img100.jpg",
    "id":"books_56"
  },
  {
    "Unnamed: 0":31,
    "time":"2019-03-24T20:15:00",
    "book_name":"The River of Consciousness",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{0FA8ED18-A311-4DDB-A67C-2EAAB5AD90EE}Img100.jpg",
    "id":"books_57"
  },
  {
    "Unnamed: 0":32,
    "time":"2019-03-23T14:26:00",
    "book_name":"The River of Consciousness",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{0FA8ED18-A311-4DDB-A67C-2EAAB5AD90EE}Img100.jpg",
    "id":"books_58"
  },
  {
    "Unnamed: 0":33,
    "time":"2019-03-23T14:25:00",
    "book_name":"Originals",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{659784EB-40E0-4061-904D-17606BAD9980}Img100.jpg",
    "id":"books_59"
  },
  {
    "Unnamed: 0":34,
    "time":"2019-03-23T14:24:00",
    "book_name":"The River of Consciousness",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{0FA8ED18-A311-4DDB-A67C-2EAAB5AD90EE}Img100.jpg",
    "id":"books_60"
  },
  {
    "Unnamed: 0":35,
    "time":"2019-03-23T14:23:00",
    "book_name":"The River of Consciousness",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{0FA8ED18-A311-4DDB-A67C-2EAAB5AD90EE}Img100.jpg",
    "id":"books_61"
  },
  {
    "Unnamed: 0":36,
    "time":"2019-03-20T23:47:00",
    "book_name":"Originals",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{659784EB-40E0-4061-904D-17606BAD9980}Img100.jpg",
    "id":"books_62"
  },
  {
    "Unnamed: 0":37,
    "time":"2019-03-20T23:46:00",
    "book_name":"Give and Take",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{D4B1BE9F-E613-4495-9EC6-4380FBBCDC91}Img100.jpg",
    "id":"books_63"
  },
  {
    "Unnamed: 0":38,
    "time":"2019-03-20T23:46:00",
    "book_name":"Think Again",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{E53B8C11-8116-46D4-9930-66B3A1F970EB}Img100.jpg",
    "id":"books_64"
  },
  {
    "Unnamed: 0":39,
    "time":"2019-03-20T00:21:00",
    "book_name":"Stories of Your Life and Others",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1219-1\/{E5634854-4A0C-4971-95FB-9C38A1A8C2DE}Img100.jpg",
    "id":"books_65"
  },
  {
    "Unnamed: 0":40,
    "time":"2019-03-20T00:20:00",
    "book_name":"The Psychology of Money: Timeless lessons on wealth, greed, and happiness",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/6852-1\/{B1E07D01-EFD7-4767-88E7-5DE22657FE51}IMG100.JPG",
    "id":"books_66"
  },
  {
    "Unnamed: 0":41,
    "time":"2019-03-16T22:10:00",
    "book_name":"Stories of Your Life and Others",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1219-1\/{E5634854-4A0C-4971-95FB-9C38A1A8C2DE}Img100.jpg",
    "id":"books_67"
  },
  {
    "Unnamed: 0":42,
    "time":"2019-03-16T22:09:00",
    "book_name":"Einstein",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/5054-1\/{82F07E3E-0184-4FD8-AE8D-7C359F454586}Img100.jpg",
    "id":"books_68"
  },
  {
    "Unnamed: 0":43,
    "time":"2019-03-11T15:40:00",
    "book_name":"How to Win Friends & Influence People in the Digital Age",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1294-1\/{6FE70C4B-92E2-43A3-82CE-C44565A042A3}Img100.jpg",
    "id":"books_69"
  },
  {
    "Unnamed: 0":44,
    "time":"2019-03-11T00:07:00",
    "book_name":"The Culture Map",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1088-1\/{19D66D38-7086-4689-A5CE-A63649FEEF96}Img100.jpg",
    "id":"books_70"
  },
  {
    "Unnamed: 0":45,
    "time":"2019-03-05T17:07:00",
    "book_name":"Einstein",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/5054-1\/{82F07E3E-0184-4FD8-AE8D-7C359F454586}Img100.jpg",
    "id":"books_71"
  },
  {
    "Unnamed: 0":46,
    "time":"2019-03-05T17:04:00",
    "book_name":"1491",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{A7EC6CC4-107C-4DDE-BC7E-516A916C462C}Img100.jpg",
    "id":"books_72"
  },
  {
    "Unnamed: 0":47,
    "time":"2019-03-05T09:45:00",
    "book_name":"The Psychology of Money: Timeless lessons on wealth, greed, and happiness",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/6852-1\/{B1E07D01-EFD7-4767-88E7-5DE22657FE51}IMG100.JPG",
    "id":"books_73"
  },
  {
    "Unnamed: 0":48,
    "time":"2019-04-29T18:04:00",
    "book_name":"The Psychology of Money: Timeless lessons on wealth, greed, and happiness",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/6852-1\/{B1E07D01-EFD7-4767-88E7-5DE22657FE51}IMG100.JPG",
    "id":"books_74"
  },
  {
    "Unnamed: 0":49,
    "time":"2019-04-27T03:18:00",
    "book_name":"The Psychology of Money",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/6645-1\/{6662B07C-039F-4CEA-A4AD-08F037CC2AE3}Img100.jpg",
    "id":"books_75"
  },
  {
    "Unnamed: 0":50,
    "time":"2019-04-26T00:41:00",
    "book_name":"1491",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{A7EC6CC4-107C-4DDE-BC7E-516A916C462C}Img100.jpg",
    "id":"books_76"
  },
  {
    "Unnamed: 0":51,
    "time":"2019-04-25T14:39:00",
    "book_name":"The Boys",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/7552-1\/{C1FC45F2-9E2D-4FA8-9EBE-72DF14121401}IMG100.JPG",
    "id":"books_77"
  },
  {
    "Unnamed: 0":52,
    "time":"2019-04-22T17:47:00",
    "book_name":"The Boys",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/7552-1\/{C1FC45F2-9E2D-4FA8-9EBE-72DF14121401}IMG100.JPG",
    "id":"books_78"
  },
  {
    "Unnamed: 0":53,
    "time":"2019-04-21T15:40:00",
    "book_name":"How to Win Friends & Influence People in the Digital Age",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1294-1\/{6FE70C4B-92E2-43A3-82CE-C44565A042A3}Img100.jpg",
    "id":"books_79"
  },
  {
    "Unnamed: 0":54,
    "time":"2019-04-20T22:06:00",
    "book_name":"How the World Really Works",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{2F5C2C49-A53D-4B42-8DCF-1AD588DF5476}Img100.jpg",
    "id":"books_80"
  },
  {
    "Unnamed: 0":55,
    "time":"2019-04-18T03:50:00",
    "book_name":"From Strength to Strength",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{0D269BBE-2AEA-4557-8009-11ADDBEF9E10}Img100.jpg",
    "id":"books_81"
  },
  {
    "Unnamed: 0":56,
    "time":"2019-04-14T17:30:00",
    "book_name":"How the World Really Works",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{2F5C2C49-A53D-4B42-8DCF-1AD588DF5476}Img100.jpg",
    "id":"books_82"
  },
  {
    "Unnamed: 0":57,
    "time":"2019-04-07T23:17:00",
    "book_name":"1491",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{A7EC6CC4-107C-4DDE-BC7E-516A916C462C}Img100.jpg",
    "id":"books_83"
  },
  {
    "Unnamed: 0":58,
    "time":"2019-04-03T20:41:00",
    "book_name":"The Boys",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/7552-1\/{C1FC45F2-9E2D-4FA8-9EBE-72DF14121401}IMG100.JPG",
    "id":"books_84"
  },
  {
    "Unnamed: 0":59,
    "time":"2019-03-28T17:36:00",
    "book_name":"The Devil's Star",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{082BCD72-EA9B-450E-9784-F978EC90DE73}Img100.jpg",
    "id":"books_85"
  },
  {
    "Unnamed: 0":60,
    "time":"2019-03-24T22:41:00",
    "book_name":"The Bitcoin Standard",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/0128-1\/{0A84F8C2-77CC-4E4D-B2AA-002F7FC1C3F7}Img100.jpg",
    "id":"books_86"
  },
  {
    "Unnamed: 0":61,
    "time":"2019-03-07T17:36:00",
    "book_name":"The Devil's Star",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{082BCD72-EA9B-450E-9784-F978EC90DE73}Img100.jpg",
    "id":"books_87"
  },
  {
    "Unnamed: 0":62,
    "time":"2019-03-06T03:51:00",
    "book_name":"Mating in Captivity",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/0293-1\/{09F22BD0-4235-4A59-A677-82A0AF31CB3F}Img100.jpg",
    "id":"books_88"
  },
  {
    "Unnamed: 0":63,
    "time":"2019-03-06T03:05:00",
    "book_name":"How the World Really Works",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{2F5C2C49-A53D-4B42-8DCF-1AD588DF5476}Img100.jpg",
    "id":"books_89"
  },
  {
    "Unnamed: 0":64,
    "time":"2019-03-06T03:04:00",
    "book_name":"The Lincoln Highway",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{C1399BF6-5678-4BCA-8520-1DEE2527E29A}Img100.jpg",
    "id":"books_90"
  },
  {
    "Unnamed: 0":65,
    "time":"2019-03-04T03:07:00",
    "book_name":"The Three-Body Problem",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1493-1\/{E865FBC4-6E05-4F87-B780-728E64158A2C}Img100.jpg",
    "id":"books_91"
  },
  {
    "Unnamed: 0":66,
    "time":"2019-03-04T03:07:00",
    "book_name":"Mating in Captivity",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/0293-1\/{09F22BD0-4235-4A59-A677-82A0AF31CB3F}Img100.jpg",
    "id":"books_92"
  },
  {
    "Unnamed: 0":67,
    "time":"2019-03-04T03:06:00",
    "book_name":"The Culture Map",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1088-1\/{19D66D38-7086-4689-A5CE-A63649FEEF96}Img100.jpg",
    "id":"books_93"
  },
  {
    "Unnamed: 0":68,
    "time":"2019-03-03T22:41:00",
    "book_name":"The Bitcoin Standard",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/0128-1\/{0A84F8C2-77CC-4E4D-B2AA-002F7FC1C3F7}Img100.jpg",
    "id":"books_94"
  },
  {
    "Unnamed: 0":69,
    "time":"2019-04-21T19:17:00",
    "book_name":"The Three-Body Problem",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1493-1\/{E865FBC4-6E05-4F87-B780-728E64158A2C}Img100.jpg",
    "id":"books_95"
  },
  {
    "Unnamed: 0":70,
    "time":"2019-04-05T18:59:00",
    "book_name":"The Ministry for the Future",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/0887-1\/{81F5A348-EEB3-4546-BCEA-1FCFCB709C52}Img100.jpg",
    "id":"books_96"
  },
  {
    "Unnamed: 0":71,
    "time":"2019-03-28T21:42:00",
    "book_name":"Benjamin Franklin",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/5054-1\/{0B379166-D254-4592-B4D5-FFC2254119FB}Img100.jpg",
    "id":"books_97"
  },
  {
    "Unnamed: 0":72,
    "time":"2019-03-18T23:42:00",
    "book_name":"The Curious Case of Benjamin Button and Other Jazz Age Tales",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1219-1\/{146935CF-D64E-46E9-B63D-CFD6182F1BE4}Img100.jpg",
    "id":"books_98"
  },
  {
    "Unnamed: 0":73,
    "time":"2019-03-18T23:41:00",
    "book_name":"The House of the Spirits",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/5054-1\/{BC0222FF-419B-4FB4-B844-CBC6128B6E7D}Img100.jpg",
    "id":"books_99"
  },
  {
    "Unnamed: 0":74,
    "time":"2019-03-15T18:59:00",
    "book_name":"The Ministry for the Future",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/0887-1\/{81F5A348-EEB3-4546-BCEA-1FCFCB709C52}Img100.jpg",
    "id":"books_100"
  },
  {
    "Unnamed: 0":75,
    "time":"2019-03-08T17:18:00",
    "book_name":"The Body",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1191-1\/{38DDDC42-8411-4192-96F7-56BE613A6F27}Img100.jpg",
    "id":"books_101"
  },
  {
    "Unnamed: 0":76,
    "time":"2019-03-07T21:42:00",
    "book_name":"Benjamin Franklin",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/5054-1\/{0B379166-D254-4592-B4D5-FFC2254119FB}Img100.jpg",
    "id":"books_102"
  },
  {
    "Unnamed: 0":77,
    "time":"2019-04-27T23:42:00",
    "book_name":"The Curious Case of Benjamin Button and Other Jazz Age Tales",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/1219-1\/{146935CF-D64E-46E9-B63D-CFD6182F1BE4}Img100.jpg",
    "id":"books_103"
  },
  {
    "Unnamed: 0":78,
    "time":"2019-04-27T23:41:00",
    "book_name":"The House of the Spirits",
    "img_url":"https:\/\/img1.od-cdn.com\/ImageType-100\/5054-1\/{BC0222FF-419B-4FB4-B844-CBC6128B6E7D}Img100.jpg",
    "id":"books_104"
  },
  {
    "Unnamed: 0":79,
    "time":"2019-04-24T22:57:00",
    "book_name":"The Ministry for the Future",
    "img_url":"https:\/\/img1.od-cdn.com\/Im

Download .txt

gitextract_2gnihnj2/

├── .gitignore
├── .gitmodules
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── DATASET.md
├── LICENSE
├── NEW_DATASOURCE.md
├── README.md
├── conf/
│   └── ingest.conf
├── docker-compose.yml
├── notebooks/
│   ├── extract_narration_tutorial.ipynb
│   ├── object_detection_tutorial.ipynb
│   └── ocr_tutorial.ipynb
├── sample_data/
│   ├── books.csv
│   ├── books.sampled.csv
│   ├── books.sampled.json
│   ├── config.ini
│   ├── create_db.sql
│   ├── episodes.csv
│   ├── episodes.json
│   ├── exercise.sampled.csv
│   ├── exercise.sampled.json
│   ├── photos.sampled.csv
│   ├── photos.sampled.json
│   ├── places.sampled.csv
│   ├── places.sampled.json
│   ├── purchase.sampled.csv
│   ├── purchase.sampled.json
│   ├── streaming.sampled.csv
│   ├── streaming.sampled.json
│   ├── trips.sampled.csv
│   ├── trips.sampled.json
│   ├── views_idx.csv
│   └── views_metadata.txt
└── src/
    ├── __init__.py
    ├── common/
    │   ├── __init__.py
    │   ├── bootstrap/
    │   │   └── data_source.json
    │   ├── generate_persona.py
    │   ├── geo_helper.py
    │   ├── objects/
    │   │   ├── EntryTypes.py
    │   │   ├── LLEntry_obj.py
    │   │   ├── __init__.py
    │   │   ├── derive_attributes.py
    │   │   └── import_configs.py
    │   ├── persistence/
    │   │   ├── __init__.py
    │   │   ├── key_value_db.py
    │   │   └── personal_data_db.py
    │   ├── user_info.json
    │   └── util.py
    ├── frontend/
    │   ├── Dockerfile
    │   ├── README.md
    │   ├── __init__.py
    │   ├── package.json
    │   ├── public/
    │   │   ├── index.html
    │   │   ├── manifest.json
    │   │   └── robots.txt
    │   ├── requirements.txt
    │   └── src/
    │       ├── App.css
    │       ├── App.js
    │       ├── App.test.js
    │       ├── Constants.js
    │       ├── index.css
    │       ├── index.js
    │       ├── map/
    │       │   ├── GoogleMapComponent.js
    │       │   └── PlaceInfo.js
    │       ├── reportWebVitals.js
    │       ├── service/
    │       │   └── DigitalDataImportor.js
    │       ├── setupTests.js
    │       └── timeline/
    │           ├── EpiTimeline.js
    │           ├── builders.js
    │           ├── constants.js
    │           └── utils.js
    ├── ingest/
    │   ├── Dockerfile
    │   ├── __init__.py
    │   ├── create_episodes.py
    │   ├── derive_episodes.py
    │   ├── enrichment/
    │   │   ├── __init__.py
    │   │   ├── find_jpegs.py
    │   │   ├── geo_enrichment.py
    │   │   ├── image_deduplication.py
    │   │   ├── image_enrichment.py
    │   │   └── socratic/
    │   │       ├── README.md
    │   │       ├── __init__.py
    │   │       ├── process.ipynb
    │   │       ├── prompts/
    │   │       │   ├── __init__.py
    │   │       │   ├── categories_places365.txt
    │   │       │   ├── extract_text_features.py
    │   │       │   ├── openimage-classnames.csv
    │   │       │   ├── place365-classnames.txt
    │   │       │   ├── tencent-ml-classnames.txt
    │   │       │   └── tencent-ml-images.txt
    │   │       ├── requirements.txt
    │   │       └── socratic.py
    │   ├── export/
    │   │   ├── __init__.py
    │   │   └── export_entities.py
    │   ├── importers/
    │   │   ├── __init__.py
    │   │   ├── create_amazon_LLEntries.py
    │   │   ├── create_apple_health_LLEntries.py
    │   │   ├── create_facebook_LLEntries.py
    │   │   ├── create_google_photo_LLEntries.py
    │   │   ├── create_googlemaps_LLEntries.py
    │   │   ├── generic_importer.py
    │   │   ├── generic_importer_workflow.py
    │   │   └── photo_importer_base.py
    │   ├── ingestion_startup.sh
    │   ├── offline_processing.py
    │   └── workflow.py
    ├── init.py
    ├── init.sh
    ├── qa/
    │   ├── Dockerfile
    │   ├── README.md
    │   ├── chatgpt_engine.py
    │   ├── posttext/
    │   │   ├── README.md
    │   │   ├── __init__.py
    │   │   ├── config.ini
    │   │   ├── data/
    │   │   │   └── TimelineQA/
    │   │   │       ├── dense-100/
    │   │   │       │   ├── annual_medical_care-log.csv
    │   │   │       │   ├── config.ini
    │   │   │       │   ├── create_db.sql
    │   │   │       │   ├── daily_chat-log.csv
    │   │   │       │   ├── daily_exercise-log.csv
    │   │   │       │   ├── daily_meal-log.csv
    │   │   │       │   ├── daily_read-log.csv
    │   │   │       │   ├── daily_watchtv-log.csv
    │   │   │       │   ├── marriages-log.csv
    │   │   │       │   ├── monthly_pet_care-log.csv
    │   │   │       │   ├── moves-log.csv
    │   │   │       │   ├── persona.json
    │   │   │       │   ├── timeline-dense.csv
    │   │   │       │   ├── timeline.json
    │   │   │       │   ├── travel-log.csv
    │   │   │       │   ├── travel_dining-log.csv
    │   │   │       │   ├── travel_places_visited-log.csv
    │   │   │       │   ├── views_idx.csv
    │   │   │       │   ├── views_metadata.txt
    │   │   │       │   ├── weekly_bakeorcook-log.csv
    │   │   │       │   ├── weekly_dating-log.csv
    │   │   │       │   ├── weekly_grocery-log.csv
    │   │   │       │   └── weekly_hobby-log.csv
    │   │   │       ├── medium-100/
    │   │   │       │   ├── annual_medical_care-log.csv
    │   │   │       │   ├── config.ini
    │   │   │       │   ├── create_db.sql
    │   │   │       │   ├── daily_chat-log.csv
    │   │   │       │   ├── daily_exercise-log.csv
    │   │   │       │   ├── daily_meal-log.csv
    │   │   │       │   ├── daily_read-log.csv
    │   │   │       │   ├── daily_watchtv-log.csv
    │   │   │       │   ├── marriages-log.csv
    │   │   │       │   ├── monthly_pet_care-log.csv
    │   │   │       │   ├── moves-log.csv
    │   │   │       │   ├── persona.json
    │   │   │       │   ├── timeline-medium.csv
    │   │   │       │   ├── timeline.json
    │   │   │       │   ├── travel-log.csv
    │   │   │       │   ├── travel_dining-log.csv
    │   │   │       │   ├── travel_places_visited-log.csv
    │   │   │       │   ├── views_idx.csv
    │   │   │       │   ├── views_metadata.txt
    │   │   │       │   ├── weekly_bakeorcook-log.csv
    │   │   │       │   ├── weekly_dating-log.csv
    │   │   │       │   ├── weekly_grocery-log.csv
    │   │   │       │   └── weekly_hobby-log.csv
    │   │   │       └── sparse-100/
    │   │   │           ├── annual_medical_care-log.csv
    │   │   │           ├── config.ini
    │   │   │           ├── create_db.sql
    │   │   │           ├── daily_chat-log.csv
    │   │   │           ├── daily_exercise-log.csv
    │   │   │           ├── daily_meal-log.csv
    │   │   │           ├── daily_read-log.csv
    │   │   │           ├── daily_watchtv-log.csv
    │   │   │           ├── marriages-log.csv
    │   │   │           ├── monthly_pet_care-log.csv
    │   │   │           ├── moves-log.csv
    │   │   │           ├── persona.json
    │   │   │           ├── results/
    │   │   │           │   ├── q1-result.csv
    │   │   │           │   ├── q10-result.csv
    │   │   │           │   ├── q11-result.csv
    │   │   │           │   ├── q12-result.csv
    │   │   │           │   ├── q13-result.csv
    │   │   │           │   ├── q14-result.csv
    │   │   │           │   ├── q15-result.csv
    │   │   │           │   ├── q16-result.csv
    │   │   │           │   ├── q17-result.csv
    │   │   │           │   ├── q18-result.csv
    │   │   │           │   ├── q19-result.csv
    │   │   │           │   ├── q2-result.csv
    │   │   │           │   ├── q20-result.csv
    │   │   │           │   ├── q21-result.csv
    │   │   │           │   ├── q22-result.csv
    │   │   │           │   ├── q23-result.csv
    │   │   │           │   ├── q24-result.csv
    │   │   │           │   ├── q25-result.csv
    │   │   │           │   ├── q26-result.csv
    │   │   │           │   ├── q27-result.csv
    │   │   │           │   ├── q28-result.csv
    │   │   │           │   ├── q29-result.csv
    │   │   │           │   ├── q3-result.csv
    │   │   │           │   ├── q30-result.csv
    │   │   │           │   ├── q31-result.csv
    │   │   │           │   ├── q32-result.csv
    │   │   │           │   ├── q33-result.csv
    │   │   │           │   ├── q34-result.csv
    │   │   │           │   ├── q35-result.csv
    │   │   │           │   ├── q36-result.csv
    │   │   │           │   ├── q37-result.csv
    │   │   │           │   ├── q38-result.csv
    │   │   │           │   ├── q39-result.csv
    │   │   │           │   ├── q4-result.csv
    │   │   │           │   ├── q40-result.csv
    │   │   │           │   ├── q41-result.csv
    │   │   │           │   ├── q42-result.csv
    │   │   │           │   ├── q5-result.csv
    │   │   │           │   ├── q6-result.csv
    │   │   │           │   ├── q7-result.csv
    │   │   │           │   ├── q8-result.csv
    │   │   │           │   ├── q9-result.csv
    │   │   │           │   └── queries.csv
    │   │   │           ├── timeline.csv.txt
    │   │   │           ├── timeline.json
    │   │   │           ├── travel-log.csv
    │   │   │           ├── travel_dining-log.csv
    │   │   │           ├── travel_places_visited-log.csv
    │   │   │           ├── views_idx.csv
    │   │   │           ├── views_metadata.txt
    │   │   │           ├── weekly_bakeorcook-log.csv
    │   │   │           ├── weekly_dating-log.csv
    │   │   │           ├── weekly_grocery-log.csv
    │   │   │           └── weekly_hobby-log.csv
    │   │   ├── newqueryfile.csv
    │   │   ├── requirements.txt
    │   │   ├── server.py
    │   │   ├── src/
    │   │   │   ├── posttext.py
    │   │   │   ├── retrieval_qa.py
    │   │   │   ├── views_qa.py
    │   │   │   └── views_util.py
    │   │   └── util/
    │   │       ├── create_metadata_idx.py
    │   │       ├── data2vectorstore.py
    │   │       ├── digital_data2vectorstore.py
    │   │       ├── jsontimeline2csv.py
    │   │       ├── setup.py
    │   │       └── table2text.py
    │   ├── qa_engine.py
    │   ├── server.py
    │   └── view_engine.py
    └── requirements.txt

Download .txt

SYMBOL INDEX (358 symbols across 50 files)

FILE: sample_data/create_db.sql
  type books (line 1) | create table books(book_name TEXT, img_url TEXT, id TEXT PRIMARY KEY, da...
  type purchase (line 2) | create table purchase(purchase_id TEXT,productName TEXT,productPrice REA...
  type exercise (line 3) | create table exercise(textDescription TEXT,duration REAL,distance REAL,c...
  type photos (line 4) | create table photos(textDescription TEXT,adddress TEXT,lat REAL,long REA...
  type streaming (line 5) | create table streaming(artist TEXT,track TEXT,playtimeMs INTEGER,spotify...
  type places (line 6) | create table places(textDescription TEXT,start_address TEXT,start_lat RE...
  type trips (line 7) | create table trips(textDescription TEXT,country TEXT,states_provinces TE...

FILE: src/common/generate_persona.py
  function generate_college (line 27) | def generate_college(p):
  function generate_graduate_school (line 31) | def generate_graduate_school(p):
  function generate_college_major (line 35) | def generate_college_major(p):
  function generate_graduate_school_major (line 39) | def generate_graduate_school_major(p):
  function flip (line 43) | def flip(values, probs):
  function generate_persona (line 46) | def generate_persona():

FILE: src/common/geo_helper.py
  class GeoHelper (line 39) | class GeoHelper:
    method __init__ (line 43) | def __init__(self):
    method cache_key (line 50) | def cache_key(self, address:str):
    method reverse_cache_key (line 53) | def reverse_cache_key(self, latitude:float, longitude:float):
    method calculateLocation (line 56) | def calculateLocation(self, latitude: float, longitude: float, languag...
    method geocode (line 72) | def geocode(self, address:str, language:str="en") -> str:

FILE: src/common/objects/EntryTypes.py
  class EntryType (line 20) | class EntryType(str, Enum):
    method toJson (line 30) | def toJson(self):

FILE: src/common/objects/LLEntry_obj.py
  class LLEntry (line 22) | class LLEntry:
    method __init__ (line 23) | def __init__(self, type: str, startTime, source: str):
    method printObj (line 88) | def printObj(self):
    method __str__ (line 91) | def __str__(self):
    method toJson (line 95) | def toJson(self):
    method toDict (line 100) | def toDict(self):
  class LLEntrySummary (line 105) | class LLEntrySummary(LLEntry):
    method __init__ (line 106) | def __init__(self,
    method create_tags_from_objects (line 147) | def create_tags_from_objects(self, objects):
    method __init__ (line 158) | def __init__(self,
    method create_tags_from_objects (line 197) | def create_tags_from_objects(self, objects):
  class LLEntrySummary (line 157) | class LLEntrySummary(LLEntry):
    method __init__ (line 106) | def __init__(self,
    method create_tags_from_objects (line 147) | def create_tags_from_objects(self, objects):
    method __init__ (line 158) | def __init__(self,
    method create_tags_from_objects (line 197) | def create_tags_from_objects(self, objects):
  class LLEntryInvertedIndex (line 208) | class LLEntryInvertedIndex:
    method __init__ (line 209) | def __init__(self):
    method addEntry (line 212) | def addEntry(self, key: str, entry: LLEntry):
    method getEntries (line 217) | def getEntries(self, key):
    method __str__ (line 223) | def __str__(self):

FILE: src/common/objects/derive_attributes.py
  class DependencyNode (line 26) | class DependencyNode:
    method __init__ (line 27) | def __init__(self, attribute: str, depends_on: list, functionName: str):
    method __str__ (line 32) | def __str__(self):
  class AttributeFiller (line 36) | class AttributeFiller:
    method __init__ (line 37) | def __init__(self):
    method fillMissingAttributes (line 46) | def fillMissingAttributes(self, lifeLog:LLEntry):
    method fillRecursive (line 53) | def fillRecursive(self, lifelog:LLEntry, dependency:DependencyNode, se...
    method isAbsent (line 88) | def isAbsent(self, val):

FILE: src/common/objects/import_configs.py
  function get_val_or_none (line 63) | def get_val_or_none(kv:dict, key:str):
  class FileType (line 69) | class FileType(str, Enum):
    method toJson (line 74) | def toJson(self):
  class SourceConfigs (line 77) | class SourceConfigs:
    method __init__ (line 78) | def __init__(self, input_directory:str, filetype:FileType,
    method __str__ (line 87) | def __str__(self):
    method toJson (line 90) | def toJson(self):
  class FieldMapping (line 94) | class FieldMapping:
    method __init__ (line 95) | def __init__(self,src, target, src_type, target_type, functions, defau...
    method __str__ (line 103) | def __str__(self):
    method toJson (line 106) | def toJson(self):
  class DataSource (line 110) | class DataSource:
    method __init__ (line 111) | def __init__(self, id, source_name, entry_type:EntryType, configs, fie...
    method __str__ (line 132) | def __str__(self):
    method toJson (line 136) | def toJson(self):
  class DataSourceList (line 139) | class DataSourceList:
    method __init__ (line 140) | def __init__(self, ds):
    method __str__ (line 152) | def __str__(self):
    method toJson (line 155) | def toJson(self):

FILE: src/common/persistence/key_value_db.py
  class CacheDBConnector (line 25) | class CacheDBConnector:
    method __new__ (line 26) | def __new__(cls):
    method setup_tables (line 55) | def setup_tables(self):
    method get (line 71) | def get(self, key:str, table:str) -> str:
    method put (line 76) | def put(self,key:str, value:str, table:str):
    method execute_write (line 80) | def execute_write(self,sql,params=None):

FILE: src/common/persistence/personal_data_db.py
  class PersonalDataDBConnector (line 55) | class PersonalDataDBConnector:
    method __new__ (line 56) | def __new__(cls):
    method execute_write (line 94) | def execute_write(self,sql,params=None):
    method get_data_source_location (line 103) | def get_data_source_location(self):
    method setup_tables (line 106) | def setup_tables(self):
    method add_or_replace_personal_data (line 137) | def add_or_replace_personal_data(self, key_value: dict, unique_key=None):
    method add_or_replace (line 139) | def add_or_replace(self, table, key_value:dict, unique_key:str=None):
    method read_data_source_conf (line 168) | def read_data_source_conf(self, select_cols:str, source_name=None):
    method search_personal_data (line 176) | def search_personal_data(self, select_cols: str, where_conditions: dic...
    method add_photo (line 189) | def add_photo(self, source_id: str, obj: LLEntry):
    method add_only_photo (line 197) | def add_only_photo(self, source_id: str, imageFileName: str, imageFile...
    method is_same_photo_present (line 204) | def is_same_photo_present(self, source, filename, timestamp):
    method print_data_stats_by_source (line 216) | def print_data_stats_by_source(self):

FILE: src/common/util.py
  function xOfy (line 35) | def xOfy(i, total):
  function extractYMD (line 40) | def extractYMD(date):
  function extract_month (line 47) | def extract_month(date):
  function extract_DOM (line 50) | def extract_DOM(date):
  function sameMonth (line 53) | def sameMonth(date1, date2):
  function truncateStringNum (line 59) | def truncateStringNum(num,digits):
  function dayToDate (line 68) | def dayToDate(num):
  function daysSinceEpoch (line 73) | def daysSinceEpoch(date):
  function nDaysAfterEpoch (line 81) | def nDaysAfterEpoch(n):
  function extractTOD (line 87) | def extractTOD(s):
  function convertToTimezone (line 92) | def convertToTimezone (t, tz_origin, tz_target):
  function convertlatlongToTimezone (line 105) | def convertlatlongToTimezone(lat_string, long_string):
  function convertOutOfE7 (line 112) | def convertOutOfE7(s):
  function extractYMDHM (line 115) | def extractYMDHM(date):
  function dict_to_json (line 125) | def dict_to_json(dict):
  function distance (line 140) | def distance(latlon_a, latlon_b):
  function translate_place_name (line 161) | def translate_place_name(place_name: str) -> str:
  function is_home (line 179) | def is_home(loc: Location):
  function get_location_attr (line 188) | def get_location_attr(loc: Location, attr: Union[str, List]):
  function str_to_location (line 217) | def str_to_location(loc: str):
  function get_coordinate (line 230) | def get_coordinate(loc: Union[str, Location]):

FILE: src/frontend/src/App.js
  function App (line 50) | function App() {

FILE: src/frontend/src/map/GoogleMapComponent.js
  function GoogleMapComponent (line 28) | function GoogleMapComponent(props) {

FILE: src/frontend/src/map/PlaceInfo.js
  function PlaceInfo (line 23) | function PlaceInfo(props) {

FILE: src/frontend/src/timeline/EpiTimeline.js
  constant MIN_ZOOM (line 26) | const MIN_ZOOM = 2;
  constant MAX_ZOOM (line 27) | const MAX_ZOOM = 512;
  function EpiTimeline (line 29) | function EpiTimeline(props) {

FILE: src/frontend/src/timeline/constants.js
  constant START_YEAR (line 18) | const START_YEAR = 2021;
  constant NUM_OF_YEARS (line 19) | const NUM_OF_YEARS = 3;
  constant MONTH_NAMES (line 20) | const MONTH_NAMES = [
  constant MONTHS_PER_YEAR (line 34) | const MONTHS_PER_YEAR = 12;
  constant QUARTERS_PER_YEAR (line 35) | const QUARTERS_PER_YEAR = 4;
  constant MONTHS_PER_QUARTER (line 36) | const MONTHS_PER_QUARTER = 3;
  constant NUM_OF_MONTHS (line 37) | const NUM_OF_MONTHS = NUM_OF_YEARS * MONTHS_PER_YEAR;
  constant MAX_TRACK_START_GAP (line 38) | const MAX_TRACK_START_GAP = 4;
  constant MAX_ELEMENT_GAP (line 39) | const MAX_ELEMENT_GAP = 8;
  constant MAX_MONTH_SPAN (line 40) | const MAX_MONTH_SPAN = 8;
  constant MIN_MONTH_SPAN (line 41) | const MIN_MONTH_SPAN = 2;
  constant NUM_OF_TRACKS (line 42) | const NUM_OF_TRACKS = 3;
  constant MAX_NUM_OF_SUBTRACKS (line 43) | const MAX_NUM_OF_SUBTRACKS = 2;

FILE: src/frontend/src/timeline/utils.js
  constant COLORS (line 45) | const COLORS = COLORS_pallettes[0];
  constant ADJECTIVES (line 85) | const ADJECTIVES = [
  constant NOUNS (line 187) | const NOUNS = [

FILE: src/ingest/create_episodes.py
  class EpisodeCreator (line 27) | class EpisodeCreator:
    method __init__ (line 29) | def __init__(self, app_path='personal-data/app_data/'):
    method create_all_episodes (line 36) | def create_all_episodes(self):
    method create_books_index (line 73) | def create_books_index(self):
    method create_books_table (line 92) | def create_books_table(self, table: List):
    method create_places_table (line 133) | def create_places_table(self, table: List):
    method create_photos_table (line 203) | def create_photos_table(self, table: List):
    method create_streaming_table (line 236) | def create_streaming_table(self, table: List):
    method create_exercise_table (line 293) | def create_exercise_table(self, table: List):
    method create_purchase_table (line 332) | def create_purchase_table(self, table: List):

FILE: src/ingest/derive_episodes.py
  class EpisodeDeriver (line 32) | class EpisodeDeriver:
    method __init__ (line 33) | def __init__(self, app_path='personal-data/app_data/'):
    method run (line 40) | def run(self):
    method distance (line 56) | def distance(self, latlon_a, latlon_b):
    method get_center (line 75) | def get_center(self, lat_lons):
    method add_tag (line 81) | def add_tag(self, segments: List[dict], d=200):
    method cluster (line 117) | def cluster(self, places: pd.DataFrame, d=200):
    method derive_trips (line 153) | def derive_trips(self, places: pd.DataFrame):
    method summarize (line 166) | def summarize(self, input):
    method summarize_oneline (line 171) | def summarize_oneline(self, input):
    method describe_place (line 176) | def describe_place(self, address):
    method summarize_day (line 181) | def summarize_day(self, places, details="low"):
    method make_trip_table (line 216) | def make_trip_table(self, trips, places):

FILE: src/ingest/enrichment/find_jpegs.py
  function json_file (line 21) | def json_file (file_name):
  function jpeg_file (line 24) | def jpeg_file (file_name):
  function heic_file (line 27) | def heic_file (file_name):
  function jpg_file (line 30) | def jpg_file (file_name):

FILE: src/ingest/enrichment/geo_enrichment.py
  class LocationEnricher (line 29) | class LocationEnricher:
    method __init__ (line 30) | def __init__(self):
    method enrich (line 34) | def enrich(self, incremental:bool=True):

FILE: src/ingest/enrichment/image_enrichment.py
  class ImageEnricher (line 31) | class ImageEnricher:
    method __init__ (line 32) | def __init__(self) -> None:
    method enhance (line 35) | def enhance(img_path: str, k=5):
    method deduplicate (line 93) | def deduplicate(self, img_paths: List[str]):
    method enrich (line 118) | def enrich(self, incremental:bool=True):

FILE: src/ingest/enrichment/socratic/socratic.py
  function load_openimage_classnames (line 41) | def load_openimage_classnames(csv_path):
  function load_tencentml_classnames (line 48) | def load_tencentml_classnames(txt_path):
  function build_simple_classifier (line 55) | def build_simple_classifier(clip_model, text_list, template, device):
  function load_models (line 65) | def load_models():
  function drop_gpu (line 113) | def drop_gpu(tensor):
  function zeroshot_classifier (line 120) | def zeroshot_classifier(image):
  function generate_prompt (line 155) | def generate_prompt(openimage_classes, tencentml_classes, place365_class...
  function sorting_texts (line 181) | def sorting_texts(image_features, captions):
  function postprocess_results (line 194) | def postprocess_results(scores, classes):

FILE: src/ingest/export/export_entities.py
  class PhotoExporter (line 26) | class PhotoExporter:
    method __init__ (line 27) | def __init__(self):
    method get_all_data (line 31) | def get_all_data(self):
    method generate_export_list (line 35) | def generate_export_list(self):
    method create_export_entity (line 50) | def create_export_entity(self, incremental:bool=True):
    method populate_location (line 92) | def populate_location(self, data:LLEntry, locations:Location) -> LLEntry:
    method populate_captions (line 104) | def populate_captions(self, data:LLEntry, captions: list) -> LLEntry:
    method populate_text_description (line 111) | def populate_text_description(self, data:LLEntry) -> LLEntry:

FILE: src/ingest/importers/create_amazon_LLEntries.py
  function purchaseExtract (line 40) | def purchaseExtract(product, price, date, quantity):
  function digitalPurchaseExtract (line 55) | def digitalPurchaseExtract(title, price, currency, date):
  function slash_to_dash (line 66) | def slash_to_dash(date):
  function MDYY_to_dash (line 72) | def MDYY_to_dash(date):

FILE: src/ingest/importers/create_apple_health_LLEntries.py
  class AppleHealthImporter (line 34) | class AppleHealthImporter(GenericImporter):
    method __init__ (line 35) | def __init__(self, source_id:int, source_name: str, entry_type: EntryT...
    method import_data (line 46) | def import_data(self, field_mappings):
    method remove_lines_between (line 75) | def remove_lines_between(self, xml_file_path):
    method parse_large_xml_file (line 94) | def parse_large_xml_file(self, xml_file_path):
    method create_LLEntry (line 109) | def create_LLEntry(self, child, tag):

FILE: src/ingest/importers/create_facebook_LLEntries.py
  class FacebookPhotosImporter (line 25) | class FacebookPhotosImporter(PhotoImporter):
    method __init__ (line 26) | def __init__(self, source_id:int,  source_name: str, entry_type: Entry...
    method import_photos (line 29) | def import_photos(self, cwd, subdir):

FILE: src/ingest/importers/create_google_photo_LLEntries.py
  class GooglePhotosImporter (line 27) | class GooglePhotosImporter(PhotoImporter):
    method __init__ (line 28) | def __init__(self, source_id:int,  source_name: str, entry_type: Entry...
    method import_photos (line 31) | def import_photos(self, cwd, subdir):

FILE: src/ingest/importers/create_googlemaps_LLEntries.py
  class GoogleMapsImporter (line 37) | class GoogleMapsImporter(GenericImporter):
    method __init__ (line 38) | def __init__(self, source_id:int,  source_name: str, entry_type: Entry...
    method calculateLocationFromLatLong (line 40) | def calculateLocationFromLatLong(self, lat, lon):
    method generate_textDescription (line 46) | def generate_textDescription(self, details):
    method placeVisitExtract (line 62) | def placeVisitExtract(self, visit):
    method activitySegmentExtract (line 103) | def activitySegmentExtract(self, activity):
    method import_data (line 178) | def import_data(self, field_mappings:list):

FILE: src/ingest/importers/generic_importer.py
  class GenericImporter (line 38) | class GenericImporter:
    method __init__ (line 39) | def __init__(self, source_id:int, source_name:str, entry_type:EntryTyp...
    method import_data (line 48) | def import_data(self, field_mappings:list):
    method generate_textDescription (line 51) | def generate_textDescription(self, details):
    method get_type_files_deep (line 67) | def get_type_files_deep(self, pathname: str, filename_pattern: str, ty...
    method build_db_entry (line 91) | def build_db_entry(self, obj: LLEntry):
    method build_dedup_value (line 100) | def build_dedup_value(self, obj: LLEntry, unique_key:list):
    method create_LLEntry (line 108) | def create_LLEntry(self, row:dict, field_mappings:list):
    method evaluate_functions (line 152) | def evaluate_functions(self, fieldMapping:FieldMapping, lifelog_obj:LL...
  class SimpleJSONImporter (line 180) | class SimpleJSONImporter(GenericImporter):
    method __init__ (line 181) | def __init__(self, source_id:int, source_name:str, entry_type:EntryTyp...
    method import_data (line 185) | def import_data(self, field_mappings:list):
  class CSVImporter (line 208) | class CSVImporter(GenericImporter):
    method __init__ (line 209) | def __init__(self, source_id:int, source_name: str, entry_type: EntryT...
    method import_data (line 213) | def import_data(self, field_mappings: list):

FILE: src/ingest/importers/generic_importer_workflow.py
  class GenericImportOrchestrator (line 29) | class GenericImportOrchestrator:
    method __init__ (line 30) | def __init__(self):
    method add_new_source (line 34) | def add_new_source(self, datasource: DataSourceList):
    method start_import (line 38) | def start_import(self):
    method import_from_xml (line 67) | def import_from_xml(self, source_name: str, configs: SourceConfigs, fi...

FILE: src/ingest/importers/photo_importer_base.py
  class PhotoImporter (line 38) | class PhotoImporter(GenericImporter):
    method __init__ (line 40) | def __init__(self, source_id:int, source_name:str, entry_type:EntryTyp...
    method import_photos (line 49) | def import_photos(self, cwd, subdir):
    method import_data (line 52) | def import_data(self, field_mappings:list):
    method calculateExperiencedTimeRealAndUtc (line 57) | def calculateExperiencedTimeRealAndUtc(self, latitude: float, longitud...
    method get_type_files_deep (line 70) | def get_type_files_deep(self, pathname: str, filename_pattern: str, ty...
    method find_all_in_haystack (line 96) | def find_all_in_haystack(self, needle, haystack, return_parent: bool):
    method flatten (line 124) | def flatten(self, struct):
    method get_filename_from_path (line 134) | def get_filename_from_path(self, uri):
    method is_photo_already_processed (line 139) | def is_photo_already_processed(self, filename, taken_timestamp):
    method create_LLEntry (line 142) | def create_LLEntry(self,

FILE: src/ingest/offline_processing.py
  class LLImage (line 36) | class LLImage:
    method __init__ (line 37) | def __init__(self,
  function create_image_summary (line 52) | def create_image_summary(images: List[LLImage], k=3):
  function postprocess_bloom (line 73) | def postprocess_bloom(answer: str, keywords: List[str]=None):
  function summarize_activity (line 93) | def summarize_activity(entries: List[LLImage]):
  function summarize_day (line 133) | def summarize_day(day: List[List[LLImage]], activity_index: Dict):
  function trip_data_to_text (line 187) | def trip_data_to_text(locations: List[Location], start_date: List, num_d...
  function organize_images_by_tags (line 208) | def organize_images_by_tags(images: List[LLImage]):
  function get_location (line 224) | def get_location(segment) -> Location:
  function get_start_end_location (line 268) | def get_start_end_location(segment):
  function get_timestamp (line 274) | def get_timestamp(obj):
  function create_segments (line 285) | def create_segments(entries: List[LLImage]):
  function convert_LLEntry_LLImage (line 356) | def convert_LLEntry_LLImage(entries: List[LLEntry]):
  function create_trip_summary (line 379) | def create_trip_summary(entries: List[LLEntry]):

FILE: src/qa/chatgpt_engine.py
  class ChatGPTEngine (line 28) | class ChatGPTEngine:
    method __init__ (line 29) | def __init__(self):
    method query (line 42) | def query(self, message):

FILE: src/qa/posttext/data/TimelineQA/dense-100/create_db.sql
  type annual_medical_care_log (line 2) | create table annual_medical_care_log(eid TEXT PRIMARY KEY,date TEXT,for_...
  type daily_chat_log (line 3) | create table daily_chat_log(eid TEXT PRIMARY KEY,date TEXT,timeofday TEX...
  type daily_exercise_log (line 4) | create table daily_exercise_log(eid TEXT PRIMARY KEY,date TEXT,exercise ...
  type daily_meal_log (line 5) | create table daily_meal_log(eid TEXT PRIMARY KEY,date TEXT,mealtype TEXT...
  type daily_read_log (line 6) | create table daily_read_log(eid TEXT PRIMARY KEY,date TEXT,readtype TEXT...
  type daily_watchtv_log (line 7) | create table daily_watchtv_log(eid TEXT PRIMARY KEY,date TEXT,watchtype ...
  type marriages_log (line 8) | create table marriages_log(eid TEXT PRIMARY KEY,married_date TEXT,partne...
  type monthly_pet_care_log (line 9) | create table monthly_pet_care_log(eid TEXT PRIMARY KEY,date TEXT,pet_car...
  type moves_log (line 10) | create table moves_log(eid TEXT PRIMARY KEY,date TEXT,type_of_move TEXT,...
  type travel_log (line 11) | create table travel_log(eid TEXT PRIMARY KEY,start_date TEXT,end_date TE...
  type travel_dining_log (line 12) | create table travel_dining_log(eid TEXT PRIMARY KEY,start_date TEXT,end_...
  type travel_places_visited_log (line 13) | create table travel_places_visited_log(eid TEXT PRIMARY KEY,start_date T...
  type weekly_bakeorcook_log (line 14) | create table weekly_bakeorcook_log(eid TEXT PRIMARY KEY,date TEXT,cuisin...
  type weekly_dating_log (line 15) | create table weekly_dating_log(eid TEXT PRIMARY KEY,date TEXT,people_str...
  type weekly_grocery_log (line 16) | create table weekly_grocery_log(eid TEXT PRIMARY KEY,date TEXT,fruits TE...
  type weekly_hobby_log (line 17) | create table weekly_hobby_log(eid TEXT PRIMARY KEY,date TEXT,hobbies TEX...

FILE: src/qa/posttext/data/TimelineQA/medium-100/create_db.sql
  type annual_medical_care_log (line 2) | create table annual_medical_care_log(eid TEXT PRIMARY KEY,date TEXT,for_...
  type daily_chat_log (line 3) | create table daily_chat_log(eid TEXT PRIMARY KEY,date TEXT,timeofday TEX...
  type daily_exercise_log (line 4) | create table daily_exercise_log(eid TEXT PRIMARY KEY,date TEXT,exercise ...
  type daily_meal_log (line 5) | create table daily_meal_log(eid TEXT PRIMARY KEY,date TEXT,mealtype TEXT...
  type daily_read_log (line 6) | create table daily_read_log(eid TEXT PRIMARY KEY,date TEXT,readtype TEXT...
  type daily_watchtv_log (line 7) | create table daily_watchtv_log(eid TEXT PRIMARY KEY,date TEXT,watchtype ...
  type marriages_log (line 8) | create table marriages_log(eid TEXT PRIMARY KEY,married_date TEXT,partne...
  type monthly_pet_care_log (line 9) | create table monthly_pet_care_log(eid TEXT PRIMARY KEY,date TEXT,pet_car...
  type moves_log (line 10) | create table moves_log(eid TEXT PRIMARY KEY,date TEXT,type_of_move TEXT,...
  type travel_log (line 11) | create table travel_log(eid TEXT PRIMARY KEY,start_date TEXT,end_date TE...
  type travel_dining_log (line 12) | create table travel_dining_log(eid TEXT PRIMARY KEY,start_date TEXT,end_...
  type travel_places_visited_log (line 13) | create table travel_places_visited_log(eid TEXT PRIMARY KEY,start_date T...
  type weekly_bakeorcook_log (line 14) | create table weekly_bakeorcook_log(eid TEXT PRIMARY KEY,date TEXT,cuisin...
  type weekly_dating_log (line 15) | create table weekly_dating_log(eid TEXT PRIMARY KEY,date TEXT,people_str...
  type weekly_grocery_log (line 16) | create table weekly_grocery_log(eid TEXT PRIMARY KEY,date TEXT,fruits TE...
  type weekly_hobby_log (line 17) | create table weekly_hobby_log(eid TEXT PRIMARY KEY,date TEXT,hobbies TEX...

FILE: src/qa/posttext/data/TimelineQA/sparse-100/create_db.sql
  type annual_medical_care_log (line 1) | create table annual_medical_care_log(eid TEXT PRIMARY KEY,date TEXT,for_...
  type daily_chat_log (line 2) | create table daily_chat_log(eid TEXT PRIMARY KEY,date TEXT,timeofday TEX...
  type daily_exercise_log (line 3) | create table daily_exercise_log(eid TEXT PRIMARY KEY,date TEXT,exercise ...
  type daily_meal_log (line 4) | create table daily_meal_log(eid TEXT PRIMARY KEY,date TEXT,mealtype TEXT...
  type daily_read_log (line 5) | create table daily_read_log(eid TEXT PRIMARY KEY,date TEXT,readtype TEXT...
  type daily_watchtv_log (line 6) | create table daily_watchtv_log(eid TEXT PRIMARY KEY,date TEXT,watchtype ...
  type marriages_log (line 7) | create table marriages_log(eid TEXT PRIMARY KEY,married_date TEXT,partne...
  type monthly_pet_care_log (line 8) | create table monthly_pet_care_log(eid TEXT PRIMARY KEY,date TEXT,pet_car...
  type moves_log (line 9) | create table moves_log(eid TEXT PRIMARY KEY,date TEXT,type_of_move TEXT,...
  type travel_log (line 10) | create table travel_log(eid TEXT PRIMARY KEY,start_date TEXT,end_date TE...
  type travel_dining_log (line 11) | create table travel_dining_log(eid TEXT PRIMARY KEY,start_date TEXT,end_...
  type travel_places_visited_log (line 12) | create table travel_places_visited_log(eid TEXT PRIMARY KEY,start_date T...
  type weekly_bakeorcook_log (line 13) | create table weekly_bakeorcook_log(eid TEXT PRIMARY KEY,date TEXT,cuisin...
  type weekly_dating_log (line 14) | create table weekly_dating_log(eid TEXT PRIMARY KEY,date TEXT,people_str...
  type weekly_grocery_log (line 15) | create table weekly_grocery_log(eid TEXT PRIMARY KEY,date TEXT,fruits TE...
  type weekly_hobby_log (line 16) | create table weekly_hobby_log(eid TEXT PRIMARY KEY,date TEXT,hobbies TEX...

FILE: src/qa/posttext/server.py
  function create_app (line 28) | def create_app():
  function test (line 41) | def test():
  function query (line 53) | def query():

FILE: src/qa/posttext/src/posttext.py
  class PostText (line 28) | class PostText:
    method __init__ (line 30) | def __init__(self,
    method query (line 39) | def query(self, question: str):
  function main (line 81) | def main(argv):

FILE: src/qa/posttext/src/retrieval_qa.py
  class RetrievalBasedQA (line 27) | class RetrievalBasedQA:
    method __init__ (line 29) | def __init__(self, config, directory):
    method load_source_vectorstore (line 44) | def load_source_vectorstore(self, path: str):
    method query (line 52) | def query(self, question: str):
    method __call__ (line 62) | def __call__(self, question: str):

FILE: src/qa/posttext/src/views_qa.py
  class ViewBasedQA (line 42) | class ViewBasedQA:
    method __init__ (line 44) | def __init__(self,
    method load_viewsmetadata_embeddings (line 67) | def load_viewsmetadata_embeddings(self):
    method match_views (line 88) | def match_views(self, question: str):
    method generate_provenance_query (line 102) | def generate_provenance_query(self, sqlquery, tablename, key):
    method removeGOH (line 116) | def removeGOH(self, fromclause):
    method generate_prov_query (line 140) | def generate_prov_query(self, sqlquery, key):
    method table_result2English (line 184) | def table_result2English(self,
    method generate_prompt_example_records (line 214) | def generate_prompt_example_records(self,
    method query_views (line 236) | def query_views(self,
    method query (line 373) | def query(self, question:str):
    method __call__ (line 383) | def __call__(self, question: str):

FILE: src/qa/posttext/src/views_util.py
  function get_embedding_with_cache (line 36) | def get_embedding_with_cache(text,engine):
  function _customLIKE (line 46) | def _customLIKE(arg1,arg2):
  function raise_missing_field_error (line 98) | def raise_missing_field_error(error_str):
  function read_views_catalog (line 105) | def read_views_catalog(viewspath):
  function escapeSingleQuote (line 208) | def escapeSingleQuote(str):
  function strip_percent (line 217) | def strip_percent(str):
  function prep_SQL (line 229) | def prep_SQL(sql, question, schema):
  function get_desc (line 302) | def get_desc(views_catalog_dict, short):
  function get_table_names (line 317) | def get_table_names(views_catalog_dict):

FILE: src/qa/posttext/util/create_metadata_idx.py
  function main (line 28) | def main(argv):

FILE: src/qa/posttext/util/data2vectorstore.py
  function chunk_to_string (line 35) | def chunk_to_string(ch):
  function main (line 39) | def main(argv):

FILE: src/qa/posttext/util/digital_data2vectorstore.py
  function verbalize (line 28) | def verbalize(episodes):
  function main (line 46) | def main(argv):

FILE: src/qa/posttext/util/jsontimeline2csv.py
  function main (line 22) | def main(argv):

FILE: src/qa/posttext/util/setup.py
  class Setup (line 33) | class Setup:
    method __init__ (line 34) | def __init__(self,
    method install_metadata (line 39) | def install_metadata(self):
    method install_views (line 73) | def install_views(self):
    method verbalize (line 82) | def verbalize(self, episodes):
    method install_data_embeddings (line 97) | def install_data_embeddings(self):
  function main (line 136) | def main(argv):

FILE: src/qa/posttext/util/table2text.py
  function verbalize (line 22) | def verbalize(episodes, template):
  function main (line 31) | def main(argv):

FILE: src/qa/qa_engine.py
  class QAEngine (line 32) | class QAEngine:
    method __init__ (line 34) | def __init__(self, path, k=10):
    method verbalize (line 81) | def verbalize(self, episodes: pd.DataFrame):
    method query (line 96) | def query(self,

FILE: src/qa/server.py
  function test (line 46) | def test():
  function launch (line 57) | def launch():
  function query (line 70) | def query():

FILE: src/qa/view_engine.py
  class ViewEngine (line 29) | class ViewEngine:
    method __init__ (line 31) | def __init__(self, path):
    method verbalize (line 38) | def verbalize(self, query: str, answer: str):
    method flatten (line 47) | def flatten(self, lst):
    method query (line 61) | def query(self,

Copy disabled (too large) Download .json

Condensed preview — 244 files, each showing path, character count, and a content snippet. Download the .json file for the full structured content (31,776K chars).

[
  {
    "path": ".gitignore",
    "chars": 72,
    "preview": ".idea/*\n*__pycache__*\n*.pkl\n*.pt\n*.db\n*.pyc\nstatic/*\npersonal-data\nenv/\n"
  },
  {
    "path": ".gitmodules",
    "chars": 78,
    "preview": "[submodule \"BLIP\"]\n\tpath = BLIP\n\turl = https://github.com/salesforce/BLIP.git\n"
  },
  {
    "path": "CODE_OF_CONDUCT.md",
    "chars": 3541,
    "preview": "# Code of Conduct\n\n## Our Pledge\n\nIn the interest of fostering an open and welcoming environment, we as\ncontributors and"
  },
  {
    "path": "CONTRIBUTING.md",
    "chars": 1259,
    "preview": "# Contributing to personal-timeline\nWe want to make contributing to this project as easy and transparent as\npossible.\n\n#"
  },
  {
    "path": "DATASET.md",
    "chars": 6075,
    "preview": "# Personal timeline dataset\n\nThis dataset is a sample of ~2 months of one of our own member’s personal digital services "
  },
  {
    "path": "LICENSE",
    "chars": 11357,
    "preview": "                                 Apache License\n                           Version 2.0, January 2004\n                   "
  },
  {
    "path": "NEW_DATASOURCE.md",
    "chars": 2639,
    "preview": "# Adding a New Data Source\nThere are two ways to add a new data source depending on the complexity of input.\nIf you have"
  },
  {
    "path": "README.md",
    "chars": 14156,
    "preview": "<!-- This file explains how to create LifeLog entries from several data sources. -->\n\n# TimelineBuilder\n\n## Table of Con"
  },
  {
    "path": "conf/ingest.conf",
    "chars": 378,
    "preview": "# incremental_* if True, will only process data that was not processed previously.\n# When false, it will re-process all "
  },
  {
    "path": "docker-compose.yml",
    "chars": 850,
    "preview": "version: \"3.9\"\n\nservices:\n  frontend:\n    build:\n      context: .\n      dockerfile: src/frontend/Dockerfile\n    ports:\n "
  },
  {
    "path": "notebooks/extract_narration_tutorial.ipynb",
    "chars": 71608,
    "preview": "{\n  \"cells\": [\n    {\n      \"cell_type\": \"markdown\",\n      \"id\": \"fb140897\",\n      \"metadata\": {\n        \"id\": \"fb140897\""
  },
  {
    "path": "notebooks/ocr_tutorial.ipynb",
    "chars": 38103,
    "preview": "{\n  \"cells\": [\n    {\n      \"cell_type\": \"markdown\",\n      \"id\": \"fb140897\",\n      \"metadata\": {\n        \"id\": \"fb140897\""
  },
  {
    "path": "sample_data/books.csv",
    "chars": 14070,
    "preview": "book_name,img_url,id,date,time\nI Am a Strange Loop,https://img1.od-cdn.com/ImageType-100/0887-1/{A6AA81F6-9242-4793-8AB0"
  },
  {
    "path": "sample_data/books.sampled.csv",
    "chars": 14615,
    "preview": ",Unnamed: 0,time,book_name,img_url,id\n0,0,2019-04-19T04:37:00,I Am a Strange Loop,https://img1.od-cdn.com/ImageType-100/"
  },
  {
    "path": "sample_data/books.sampled.json",
    "chars": 22940,
    "preview": "[\n  {\n    \"Unnamed: 0\":0,\n    \"time\":\"2019-04-19T04:37:00\",\n    \"book_name\":\"I Am a Strange Loop\",\n    \"img_url\":\"https:"
  },
  {
    "path": "sample_data/config.ini",
    "chars": 1547,
    "preview": "[embedding_model]\nmodel = text-embedding-ada-002\nencoding = cl100k_base\nmax_tokens = 8000\n\n[Views]\nmodel_name = gpt-3.5-"
  },
  {
    "path": "sample_data/create_db.sql",
    "chars": 1448,
    "preview": "create table books(book_name TEXT, img_url TEXT, id TEXT PRIMARY KEY, date TEXT, time TEXT);\ncreate table purchase(purch"
  },
  {
    "path": "sample_data/episodes.csv",
    "chars": 205732,
    "preview": "id,desc,details,date\nbooks_26,\"I purchased the book \"\"I Am a Strange Loop\"\" from Amazon Kindle\",\"I purchased the book \"\""
  },
  {
    "path": "sample_data/episodes.json",
    "chars": 276380,
    "preview": "[\n  {\n    \"id\":\"books_26\",\n    \"desc\":\"I purchased the book \\\"I Am a Strange Loop\\\" from Amazon Kindle\",\n    \"details\":\""
  },
  {
    "path": "sample_data/exercise.sampled.csv",
    "chars": 3967,
    "preview": "start_time,end_time,textDescription,duration,distance,calories,outdoor,temperature,id\n2019-03-02 08:00:34-08:00,2019-03-"
  },
  {
    "path": "sample_data/exercise.sampled.json",
    "chars": 9114,
    "preview": "[\n  {\n    \"start_time\":\"2019-03-02 08:00:34-08:00\",\n    \"end_time\":\"2019-03-02 08:39:59 -0800\",\n    \"textDescription\":\"0"
  },
  {
    "path": "sample_data/photos.sampled.csv",
    "chars": 207010,
    "preview": ",Unnamed: 0,start_time,end_time,textDescription,address,lat,long,details,img_url,id\n0,0,2019-04-18 00:01:26+00:00,2019-0"
  },
  {
    "path": "sample_data/photos.sampled.json",
    "chars": 263772,
    "preview": "[\n  {\n    \"Unnamed: 0\":0,\n    \"start_time\":\"2019-04-18 00:01:26+00:00\",\n    \"end_time\":\"2019-04-18 00:01:26+00:00\",\n    "
  },
  {
    "path": "sample_data/places.sampled.csv",
    "chars": 150642,
    "preview": "start_time,end_time,textDescription,start_address,start_lat,start_long,end_address,end_lat,end_long,id\n2019-04-18 00:01:"
  },
  {
    "path": "sample_data/places.sampled.json",
    "chars": 239861,
    "preview": "[\n  {\n    \"start_time\":\"2019-04-18 00:01:26+00:00\",\n    \"end_time\":\"2019-04-18 00:01:26+00:00\",\n    \"textDescription\":\"f"
  },
  {
    "path": "sample_data/purchase.sampled.csv",
    "chars": 11905,
    "preview": ",Unnamed: 0,time,purchase_id,productName,productPrice,productQuantity,id\n0,0,2019-03-26T16:29:16,114-9774413-4401831,\"Dr"
  },
  {
    "path": "sample_data/purchase.sampled.json",
    "chars": 24529,
    "preview": "[\n  {\n    \"Unnamed: 0\":0,\n    \"time\":\"2019-03-26T16:29:16\",\n    \"purchase_id\":\"114-9774413-4401831\",\n    \"productName\":\""
  },
  {
    "path": "sample_data/streaming.sampled.csv",
    "chars": 17471,
    "preview": ",Unnamed: 0,start_time,end_time,artist,track,playtimeMs,spotify_link,id\n0,0,2019-03-30T11:34:59.982000,2019-03-30T11:35:"
  },
  {
    "path": "sample_data/streaming.sampled.json",
    "chars": 33965,
    "preview": "[\n  {\n    \"Unnamed: 0\":0,\n    \"start_time\":\"2019-03-30T11:34:59.982000\",\n    \"end_time\":\"2019-03-30T11:35:00\",\n    \"arti"
  },
  {
    "path": "sample_data/trips.sampled.csv",
    "chars": 4757,
    "preview": ",Unnamed: 0,start_time,end_time,textDescription,country,states_provinces,cities_towns,places,id\n0,0,2019-03-27 11:42:43+"
  },
  {
    "path": "sample_data/trips.sampled.json",
    "chars": 6235,
    "preview": "[\n  {\n    \"Unnamed: 0\":0,\n    \"start_time\":\"2019-03-27 11:42:43+09:00\",\n    \"end_time\":\"2019-03-28 06:38:09+09:00\",\n    "
  },
  {
    "path": "sample_data/views_idx.csv",
    "chars": 240861,
    "preview": "tablename,embedding\nbooks,\"[-0.006182512734085321, 0.009406983852386475, 0.00993300974369049, -0.015493855811655521, -0."
  },
  {
    "path": "sample_data/views_metadata.txt",
    "chars": 7277,
    "preview": "#This file contains meta data about views. \n#name: The name of the view, \n#description: a text description of what the v"
  },
  {
    "path": "src/__init__.py",
    "chars": 602,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/common/__init__.py",
    "chars": 602,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/common/bootstrap/data_source.json",
    "chars": 7350,
    "preview": "[\n  {\n    \"id\": 1,\n    \"source_name\": \"AppleHealth\",\n    \"entry_type\": \"health\",\n    \"configs\": {\n      \"input_directory"
  },
  {
    "path": "src/common/generate_persona.py",
    "chars": 7030,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/common/geo_helper.py",
    "chars": 3486,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/common/objects/EntryTypes.py",
    "chars": 937,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/common/objects/LLEntry_obj.py",
    "chars": 6791,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/common/objects/__init__.py",
    "chars": 602,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/common/objects/derive_attributes.py",
    "chars": 4025,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/common/objects/import_configs.py",
    "chars": 7475,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/common/persistence/__init__.py",
    "chars": 602,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/common/persistence/key_value_db.py",
    "chars": 3330,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/common/persistence/personal_data_db.py",
    "chars": 10310,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/common/user_info.json",
    "chars": 535,
    "preview": "{\n    \"name\": \"Hilbert\", \n    \"address\": \"Menlo Park, California, United States\",\n    \"addresses\": [\n        {\n         "
  },
  {
    "path": "src/common/util.py",
    "chars": 7046,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/frontend/Dockerfile",
    "chars": 507,
    "preview": "FROM node:18.16.0\n\nWORKDIR /app \n\n\nCOPY src/frontend .\nCOPY sample_data public/digital_data\n\n# RUN ln -s /app/personal-d"
  },
  {
    "path": "src/frontend/README.md",
    "chars": 1244,
    "preview": "# Personal Timeline UI\n\nCreate and activate a new conda env:\n```\nconda create -n digital_data python=3.10\nconda activate"
  },
  {
    "path": "src/frontend/__init__.py",
    "chars": 602,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/frontend/package.json",
    "chars": 1375,
    "preview": "{\n  \"name\": \"research_ui\",\n  \"version\": \"0.1.0\",\n  \"private\": true,\n  \"dependencies\": {\n    \"@babel/runtime\": \"^7.21.0\","
  },
  {
    "path": "src/frontend/public/index.html",
    "chars": 1738,
    "preview": "<!DOCTYPE html>\n<html lang=\"en\">\n  <head>\n    <meta charset=\"utf-8\" />\n    <!-- <link rel=\"icon\" href=\"%PUBLIC_URL%/favi"
  },
  {
    "path": "src/frontend/public/manifest.json",
    "chars": 492,
    "preview": "{\n  \"short_name\": \"React App\",\n  \"name\": \"Create React App Sample\",\n  \"icons\": [\n    {\n      \"src\": \"favicon.ico\",\n     "
  },
  {
    "path": "src/frontend/public/robots.txt",
    "chars": 67,
    "preview": "# https://www.robotstxt.org/robotstxt.html\nUser-agent: *\nDisallow:\n"
  },
  {
    "path": "src/frontend/requirements.txt",
    "chars": 149,
    "preview": "plotly\nmatplotlib\nscikit-learn\nopenai\nflask\nhydra-core\ntransformers\nspotipy\nfaiss-cpu\npandas\nhydra-core==1.3.2\nlangchain"
  },
  {
    "path": "src/frontend/src/App.css",
    "chars": 934,
    "preview": ".App {\n  /* text-align: center; */\n  font-family: sans-serif;\n  margin: 20px;\n}\n\n.App-logo {\n  height: 40vmin;\n  pointer"
  },
  {
    "path": "src/frontend/src/App.js",
    "chars": 27840,
    "preview": "/*\n * Copyright (c) Meta Platforms, Inc. and affiliates.\n *\n * Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "src/frontend/src/App.test.js",
    "chars": 867,
    "preview": "/*\n * Copyright (c) Meta Platforms, Inc. and affiliates.\n *\n * Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "src/frontend/src/Constants.js",
    "chars": 807,
    "preview": "/*\n * Copyright (c) Meta Platforms, Inc. and affiliates.\n *\n * Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "src/frontend/src/index.css",
    "chars": 366,
    "preview": "body {\n  margin: 0;\n  font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', 'Roboto', 'Oxygen',\n    'Ubuntu', 'Can"
  },
  {
    "path": "src/frontend/src/index.js",
    "chars": 1473,
    "preview": "/*\n * Copyright (c) Meta Platforms, Inc. and affiliates.\n *\n * Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "src/frontend/src/map/GoogleMapComponent.js",
    "chars": 2176,
    "preview": "/*\n * Copyright (c) Meta Platforms, Inc. and affiliates.\n *\n * Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "src/frontend/src/map/PlaceInfo.js",
    "chars": 2138,
    "preview": "/*\n * Copyright (c) Meta Platforms, Inc. and affiliates.\n *\n * Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "src/frontend/src/reportWebVitals.js",
    "chars": 983,
    "preview": "/*\n * Copyright (c) Meta Platforms, Inc. and affiliates.\n *\n * Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "src/frontend/src/service/DigitalDataImportor.js",
    "chars": 4850,
    "preview": "/*\n * Copyright (c) Meta Platforms, Inc. and affiliates.\n *\n * Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "src/frontend/src/setupTests.js",
    "chars": 862,
    "preview": "/*\n * Copyright (c) Meta Platforms, Inc. and affiliates.\n *\n * Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "src/frontend/src/timeline/EpiTimeline.js",
    "chars": 8056,
    "preview": "/*\n * Copyright (c) Meta Platforms, Inc. and affiliates.\n *\n * Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "src/frontend/src/timeline/builders.js",
    "chars": 6280,
    "preview": "/*\n * Copyright (c) Meta Platforms, Inc. and affiliates.\n *\n * Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "src/frontend/src/timeline/constants.js",
    "chars": 1201,
    "preview": "/*\n * Copyright (c) Meta Platforms, Inc. and affiliates.\n *\n * Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "src/frontend/src/timeline/utils.js",
    "chars": 5697,
    "preview": "/*\n * Copyright (c) Meta Platforms, Inc. and affiliates.\n *\n * Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "src/ingest/Dockerfile",
    "chars": 388,
    "preview": "FROM python:3.10.4\n\nWORKDIR ./app\n\nCOPY src/requirements.txt .\nRUN python -m pip install wheel setuptools pip --upgrade\n"
  },
  {
    "path": "src/ingest/__init__.py",
    "chars": 602,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/create_episodes.py",
    "chars": 16518,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/derive_episodes.py",
    "chars": 10088,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/enrichment/__init__.py",
    "chars": 602,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/enrichment/find_jpegs.py",
    "chars": 2440,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/enrichment/geo_enrichment.py",
    "chars": 2420,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/enrichment/image_deduplication.py",
    "chars": 602,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/enrichment/image_enrichment.py",
    "chars": 6476,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/enrichment/socratic/README.md",
    "chars": 3083,
    "preview": "## Example code for running the socratic model\n\nThe original repo: [Link](https://huggingface.co/spaces/Geonmo/socratic-"
  },
  {
    "path": "src/ingest/enrichment/socratic/__init__.py",
    "chars": 644,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/enrichment/socratic/process.ipynb",
    "chars": 749688,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 1,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\":"
  },
  {
    "path": "src/ingest/enrichment/socratic/prompts/__init__.py",
    "chars": 602,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/enrichment/socratic/prompts/categories_places365.txt",
    "chars": 6833,
    "preview": "/a/airfield 0\n/a/airplane_cabin 1\n/a/airport_terminal 2\n/a/alcove 3\n/a/alley 4\n/a/amphitheater 5\n/a/amusement_arcade 6\n/"
  },
  {
    "path": "src/ingest/enrichment/socratic/prompts/extract_text_features.py",
    "chars": 6072,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/enrichment/socratic/prompts/openimage-classnames.csv",
    "chars": 475267,
    "preview": "/m/0100nhbf,Sprenger's tulip\r\n/m/0104x9kv,Vinegret\r\n/m/0105jzwx,Dabu-dabu\r\n/m/0105ld7g,Pistachio ice cream\r\n/m/0105lxy5,"
  },
  {
    "path": "src/ingest/enrichment/socratic/prompts/place365-classnames.txt",
    "chars": 4389,
    "preview": "airfield\nairplane cabin\nairport terminal\nalcove\nalley\namphitheater\namusement arcade\namusement park\noutdoor apartment bui"
  },
  {
    "path": "src/ingest/enrichment/socratic/prompts/tencent-ml-classnames.txt",
    "chars": 265820,
    "preview": "atmospheric phenomenon\nbody part\nbody of water, water\nhead, caput\nhair\nstructure, anatomical structure, complex body par"
  },
  {
    "path": "src/ingest/enrichment/socratic/prompts/tencent-ml-images.txt",
    "chars": 488167,
    "preview": "category_index\tcategory_id\tindex_of_parent_category\tcategory name\n0\tn00002452\t-1\tthing\n1\tn00020827\t-1\tmatter\n2\tn00002684"
  },
  {
    "path": "src/ingest/enrichment/socratic/requirements.txt",
    "chars": 79,
    "preview": "transformers\nftfy\nregex\ntqdm\ngit+https://github.com/openai/CLIP.git\ntorch\nwget\n"
  },
  {
    "path": "src/ingest/enrichment/socratic/socratic.py",
    "chars": 9744,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/export/__init__.py",
    "chars": 602,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/export/export_entities.py",
    "chars": 5237,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/importers/__init__.py",
    "chars": 602,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/importers/create_amazon_LLEntries.py",
    "chars": 4880,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/importers/create_apple_health_LLEntries.py",
    "chars": 7284,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/importers/create_facebook_LLEntries.py",
    "chars": 4173,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/importers/create_google_photo_LLEntries.py",
    "chars": 5250,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/importers/create_googlemaps_LLEntries.py",
    "chars": 9371,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/importers/generic_importer.py",
    "chars": 11433,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/importers/generic_importer_workflow.py",
    "chars": 3350,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/importers/photo_importer_base.py",
    "chars": 6749,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/ingestion_startup.sh",
    "chars": 159,
    "preview": "#! /bin/bash\npython -m src.ingest.workflow\n# python -m src.ingest.offline_processing\npython -m src.ingest.create_episode"
  },
  {
    "path": "src/ingest/offline_processing.py",
    "chars": 20312,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/ingest/workflow.py",
    "chars": 4381,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/init.py",
    "chars": 1299,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/init.sh",
    "chars": 219,
    "preview": "#! /bin/bash\nmkdir -p ~/personal-data/app_data/static\n# cp src/frontend/static/* ~/personal-data/app_data/static/.\nln -s"
  },
  {
    "path": "src/qa/Dockerfile",
    "chars": 316,
    "preview": "FROM python:3.10.4\n\nWORKDIR /app\n\nCOPY src/requirements.txt .\nRUN python -m pip install wheel setuptools pip --upgrade\nR"
  },
  {
    "path": "src/qa/README.md",
    "chars": 387,
    "preview": "## Installing dependencies\n\nCreate and activate a new conda env:\n```\nconda create -n digital_data python=3.10\nconda acti"
  },
  {
    "path": "src/qa/chatgpt_engine.py",
    "chars": 1868,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/qa/posttext/README.md",
    "chars": 3756,
    "preview": "# PostText\n\nCreate and activate a new conda env:\n```\nconda create -n posttext python=3.10\nconda activate posttext\n```\n\nI"
  },
  {
    "path": "src/qa/posttext/__init__.py",
    "chars": 621,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "src/qa/posttext/config.ini",
    "chars": 1644,
    "preview": "[embedding_model]\nmodel = text-embedding-ada-002\nencoding = cl100k_base\nmax_tokens = 8000\n\n[codex]\nmodel = code-davinci-"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/dense-100/annual_medical_care-log.csv",
    "chars": 4035,
    "preview": "eid,date,for_whom,type_of_care\ne40568,2022/11/04,child_medical_care,annual vision checkup\ne40567,2022/09/16,child_medica"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/dense-100/config.ini",
    "chars": 1253,
    "preview": "[embedding_model]\r\nmodel = text-embedding-ada-002\r\nencoding = cl100k_base\r\nmax_tokens = 8000\r\n\r\n[Views]\r\nmodel_name = gp"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/dense-100/create_db.sql",
    "chars": 2749,
    "preview": "PRAGMA page_size = 16384;\ncreate table annual_medical_care_log(eid TEXT PRIMARY KEY,date TEXT,for_whom TEXT,type_of_care"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/dense-100/daily_chat-log.csv",
    "chars": 787043,
    "preview": "eid,date,timeofday,howlong,friends\ne207,2010/01/01,late in the evening,6,Rylee\ne208,2010/01/01,in the late afternoon,36,"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/dense-100/daily_exercise-log.csv",
    "chars": 126458,
    "preview": "eid,date,exercise,heart_rate\ne203,2010/01/01,running,150\ne211,2010/01/02,HIIT,111\ne220,2010/01/03,weight lifting,135\ne22"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/dense-100/daily_meal-log.csv",
    "chars": 699630,
    "preview": "eid,date,mealtype,foodtype,people_string\ne204,2010/01/01,breakfast,oatmeal,\"Claire, Nora, Kinsley, Piper, Piper, Eva, Ly"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/dense-100/daily_read-log.csv",
    "chars": 138370,
    "preview": "eid,date,readtype,howlong\ne209,2010/01/01,a book,40\ne218,2010/01/02,a book,27\ne227,2010/01/03,a book,31\ne237,2010/01/04,"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/dense-100/daily_watchtv-log.csv",
    "chars": 144365,
    "preview": "eid,date,watchtype,howlong\ne210,2010/01/01,a tv series,52\ne219,2010/01/02,a movie,30\ne228,2010/01/03,news,23\ne238,2010/0"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/dense-100/marriages-log.csv",
    "chars": 39,
    "preview": "eid,married_date,partner_name,location\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/dense-100/monthly_pet_care-log.csv",
    "chars": 23,
    "preview": "eid,date,pet_care_type\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/dense-100/moves-log.csv",
    "chars": 34,
    "preview": "eid,date,type_of_move,destination\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/dense-100/persona.json",
    "chars": 1139,
    "preview": "{\"age_years\": 30, \"birth_year\": 1992, \"birth_month\": 5, \"birth_day\": 27, \"birth_city\": \"Asansol\", \"birth_country\": \"Indi"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/dense-100/timeline-dense.csv",
    "chars": 5587968,
    "preview": "id,date,desc,details\n0,1992/05/27,My parents are Avery and Jackson.,My parents are Avery and Jackson.\n1,2010/01/01,I did"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/dense-100/travel-log.csv",
    "chars": 841,
    "preview": "eid,start_date,end_date,city,people\ne40550,2022/05/24,2022/05/27,\"New York, US\",\"Claire, Piper\"\ne37189,2021/01/11,2021/0"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/dense-100/travel_dining-log.csv",
    "chars": 12917,
    "preview": "eid,start_date,end_date,city,dining_date,food_type,food_location,place_visit_date,place,people,action,emotion\ne7,2010/02"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/dense-100/travel_places_visited-log.csv",
    "chars": 6481,
    "preview": "eid,start_date,end_date,city,place_visit_date,place,people,action,emotion\ne2,2010/02/12,2010/02/18,\"Philadelphia, US\",20"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/dense-100/views_idx.csv",
    "chars": 550421,
    "preview": "tablename,embedding\nannual_medical_care_log,\"[0.018762968480587006, 0.03539874032139778, 0.007465643342584372, -0.027189"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/dense-100/views_metadata.txt",
    "chars": 12363,
    "preview": "#This file contains meta data about views. \n#name: The name of the view, \n#description: a text description of what the v"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/dense-100/weekly_bakeorcook-log.csv",
    "chars": 13526,
    "preview": "eid,date,cuisine,location,people\ne13505,2014/01/07,\"salmon chowder, poke, air-fryer brats\",my place,Benjamin\ne13506,2014"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/dense-100/weekly_dating-log.csv",
    "chars": 26183,
    "preview": "eid,date,people_string,location\ne16,2010/01/02,Leah,a boba shop\ne17,2010/01/05,Elena,a restaurant\ne21,2010/01/18,Eliana,"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/dense-100/weekly_grocery-log.csv",
    "chars": 130051,
    "preview": "eid,date,fruits,drinks,toiletries,people_string\ne19,2010/01/06,\"guava, peaches, bananas, oranges\",\"soda, sports drinks, "
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/dense-100/weekly_hobby-log.csv",
    "chars": 18214,
    "preview": "eid,date,hobbies,people_string\ne18,2010/01/05,meditation,\ne22,2010/01/21,yoga,\ne36,2010/02/25,gardening,\ne39,2010/03/03,"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/medium-100/annual_medical_care-log.csv",
    "chars": 3887,
    "preview": "eid,date,for_whom,type_of_care\ne18129,2022/03/11,child_medical_care,annual physical checkup\ne18128,2022/11/27,child_medi"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/medium-100/config.ini",
    "chars": 1253,
    "preview": "[embedding_model]\r\nmodel = text-embedding-ada-002\r\nencoding = cl100k_base\r\nmax_tokens = 8000\r\n\r\n[Views]\r\nmodel_name = gp"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/medium-100/create_db.sql",
    "chars": 2748,
    "preview": "PRAGMA page_size = 8192;\ncreate table annual_medical_care_log(eid TEXT PRIMARY KEY,date TEXT,for_whom TEXT,type_of_care "
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/medium-100/daily_chat-log.csv",
    "chars": 221196,
    "preview": "eid,date,timeofday,howlong,friends\ne132,2010/01/02,in the morning,41,Nora\ne135,2010/01/03,late in the evening,30,\"Piper,"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/medium-100/daily_exercise-log.csv",
    "chars": 68091,
    "preview": "eid,date,exercise,heart_rate\ne127,2010/01/01,hiking,164\ne133,2010/01/03,HIIT,142\ne138,2010/01/04,swimming,144\ne160,2010/"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/medium-100/daily_meal-log.csv",
    "chars": 344842,
    "preview": "eid,date,mealtype,foodtype,people_string\ne128,2010/01/01,dinner,sushi,\ne130,2010/01/02,lunch,chinese food,\"Hazel, Layla,"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/medium-100/daily_read-log.csv",
    "chars": 68097,
    "preview": "eid,date,readtype,howlong\ne137,2010/01/03,a book,26\ne142,2010/01/04,news,29\ne155,2010/01/08,news,7\ne158,2010/01/09,socia"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/medium-100/daily_watchtv-log.csv",
    "chars": 71005,
    "preview": "eid,date,watchtype,howlong\ne129,2010/01/01,a documentary,7\ne159,2010/01/09,a documentary,27\ne162,2010/01/10,news,16\ne169"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/medium-100/marriages-log.csv",
    "chars": 39,
    "preview": "eid,married_date,partner_name,location\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/medium-100/monthly_pet_care-log.csv",
    "chars": 23,
    "preview": "eid,date,pet_care_type\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/medium-100/moves-log.csv",
    "chars": 34,
    "preview": "eid,date,type_of_move,destination\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/medium-100/persona.json",
    "chars": 1139,
    "preview": "{\"age_years\": 30, \"birth_year\": 1992, \"birth_month\": 5, \"birth_day\": 27, \"birth_city\": \"Asansol\", \"birth_country\": \"Indi"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/medium-100/timeline-medium.csv",
    "chars": 2411898,
    "preview": "id,date,desc,details\n0,1992/05/27,My parents are Avery and Jackson.,My parents are Avery and Jackson.\n1,2010/01/01,I did"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/medium-100/timeline.json",
    "chars": 8462950,
    "preview": "{\"1992/05/27\": {\"birth_info\": {\"eid\": \"e0\", \"logical_representation\": [\"1992/05/27\", [\"Avery\", \"Jackson\"], [\"Asansol\", \""
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/medium-100/travel-log.csv",
    "chars": 844,
    "preview": "eid,start_date,end_date,city,people\ne18113,2022/01/02,2022/01/09,\"Dubai, UAE\",\"Olivia, Nevaeh, Hazel\"\ne16597,2021/02/18,"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/medium-100/travel_dining-log.csv",
    "chars": 13185,
    "preview": "eid,start_date,end_date,city,dining_date,food_type,food_location,place_visit_date,place,people,action,emotion\ne7,2010/02"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/medium-100/travel_places_visited-log.csv",
    "chars": 5166,
    "preview": "eid,start_date,end_date,city,place_visit_date,place,people,action,emotion\ne2,2010/02/12,2010/02/18,\"Philadelphia, US\",20"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/medium-100/views_idx.csv",
    "chars": 550452,
    "preview": "tablename,embedding\nannual_medical_care_log,\"[0.018762968480587006, 0.03539874032139778, 0.007465643342584372, -0.027189"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/medium-100/views_metadata.txt",
    "chars": 12363,
    "preview": "#This file contains meta data about views. \n#name: The name of the view, \n#description: a text description of what the v"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/medium-100/weekly_bakeorcook-log.csv",
    "chars": 8105,
    "preview": "eid,date,cuisine,location,people\ne6097,2014/04/26,carrot cake,my place,\"Piper, Nora, Rylee, Piper\"\ne6106,2014/05/26,turk"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/medium-100/weekly_dating-log.csv",
    "chars": 15708,
    "preview": "eid,date,people_string,location\ne16,2010/01/02,Leah,a boba shop\ne17,2010/01/05,Elena,a restaurant\ne19,2010/01/10,Charlot"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/medium-100/weekly_grocery-log.csv",
    "chars": 83090,
    "preview": "eid,date,fruits,drinks,toiletries,people_string\ne21,2010/01/14,\"watermelons, apples\",\"tea, sports drinks, pineapple juic"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/medium-100/weekly_hobby-log.csv",
    "chars": 11158,
    "preview": "eid,date,hobbies,people_string\ne18,2010/01/05,meditation,\ne20,2010/01/14,yoga,\ne24,2010/01/26,gardening,\ne26,2010/02/04,"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/annual_medical_care-log.csv",
    "chars": 3569,
    "preview": "eid,date,for_whom,type_of_care\ne7721,2022/10/13,child_medical_care,annual vision checkup\ne7720,2022/08/09,child_medical_"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/config.ini",
    "chars": 1242,
    "preview": "[embedding_model]\nmodel = text-embedding-ada-002\nencoding = cl100k_base\nmax_tokens = 8000\n\n[Views]\nmodel_name = gpt-3.5-"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/create_db.sql",
    "chars": 2723,
    "preview": "create table annual_medical_care_log(eid TEXT PRIMARY KEY,date TEXT,for_whom TEXT,type_of_care TEXT);\ncreate table daily"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/daily_chat-log.csv",
    "chars": 86265,
    "preview": "eid,date,timeofday,howlong,friends\ne82,2010/01/04,late in the evening,11,\"Piper, Olivia, Lydia, Piper\"\ne85,2010/01/07,la"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/daily_exercise-log.csv",
    "chars": 25681,
    "preview": "eid,date,exercise,heart_rate\ne78,2010/01/01,HIIT,129\ne84,2010/01/07,running,146\ne87,2010/01/09,biking,106\ne92,2010/01/10"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/daily_meal-log.csv",
    "chars": 137466,
    "preview": "eid,date,mealtype,foodtype,people_string\ne81,2010/01/03,dinner,chinese food,\ne83,2010/01/05,breakfast,cereals,Emily\ne86,"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/daily_read-log.csv",
    "chars": 26155,
    "preview": "eid,date,readtype,howlong\ne79,2010/01/02,a book,25\ne91,2010/01/09,social media,21\ne98,2010/01/14,a book,12\ne103,2010/01/"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/daily_watchtv-log.csv",
    "chars": 28840,
    "preview": "eid,date,watchtype,howlong\ne80,2010/01/02,a documentary,10\ne99,2010/01/15,a movie,17\ne101,2010/01/16,a documentary,16\ne1"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/marriages-log.csv",
    "chars": 39,
    "preview": "eid,married_date,partner_name,location\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/monthly_pet_care-log.csv",
    "chars": 23,
    "preview": "eid,date,pet_care_type\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/moves-log.csv",
    "chars": 34,
    "preview": "eid,date,type_of_move,destination\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/persona.json",
    "chars": 1139,
    "preview": "{\"age_years\": 30, \"birth_year\": 1992, \"birth_month\": 5, \"birth_day\": 27, \"birth_city\": \"Asansol\", \"birth_country\": \"Indi"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q1-result.csv",
    "chars": 29,
    "preview": "avg(num)\n0.07692307692307693\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q10-result.csv",
    "chars": 18,
    "preview": "foodtype\na burger\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q11-result.csv",
    "chars": 11,
    "preview": "count(*)\n2\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q12-result.csv",
    "chars": 30,
    "preview": "avg(howlong)\n9.45464362850972\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q13-result.csv",
    "chars": 12,
    "preview": "count(*)\n33\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q14-result.csv",
    "chars": 13,
    "preview": "count(*)\n156\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q15-result.csv",
    "chars": 22,
    "preview": "year,max(num)\n2019,28\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q16-result.csv",
    "chars": 11,
    "preview": "count(*)\n3\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q17-result.csv",
    "chars": 50,
    "preview": "cuisine,max(num)\nchocolate chip cookie in a mug,5\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q18-result.csv",
    "chars": 24,
    "preview": "avg\n0.06451612903225806\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q19-result.csv",
    "chars": 12,
    "preview": "count(*)\n27\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q2-result.csv",
    "chars": 11,
    "preview": "count(*)\n0\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q20-result.csv",
    "chars": 22,
    "preview": "year,max(num)\n2016,12\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q21-result.csv",
    "chars": 30,
    "preview": "fruits,max(num)\npineapples,73\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q22-result.csv",
    "chars": 17,
    "preview": "max(date_obj)\n\"\"\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q23-result.csv",
    "chars": 11,
    "preview": "count(*)\n3\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q24-result.csv",
    "chars": 27,
    "preview": "people_string,max(num)\n,22\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q25-result.csv",
    "chars": 25,
    "preview": "hobbies,max(num)\nyoga,26\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q26-result.csv",
    "chars": 121,
    "preview": "food_type\nstreet food\nlocal food\nstreet food\nJapanese food\nlocal food\nChinese food\nstreet food\nJapanese food\nstreet food"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q27-result.csv",
    "chars": 73,
    "preview": "food_type\nChinese food\nItalian food\nJapanese food\nIndian food\nlocal food\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q28-result.csv",
    "chars": 59,
    "preview": "food_type\nlocal food\nItalian food\nChinese food\nstreet food\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q29-result.csv",
    "chars": 81,
    "preview": "food_type,dining_date,place\nItalian food,2021-02-10 00:00:00.000000,Burj Khalifa\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q3-result.csv",
    "chars": 11,
    "preview": "count(*)\n0\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q30-result.csv",
    "chars": 11,
    "preview": "count(*)\n2\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q31-result.csv",
    "chars": 26,
    "preview": "count(distinct r.place)\n5\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q32-result.csv",
    "chars": 62,
    "preview": "place\nTower of London\nLondon Eye\nWestminster\nCamden\nHyde Park\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q33-result.csv",
    "chars": 13,
    "preview": "answer\nEmpty\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q34-result.csv",
    "chars": 13,
    "preview": "answer\nEmpty\n"
  },
  {
    "path": "src/qa/posttext/data/TimelineQA/sparse-100/results/q35-result.csv",
    "chars": 13,
    "preview": "answer\nEmpty\n"
  }
]

// ... and 44 more files (download for full content)

About this extraction

This page contains the full source code of the facebookresearch/personal-timeline GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 244 files (67.8 MB), approximately 7.6M tokens, and a symbol index with 358 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Extract another repo